The COLA Cloud bulk data product delivers structured alcohol label data via Snowflake data share or S3 export. For the REST API (which exposes a subset of these fields), see the API Reference. The source of truth is the TTB Public COLA Registry, enriched with OCR, barcode extraction, and LLM inference.

Entity Relationship Diagram

┌─────────────────┐       ┌─────────────────┐       ┌─────────────────────┐
│   permittees    │───1:N─│      colas      │───1:N─│    cola_images      │
│                 │       │                 │       │                     │
│ permit_number   │       │ ttb_id          │       │ ttb_image_id        │
│ company_name    │       │ permit_number   │       │ ttb_id              │
│ ...             │       │ ...             │       │ ...                 │
└─────────────────┘       └─────────────────┘       └─────────────────────┘
                                 │                         │
                                 │                         │
                                 └─────────────────────────┼───1:N─┐
                                                           │       │
                                                  ┌────────────────────────┐
                                                  │  cola_image_barcodes   │
                                                  │                        │
                                                  │  ttb_image_barcode_id  │
                                                  │  ttb_image_id          │
                                                  │  ttb_id                │
                                                  │  ...                   │
                                                  └────────────────────────┘
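The keys in the diagram can be followed with ordinary joins. As a minimal sketch, assuming each table has already been loaded into Python as a list of dicts (the loading step, the helper names, and the sample rows and ID values below are hypothetical placeholders), the one-to-many links can be nested like this:

```python
from collections import defaultdict


def index_by(rows, key):
    """Group rows on a foreign-key column (the N side of a 1:N relationship)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def assemble(colas, permittees, images, barcodes):
    """Nest each COLA's permittee, images, and per-image barcodes using the diagram's keys."""
    permittee_by_number = {p["permit_number"]: p for p in permittees}
    images_by_cola = index_by(images, "ttb_id")
    barcodes_by_image = index_by(barcodes, "ttb_image_id")
    nested = []
    for cola in colas:
        nested.append({
            **cola,
            # A permit_number with no match yields None, so the COLA is kept (left-join semantics)
            "permittee": permittee_by_number.get(cola["permit_number"]),
            "images": [
                {**img, "barcodes": barcodes_by_image.get(img["ttb_image_id"], [])}
                for img in images_by_cola.get(cola["ttb_id"], [])
            ],
        })
    return nested


# Hypothetical sample rows containing only the key columns shown in the diagram
colas = [{"ttb_id": "00000000000001", "permit_number": "PERMIT-A"}]
permittees = [{"permit_number": "PERMIT-A", "company_name": "Example Co"}]
images = [
    {"ttb_image_id": "IMG-0", "ttb_id": "00000000000001"},
    {"ttb_image_id": "IMG-1", "ttb_id": "00000000000001"},
]
barcodes = [
    {"ttb_image_barcode_id": "BC-0", "ttb_image_id": "IMG-0", "ttb_id": "00000000000001"},
]

joined = assemble(colas, permittees, images, barcodes)
print(joined[0]["permittee"]["company_name"], len(joined[0]["images"]))  # Example Co 2
```

In SQL or a dataframe library the same shape corresponds to left joins from colas to permittees and cola_images, and from cola_images to cola_image_barcodes. Because cola_image_barcodes also carries ttb_id, barcodes can be joined to colas directly without passing through cola_images.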

colas

Certificates of Label Approval issued by the TTB. Each row is a unique label approval application. Columns come directly from the TTB application or from post-processing (barcode recognition, OCR, LLM inference). Covers COLAs from 2005 through the previous day.

| Column | Type | Description |
|---|---|---|
| ttb_id | string | Primary key. Unique identifier assigned by the TTB (14 digits) |
| application_type | string | Purpose of the application: approval or exemption |
| application_status | string | Current status: approved, revoked, surrendered, or expired |
| is_distinctive_container | boolean | Whether the container is unusual and requires specific approval |
| for_distinctive_capacity | string | Volume of the distinctive container as free text from the application |
| is_resubmission | boolean | Whether this is a resubmission of a previous COLA |
| for_resubmission_ttb_id | string | The ttb_id of the previous COLA, when this is a resubmission |
| for_exemption_state | string | For exemption applications, the US state where the product will be exclusively sold |
| approval_qualifications | string | Qualifying statements by the TTB relating to specific conditions of approval |
| off_label_information | string | Manufacturer-specified product information appearing on the container but not on the provided labels |

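Because for_resubmission_ttb_id points back to another row in colas, resubmissions form chains. A minimal sketch for walking a chain from newest to oldest, assuming rows are indexed in a dict keyed by ttb_id (the helper name and the short sample IDs are hypothetical; real IDs are 14 digits):

```python
def resubmission_chain(ttb_id, colas_by_id):
    """Return ttb_ids from the given COLA back through its earlier submissions.

    Stops when a row is not a resubmission, when the referenced COLA is not
    present in the loaded data, or when an ID repeats (guards against bad links).
    """
    chain = []
    seen = set()
    current = ttb_id
    while current is not None and current not in seen:
        seen.add(current)
        row = colas_by_id.get(current)
        if row is None:
            break  # referenced COLA is not in the loaded data
        chain.append(current)
        current = row.get("for_resubmission_ttb_id") if row.get("is_resubmission") else None
    return chain


# Hypothetical rows: C resubmits B, which resubmits A
colas_by_id = {
    "A": {"is_resubmission": False, "for_resubmission_ttb_id": None},
    "B": {"is_resubmission": True, "for_resubmission_ttb_id": "A"},
    "C": {"is_resubmission": True, "for_resubmission_ttb_id": "B"},
}
print(resubmission_chain("C", colas_by_id))  # ['C', 'B', 'A']
```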

| Column | Type | Description |
|---|---|---|
| is_form_physical | boolean | Whether the application was submitted as a physical form. Physical submissions lack imagery and several other features |
| form_image_s3_key | string | S3 key to the scanned form document image (physical submissions only) |


| Column | Type | Description |
|---|---|---|
| application_date | date | Date the application was submitted |
| approval_date | date | Date the application was approved |
| expiration_date | date | Date the approval expires, when applicable |
| latest_update_date | date | Latest date in the process (update, application, or approval); corresponds to the “completed date” in the TTB’s COLA Search Registry |

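The date columns support simple derived metrics. A minimal sketch, assuming the dates have already been parsed into Python date objects (the helper names are hypothetical); application_status remains the TTB's own status field, so the expiration check here is only a date comparison:

```python
from datetime import date


def approval_turnaround_days(row):
    """Days from submission to approval, or None if either date is missing."""
    submitted, approved = row.get("application_date"), row.get("approval_date")
    if submitted is None or approved is None:
        return None
    return (approved - submitted).days


def is_past_expiration(row, as_of):
    """True only when expiration_date is set and falls before as_of."""
    expires = row.get("expiration_date")
    return expires is not None and expires < as_of


# Hypothetical row with dates already parsed into datetime.date values
row = {
    "application_date": date(2024, 1, 2),
    "approval_date": date(2024, 1, 16),
    "expiration_date": None,
}
print(approval_turnaround_days(row))              # 14
print(is_past_expiration(row, date(2024, 7, 1)))  # False
```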

| Column | Type | Description |
|---|---|---|
| product_name | string | The “fanciful name” in the COLA Search Registry. Handles missing names and names entered in the brand_name field |
| brand_name | string | The “brand name” in the COLA Search Registry. Handles product names entered in the brand_name field |
| product_type | string | Type of alcohol: malt beverage, distilled spirits, or wine |
| class_id | string | TTB product class code |
| class_name | string | TTB product class name (e.g., “Whisky”, “Table Wine”, “Ale”) |
| origin_id | string | TTB origin code |
| origin_name | string | TTB origin name (country or US state) |
| domestic_or_imported | string | Whether the product is domestic or imported |
| grape_varietals | array | Wine grape varietals, drawn from both the COLA and LLM interpretation of label text |
| wine_vintage_year | integer | Vintage year for wine and liquor products, drawn from both the COLA and LLM interpretation of label text |
| wine_appellation | string | Wine appellation, drawn from both the COLA and LLM interpretation of label text |
| formula_code | string | Code relating to formulation approvals |


| Column | Type | Description |
|---|---|---|
| permit_number | string | FK to permittees. The applicant’s plant registry, basic permit, or brewer’s number |
| applicant_name | string | Name of the applicant |
| applicant_phone_number | string | Business phone number of the applicant |
| address_text | string | Full business address of the applicant |
| address_recipient | string | Business recipient extracted from the address (first line) |
| address_zip_code | string | ZIP code extracted from the business address |
| address_state | string | US state abbreviation extracted from the business address |

Extracted from label images using OCR (Optical Character Recognition).

| Column | Type | Description |
|---|---|---|
| ocr_abv | float | ABV percentage extracted from label images |
| ocr_abv_ttb_image_id | string | FK to cola_images. The image from which the ABV was extracted |
| ocr_volume | float | Volume quantity extracted from label images |
| ocr_volume_unit | string | Volume units extracted from label images (e.g., “ml”, “fl oz”) |
| ocr_volume_ttb_image_id | string | FK to cola_images. The image from which the volume was extracted |

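ocr_volume and ocr_volume_unit are stored separately, so comparing volumes across labels requires a unit conversion. A minimal sketch that normalizes to milliliters; the table lists only “ml” and “fl oz” as example units, so the “cl” and “l” entries and the string normalization are assumptions, and unrecognized units return None rather than a guess:

```python
# Milliliters per unit. Only "ml" and "fl oz" are documented examples;
# "cl" and "l" are assumed additional spellings.
ML_PER_UNIT = {
    "ml": 1.0,
    "cl": 10.0,
    "l": 1000.0,
    "fl oz": 29.5735295625,  # US fluid ounce
}


def volume_to_ml(volume, unit):
    """Convert an OCR volume to milliliters, or return None if it can't be converted."""
    if volume is None or unit is None:
        return None
    # Normalize case, drop periods ("fl. oz." -> "fl oz"), and collapse whitespace
    key = " ".join(unit.lower().replace(".", "").split())
    factor = ML_PER_UNIT.get(key)
    return None if factor is None else volume * factor


print(volume_to_ml(750, "ml"))                # 750.0
print(round(volume_to_ml(12, "fl. oz."), 1))  # 354.9
```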
Pre-computed image metadata rolled up to the COLA level.

| Column | Type | Description |
|---|---|---|
| main_ttb_image_id | string | FK to cola_images. The front image, or a fallback if no front image exists |
| main_image_s3_key | string | S3 key of the main image |
| image_count | integer | Number of associated label images (excludes form images) |
| image_count_broken | integer | Number of label images that couldn’t be opened with standard Python libraries |
| has_front_image | boolean | Whether the COLA has a front (or top of keg) label image |
| has_back_image | boolean | Whether the COLA has a back label image |
| has_neck_image | boolean | Whether the COLA has a neck label image |
| has_strip_image | boolean | Whether the COLA has a strip label image |

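Several of these columns summarize the cola_images table. As a sketch, assuming the rows passed in are one COLA's label images (not form scans), the counts and position flags can be recomputed as below. The helper is hypothetical, and the published values may differ wherever the pipeline applies extra rules, such as treating a top-of-keg label as a front image.

```python
def image_rollups(images):
    """Recompute COLA-level image columns from one COLA's cola_images rows."""
    positions = {img["container_position"] for img in images}
    return {
        "image_count": len(images),
        "image_count_broken": sum(1 for img in images if not img["is_openable"]),
        "has_front_image": "front" in positions,
        "has_back_image": "back" in positions,
        "has_neck_image": "neck" in positions,
        "has_strip_image": "strip" in positions,
    }


# Hypothetical images for a single COLA
images = [
    {"container_position": "front", "is_openable": True},
    {"container_position": "back", "is_openable": True},
    {"container_position": "other", "is_openable": False},
]
print(image_rollups(images))
```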
The “best” barcode for this COLA, rolled up from cola_image_barcodes.

| Column | Type | Description |
|---|---|---|
| barcode_type | string | Barcode type (e.g., upca, qr) |
| barcode_value | string | Decoded barcode value (e.g., 012345678901) |
| ttb_image_barcode_id | string | FK to cola_image_barcodes. The specific barcode record |
| qrcode_url | string | URL extracted from QR codes found in label images |

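Barcode values are decoded from label images, so a quick sanity check for upca values is the standard UPC-A check digit (GS1 modulo-10). This is a minimal client-side filter; the documentation does not say whether the extraction pipeline already applies it, and the helper name is hypothetical.

```python
def upca_check_digit_ok(value):
    """Validate a 12-digit UPC-A string using its final check digit."""
    if len(value) != 12 or not (value.isascii() and value.isdigit()):
        return False
    digits = [int(c) for c in value]
    odd_sum = sum(digits[0:11:2])   # positions 1, 3, ..., 11 (weighted x3)
    even_sum = sum(digits[1:10:2])  # positions 2, 4, ..., 10 (weighted x1)
    expected = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return expected == digits[11]


print(upca_check_digit_ok("036000291452"))  # True
print(upca_check_digit_ok("036000291453"))  # False (last digit altered)
```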
Inferred from label text using LLM (Large Language Model) analysis.

| Column | Type | Description |
|---|---|---|
| llm_category | string | Hierarchical category name inferred from label text (e.g., “Bourbon”) |
| llm_category_path | string | Full path through the category hierarchy (e.g., “Spirits > Whiskey > Bourbon”) |
| llm_container_type | string | Container type inferred from label text (e.g., can, bottle, keg) |
| llm_product_description | string | Free-text product description inferred from the label |
| llm_tasting_notes | string | Free-text tasting notes inferred from the label |
| llm_tasting_note_flavors | array | Array of tasting note flavors inferred from label text |
| llm_brand_established_year | integer | Year the brand was established, inferred from label text |
| llm_artwork_credit | string | Artist or designer credit for label artwork |
| llm_wine_designation | string | Special designations for wines (e.g., “Reserve”, “Estate”) |
| llm_beer_ibu | string | International Bitterness Units for beers (~5-120), inferred from label text |
| llm_beer_hops_varieties | array | Hop variety names for beer products |
| llm_liquor_aged_years | integer | Years aged for spirits |
| llm_liquor_finishing_process | string | Finishing process details for spirits (e.g., “Sherry cask finished”) |
| llm_liquor_grains | array | Grains used in spirit production |

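llm_category_path encodes the hierarchy as a single string. A minimal sketch for splitting it and filtering on any level, assuming the “>” separator shown in the example is used consistently (surrounding whitespace is trimmed; the helper names are hypothetical):

```python
def parse_category_path(path, sep=">"):
    """Split e.g. "Spirits > Whiskey > Bourbon" into ["Spirits", "Whiskey", "Bourbon"]."""
    if not path:
        return []
    return [part.strip() for part in path.split(sep) if part.strip()]


def in_category(path, category):
    """True if `category` appears anywhere along the path, including the leaf."""
    return category in parse_category_path(path)


path = "Spirits > Whiskey > Bourbon"
print(parse_category_path(path))     # ['Spirits', 'Whiskey', 'Bourbon']
print(in_category(path, "Whiskey"))  # True
print(in_category(path, "Wine"))     # False
```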

cola_images

Individual label images associated with each COLA application. Image files are stored in S3. Each COLA typically has 1-2 images (front and back labels). Rows record each image's physical dimensions, pixel dimensions, and position on the container.

| Column | Type | Description |
|---|---|---|
| ttb_image_id | string | Primary key. Concatenation of ttb_id and image_index |
| ttb_id | string | FK to colas |
| image_index | integer | Index of the image within the COLA, starting from 0 |
| s3_key | string | Path to the image file in S3 |
| extension_type | string | Image file format: JPEG, PNG, or TIFF |
| file_size_mb | float | Image file size in megabytes |
| width_pixels | integer | Image width in pixels |
| height_pixels | integer | Image height in pixels |
| container_position | string | Label position on the container: front, neck, back, strip, or other |
| width_inches | float | Approximate physical width of the label in inches |
| height_inches | float | Approximate physical height of the label in inches |
| barcode_count | integer | Number of non-QR barcodes found in the image |
| qrcode_count | integer | Number of QR codes found in the image |
| ocr_text | string | Full OCR text extracted from the image via Google Cloud Vision API |
| is_openable | boolean | Whether the image could be opened with standard Python imaging libraries |

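Combining the pixel and physical dimensions gives a rough scan resolution, which can help when choosing images for downstream OCR or display. A minimal sketch; because width_inches and height_inches are approximate, the result is only an estimate, and the helper name is hypothetical. It is only meaningful for rows where is_openable is true.

```python
def approx_pixels_per_inch(row):
    """Average of horizontal and vertical pixels per inch, or None if sizes are missing."""
    w_in, h_in = row.get("width_inches"), row.get("height_inches")
    if not w_in or not h_in:
        return None  # missing or zero physical size
    return (row["width_pixels"] / w_in + row["height_pixels"] / h_in) / 2


# Hypothetical cola_images row
row = {"width_pixels": 1200, "height_pixels": 900, "width_inches": 4.0, "height_inches": 3.0}
print(approx_pixels_per_inch(row))  # 300.0
```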

cola_image_barcodes

Barcodes found in label images, extracted with the pyzbar library. Barcode types include one-dimensional codes (UPC-A, EAN-13) and two-dimensional codes (QR). Rows include bounding-box and position information.

| Column | Type | Description |
|---|---|---|
| ttb_image_barcode_id | string | Primary key. Concatenation of ttb_image_id and barcode index |
| ttb_image_id | string | FK to cola_images |
| ttb_id | string | FK to colas (denormalized) |
| image_barcode_index | integer | Index of the barcode within the image, starting from 0 |
| barcode_type | string | Barcode type (e.g., upca, qr) |
| barcode_value | string | Decoded barcode value |
| barcode_cola_occurences | integer | Number of times this barcode value appears across all COLAs. Higher counts indicate lower reliability |
| width_pixels | integer | Barcode width in pixels |
| height_pixels | integer | Barcode height in pixels |
| image_offset_top_pixels | integer | Offset from the top of the image in pixels |
| image_offset_left_pixels | integer | Offset from the left of the image in pixels |
| orientation | string | Barcode orientation: vertical, horizontal, or square |
| relative_image_position | string | Position within the image (e.g., “top left”, “bottom center”) |

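The orientation and relative_image_position columns summarize the bounding box. The exact rules are not documented here, so the sketch below is a hypothetical reconstruction: it calls a box “square” when its sides are within 20% of each other and places the box's center in a 3x3 grid over the image (image size comes from width_pixels and height_pixels in cola_images). The 20% tolerance and the “middle” label for the center row are assumptions.

```python
def classify_orientation(width, height, tolerance=0.2):
    """Hypothetical rule: near-equal sides -> "square", else the longer side decides."""
    if width <= 0 or height <= 0:
        raise ValueError("barcode dimensions must be positive")
    ratio = width / height
    if abs(ratio - 1) <= tolerance:
        return "square"
    return "horizontal" if ratio > 1 else "vertical"


def relative_position(top, left, width, height, image_width, image_height):
    """Hypothetical rule: locate the box center in a 3x3 grid over the image."""
    center_x = left + width / 2
    center_y = top + height / 2
    columns = ("left", "center", "right")
    rows = ("top", "middle", "bottom")  # "middle" is an assumed label
    col = columns[min(int(3 * center_x / image_width), 2)]
    row = rows[min(int(3 * center_y / image_height), 2)]
    return f"{row} {col}"


print(classify_orientation(200, 80))                     # horizontal
print(relative_position(900, 400, 200, 80, 1000, 1000))  # bottom center
```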

permittees

TTB permit holders — businesses authorized to produce or import alcohol in the United States. Combines permits from TTB bulk exports (distilleries, importers, wholesalers) with permits found in the COLA Registry (breweries, wineries).

| Column | Type | Description |
|---|---|---|
| permit_number | string | Primary key. TTB permit number with hyphen-separated sections indicating permit type |
| company_name | string | Name of the permit-holding entity from the active permit, or the most recent COLA |
| company_state | string | US state (lowercase) from the active permit, or the most recent COLA |
| company_zip_code | string | 5-digit US ZIP code from the active permit, or the most recent COLA |
| permittee_type | string | Industry type from TTB permit records |
| is_active | boolean | Whether the permit appears in TTB export records or has a COLA application in the last 365 days |
| active_reason | string | Reason for the is_active value: “permit listed” or “cola within 365 days” |
| colas | integer | All-time count of COLA applications for this permit |
| colas_approved | integer | All-time count of approved COLA applications |
| last_cola_application_date | date | Date of the most recent COLA application |

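The is_active and active_reason columns follow the rule described in the table. A minimal sketch, assuming the 365-day window is inclusive, that “permit listed” takes precedence when both conditions hold, and that active_reason is empty for inactive permits; none of those three details is stated in the table. The permit_listed flag is a hypothetical input meaning the permit appears in TTB export records.

```python
from datetime import date, timedelta


def active_status(permit_listed, last_cola_application_date, as_of):
    """Return (is_active, active_reason) for one permittee as of a given date."""
    if permit_listed:
        return True, "permit listed"
    if (
        last_cola_application_date is not None
        and as_of - last_cola_application_date <= timedelta(days=365)
    ):
        return True, "cola within 365 days"
    return False, None


as_of = date(2024, 1, 1)
print(active_status(True, None, as_of))               # (True, 'permit listed')
print(active_status(False, date(2023, 6, 1), as_of))  # (True, 'cola within 365 days')
print(active_status(False, date(2022, 1, 1), as_of))  # (False, None)
```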

Update Frequency

  • New COLAs: ~2,300/week (scraped daily from the TTB Public COLA Registry)
  • Image processing: Within 24 hours of the COLA scrape
  • Barcode extraction: Batch processed weekly
  • LLM enrichment: Batch processed weekly
  • Permittees: Updated with each daily scrape and periodic TTB bulk imports