Entity Relationship Diagram
colas
Certificates of Label Approval issued by the TTB. Each row is a unique label approval application. Features come directly from the TTB application as well as post-processing (barcode recognition, OCR, LLM inference). Contains COLAs from 2005 through yesterday.Identification & Application
Identification & Application
| Column | Type | Description |
|---|---|---|
ttb_id | string | Primary key. Unique identifier assigned by the TTB (14 digits) |
application_type | string | Purpose of the application: approval or exemption |
application_status | string | Current status: approved, revoked, surrendered, or expired |
is_distinctive_container | boolean | Whether the container is unusual and requires specific approval |
for_distinctive_capacity | string | Volume of the distinctive container as free text from the application |
is_resubmission | boolean | Whether this is a resubmission of a previous COLA |
for_resubmission_ttb_id | string | The ttb_id of the previous COLA, when this is a resubmission |
for_exemption_state | string | For exemption applications, the US state where the product will be exclusively sold |
approval_qualifications | string | Qualifying statements by the TTB relating to specific conditions of approval |
off_label_information | string | Manufacturer-specified product information appearing on the container but not on the provided labels |
Form & Submission
Form & Submission
| Column | Type | Description |
|---|---|---|
is_form_physical | boolean | Whether the application was submitted as a physical form. Physical submissions lack imagery and several other features |
form_image_s3_key | string | S3 key to the scanned form document image (physical submissions only) |
Dates
Dates
| Column | Type | Description |
|---|---|---|
application_date | date | Date the application was submitted |
approval_date | date | Date the application was approved |
expiration_date | date | Date the approval expires, when applicable |
latest_update_date | date | Latest date in the process (update, application, or approval). The “completed date” in the TTB’s COLA Search Registry |
Product
Product
| Column | Type | Description |
|---|---|---|
product_name | string | The “fanciful name” in the COLA Search Registry. Includes logic for missing names or names placed in the brand_name field |
brand_name | string | The “brand name” in the COLA Search Registry. Includes logic for product_names placed in the brand_name field |
product_type | string | Type of alcohol: malt beverage, distilled spirits, or wine |
class_id | string | TTB product class code |
class_name | string | TTB product class name (e.g., “Whisky”, “Table Wine”, “Ale”) |
origin_id | string | TTB origin code |
origin_name | string | TTB origin name (country or US state) |
domestic_or_imported | string | Whether the product is domestic or imported |
grape_varietals | array | Wine grape varietals, drawn from both the COLA and LLM interpretation of label text |
wine_vintage_year | integer | Vintage year for wine and liquor products, drawn from both the COLA and LLM interpretation |
wine_appellation | string | Wine appellation, drawn from both the COLA and LLM interpretation of label text |
formula_code | string | Code relating to formulation approvals |
Permit & Address
Permit & Address
| Column | Type | Description |
|---|---|---|
permit_number | string | FK to permittees. The applicant’s plant registry, basic permit, or brewers number |
applicant_name | string | Name of the applicant |
applicant_phone_number | string | Business phone number of the applicant |
address_text | string | Full business address of the applicant |
address_recipient | string | Business recipient extracted from the address (first line) |
address_zip_code | string | Zip code extracted from the business address |
address_state | string | US state abbreviation extracted from the business address |
OCR Enrichments
OCR Enrichments
Extracted from label images using OCR (Optical Character Recognition).
| Column | Type | Description |
|---|---|---|
ocr_abv | float | ABV percentage extracted from label images |
ocr_abv_ttb_image_id | string | FK to cola_images. The image from which the ABV was extracted |
ocr_volume | float | Volume quantity extracted from label images |
ocr_volume_unit | string | Volume units extracted from label images (e.g., “ml”, “fl oz”) |
ocr_volume_ttb_image_id | string | FK to cola_images. The image from which volume was extracted |
Image Aggregates
Image Aggregates
Pre-computed image metadata rolled up to the COLA level.
| Column | Type | Description |
|---|---|---|
main_ttb_image_id | string | FK to cola_images. The front image, or a fallback if no front image exists |
main_image_s3_key | string | S3 key of the main image |
image_count | integer | Number of associated label images (excludes form images) |
image_count_broken | integer | Number of label images that couldn’t be opened with standard Python libraries |
has_front_image | boolean | Whether the COLA has a front (or top of keg) label image |
has_back_image | boolean | Whether the COLA has a back label image |
has_neck_image | boolean | Whether the COLA has a neck label image |
has_strip_image | boolean | Whether the COLA has a strip label image |
Barcode
Barcode
The “best” barcode for this COLA, rolled up from
cola_image_barcodes.| Column | Type | Description |
|---|---|---|
barcode_type | string | Barcode type (e.g., upca, qr) |
barcode_value | string | Decoded barcode value (e.g., 012345678901) |
ttb_image_barcode_id | string | FK to cola_image_barcodes. The specific barcode record |
qrcode_url | string | URL extracted from QR codes found in label images |
LLM Enrichments
LLM Enrichments
Inferred from label text using LLM (Large Language Model) analysis.
| Column | Type | Description |
|---|---|---|
llm_category | string | Hierarchical category name inferred from label text (e.g., “Bourbon”) |
llm_category_path | string | Full path through the category hierarchy (e.g., “Spirits > Whiskey > Bourbon”) |
llm_container_type | string | Container type inferred from label text (e.g., can, bottle, keg) |
llm_product_description | string | Free-text product description inferred from the label |
llm_tasting_notes | string | Free-text tasting notes inferred from the label |
llm_tasting_note_flavors | array | Array of tasting note flavors inferred from label text |
llm_brand_established_year | integer | Year the brand was established, inferred from label text |
llm_artwork_credit | string | Artist or designer credit for label artwork |
llm_wine_designation | string | Special designations for wines (e.g., “Reserve”, “Estate”) |
llm_beer_ibu | string | International Bitterness Units for beers (~5-120), inferred from label text |
llm_beer_hops_varieties | array | Hop variety names for beer products |
llm_liquor_aged_years | integer | Years aged for spirits |
llm_liquor_finishing_process | string | Finishing process details for spirits (e.g., “Sherry cask finished”) |
llm_liquor_grains | array | Grains used in spirit production |
cola_images
Individual label images associated with each COLA application. Image files are stored in S3. Each COLA typically has 1-2 images (front and back labels). Images contain physical dimensions, pixel dimensions, and the relative position on the container.| Column | Type | Description |
|---|---|---|
ttb_image_id | string | Primary key. Concatenation of ttb_id and image_index |
ttb_id | string | FK to colas |
image_index | integer | Index of the image within the COLA, starting from 0 |
s3_key | string | Path to the image file in S3 |
extension_type | string | Image file format: JPEG, PNG, or TIFF |
file_size_mb | float | Image file size in megabytes |
width_pixels | integer | Image width in pixels |
height_pixels | integer | Image height in pixels |
container_position | string | Label position on the container: front, neck, back, strip, or other |
width_inches | float | Approximate physical width of the label in inches |
height_inches | float | Approximate physical height of the label in inches |
barcode_count | integer | Number of non-QR barcodes found in the image |
qrcode_count | integer | Number of QR codes found in the image |
ocr_text | string | Full OCR text extracted from the image via Google Cloud Vision API |
is_openable | boolean | Whether the image could be opened with Python’s standard image library |
cola_image_barcodes
Barcodes found in label images, extracted with the PyZBAR library. Barcode types include one-dimensional (UPCA, EAN-13) and two-dimensional (QR codes). Includes bounding box and position information.| Column | Type | Description |
|---|---|---|
ttb_image_barcode_id | string | Primary key. Concatenation of ttb_image_id and barcode index |
ttb_image_id | string | FK to cola_images |
ttb_id | string | FK to colas (denormalized) |
image_barcode_index | integer | Index of the barcode within the image, starting from 0 |
barcode_type | string | Barcode type (e.g., upca, qr) |
barcode_value | string | Decoded barcode value |
barcode_cola_occurences | integer | Number of times this barcode value appears across all COLAs. Higher counts indicate decreased reliability |
width_pixels | integer | Barcode width in pixels |
height_pixels | integer | Barcode height in pixels |
image_offset_top_pixels | integer | Offset from the top of the image in pixels |
image_offset_left_pixels | integer | Offset from the left of the image in pixels |
orientation | string | Barcode orientation: vertical, horizontal, or square |
relative_image_position | string | Position within the image (e.g., “top left”, “bottom center”) |
permittees
TTB permit holders — businesses authorized to produce or import alcohol in the United States. Combines permits from TTB bulk exports (distilleries, importers, wholesalers) with permits found in the COLA Registry (breweries, wineries).| Column | Type | Description |
|---|---|---|
permit_number | string | Primary key. TTB permit number with hyphen-separated sections indicating permit type |
company_name | string | Name of the permit-holding entity from the active permit, or the most recent COLA |
company_state | string | US state (lowercase) from the active permit, or the most recent COLA |
company_zip_code | string | 5-digit US ZIP code from the active permit, or the most recent COLA |
permittee_type | string | Industry type from TTB permit records |
is_active | boolean | Whether the permit is in TTB export records, or has a COLA in the last 365 days |
active_reason | string | Details the is_active indicator: “permit listed” or “cola within 365 days” |
colas | integer | All-time count of COLA applications for this permit |
colas_approved | integer | All-time count of approved COLA applications |
last_cola_application_date | date | Date of the most recent COLA application |
Update Frequency
- New COLAs: ~2,300/week (scraped daily from the TTB Public COLA Registry)
- Image processing: Within 24 hours of COLA scrape
- Barcode extraction: Batch processed weekly
- LLM enrichment: Batch processed weekly
- Permittees: Updated with each daily scrape and periodic TTB bulk imports