Source Data
All source data originates from the U.S. Alcohol and Tobacco Tax and Trade Bureau (TTB), a bureau of the U.S. Department of the Treasury.
- Primary source: TTB COLA Public Registry, the federal database of approved alcohol product labels
- Supplementary sources: TTB permit holder records, TTB production statistics
- Legal status: All source data is in the public domain under 17 U.S.C. Section 105 (U.S. government works)
Coverage
~2.9M COLAs
Label approval records dating back to 2005
~5M Images
Front and back label images (typically 2 per COLA)
~575K Barcodes
Extracted from label images for product matching
~2,500/week
New records, collected from the TTB daily
Collection
COLA Cloud collects new and updated records from the TTB on a daily basis, including label images and all associated structured fields: brand name, product type, origin, alcohol content, approval dates, applicant information, and more. Raw source materials are retained in durable storage for auditability and reprocessing.
Enrichments
Each record is enriched through a proprietary pipeline that produces the following additional fields:
Text Extraction
Label images are processed with optical character recognition to extract all visible text from the label artwork. This captures information printed on the label that isn’t part of the TTB’s structured data, such as tasting notes, marketing copy, ABV declarations, and volume statements.
Barcode Identification
Label images are scanned for standard barcode formats. When found, the decoded value, barcode type, and position on the label are recorded.
Barcode coverage is approximately 30% of records. This reflects the fact that barcodes appear incidentally on label submissions, not that 70% of products lack barcodes.
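As a rough illustration of what a recorded barcode might look like downstream, the sketch below models the three recorded attributes (decoded value, barcode type, position) and adds a check-digit test that a consumer could run before using a value for product matching. The `BarcodeHit` class, its field names, and the `(x, y, width, height)` position layout are assumptions made for this example, not COLA Cloud's actual schema. The check-digit rule is the standard GS1 mod-10 scheme used by UPC-A and EAN-13.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BarcodeHit:
    """Hypothetical shape for one detected barcode; field names are assumptions."""
    value: str                           # decoded digits, e.g. "036000291452"
    barcode_type: str                    # e.g. "UPC-A" or "EAN-13"
    position: Tuple[int, int, int, int]  # assumed (x, y, width, height) in pixels


def gtin_check_digit_ok(value: str) -> bool:
    """Validate the GS1 mod-10 check digit of an 8, 12, 13 or 14 digit code."""
    if not (value.isascii() and value.isdigit()) or len(value) not in (8, 12, 13, 14):
        return False
    body, check = value[:-1], int(value[-1])
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check
```

A value that fails this test is more likely a misread than a real product code, so a matching job could reasonably skip it rather than join on it.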
AI-Powered Feature Extraction
Each label is processed through proprietary models to extract structured fields that would otherwise require manual review:
| Category | Extracted Fields |
|---|---|
| Classification | Hierarchical product category (e.g., Spirits > Whiskey > Bourbon), container type |
| Description | Free-text product description, tasting notes, flavor profile |
| Wine | Appellation, vintage, varietal, designation (Reserve, Estate, etc.) |
| Beer | IBU, hop varieties |
| Spirits | Age statement, finishing process, grain bill |
| Other | Brand established year, artwork credits, certifications (organic, kosher, etc.) |
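The hierarchical product category is the field most likely to need a little handling on the consumer side. The sketch below assumes the category arrives as a single string in the `Spirits > Whiskey > Bourbon` form shown in the table; the actual delivery format may differ, and both helper functions are hypothetical names introduced for this example.

```python
from typing import List


def split_category(path: str, sep: str = ">") -> List[str]:
    """Split a hierarchical category string into its levels, broadest first."""
    return [level.strip() for level in path.split(sep) if level.strip()]


def is_within(path: str, ancestor: str) -> bool:
    """True when `path` equals `ancestor` or sits anywhere beneath it."""
    levels, prefix = split_category(path), split_category(ancestor)
    return levels[: len(prefix)] == prefix


levels = split_category("Spirits > Whiskey > Bourbon")
print(levels)  # ['Spirits', 'Whiskey', 'Bourbon']
print(is_within("Spirits > Whiskey > Bourbon", "Spirits > Whiskey"))  # True
```

Comparing level lists rather than raw string prefixes avoids false matches between categories whose names merely start with the same characters.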
Address Normalization
Applicant addresses from TTB records are parsed and normalized into structured components for consistent matching and geocoding.
Data Freshness
| Commitment | Detail |
|---|---|
| New records | Available within days of TTB publication |
| Enrichments | Applied to new records within one week |
| Data license delivery | Weekly incremental updates |
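For consumers who maintain a local copy, weekly incremental updates are typically applied as an upsert. The following is a minimal sketch under two assumptions that the source does not confirm: that each incremental batch carries complete rows for new and changed records, and that rows have a stable identifier column, here given the hypothetical name `ttb_id`.

```python
from typing import Dict, Iterable, Mapping

Record = Dict[str, object]


def apply_incremental(
    local: Dict[str, Record],
    update: Iterable[Mapping[str, object]],
    key: str = "ttb_id",  # hypothetical primary-key column name
) -> Dict[str, int]:
    """Upsert one incremental batch into `local`, keyed by record ID."""
    counts = {"inserted": 0, "changed": 0, "unchanged": 0}
    for row in update:
        record_id = str(row[key])
        new = dict(row)
        old = local.get(record_id)
        if old is None:
            counts["inserted"] += 1
        elif old != new:
            counts["changed"] += 1  # an existing record whose fields differ
        else:
            counts["unchanged"] += 1
        local[record_id] = new  # the latest batch always wins
    return counts
```

Tracking the `changed` count separately is useful because enriched field values can differ from one update to the next, so existing rows should be overwritten, not just appended to.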
Known Limitations
We believe in transparency about what automated enrichment can and cannot do:
- Source data quality — The TTB registry occasionally contains errors, omissions, or delays. COLA Cloud reflects what the TTB publishes.
- Text extraction accuracy — Extraction from label images is automated and imperfect. Stylized fonts, low-resolution images, and overlapping text can reduce accuracy. Field-level accuracy is not guaranteed.
- AI classification — Product categorization and feature extraction are probabilistic. Sub-category precision varies, and products may be misclassified when label information is ambiguous or incomplete.
- Barcode coverage — ~30% of records have extractable barcodes. This is a function of what appears on the label submission, not a gap in our processing.
- Methodology evolution — COLA Cloud continuously improves its enrichment methods. This may cause field values to change between updates as accuracy improves.
Data Delivery
Licensed data is available through multiple channels:
- Snowflake Data Share — Direct access in your Snowflake account, updated weekly
- S3 Export — Parquet or CSV files delivered to your S3 bucket
- REST API — Programmatic access with structured queries and filtering
- MCP Server — AI-native access for LLM-powered applications

