Download

A curated sample of 1,000 recent COLA records (2025 approvals) with associated label images and extracted barcodes.

Download cola-sample-pack-v1.zip (~500 KB). No account or API key required.

Contents

The ZIP contains three CSV files:
File                    Rows    Description
cola.csv                1,000   Product label approval records: brand, product type, origin, OCR-extracted ABV/volume, LLM-enriched category, tasting notes, and more
cola_image.csv          ~1,750  Label images for the 1,000 COLAs: dimensions, container position (front/back/neck/strip), and OCR-extracted text
cola_image_barcode.csv  ~500    Barcodes extracted from label images: type (UPC-A, EAN-13, QR, etc.), decoded value, pixel position
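
As a minimal sketch of reading the pack, the snippet below loads every CSV in the ZIP using only Python's standard library. It assumes the files are UTF-8 text with a header row, which this page does not confirm. `load_csvs` is a hypothetical helper name, not part of any COLA Cloud tooling. Files are keyed by base name in case the archive nests them in a folder.

```python
import csv
import io
import zipfile


def load_csvs(zip_source):
    """Return {csv base name: list of row dicts} for every CSV in the archive.

    zip_source may be a file path or a binary file-like object.
    Assumption: UTF-8 text with a header row (not confirmed by the source).
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV
            with archive.open(name) as raw:
                # utf-8-sig also tolerates a leading byte-order mark
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                base = name.rsplit("/", 1)[-1]
                tables[base] = list(csv.DictReader(text))
    return tables


# Example (requires the downloaded file in the working directory):
# tables = load_csvs("cola-sample-pack-v1.zip")
# print({name: len(rows) for name, rows in tables.items()})
```

Every value comes back as a string, so numeric columns need an explicit conversion before use.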

Relationships

cola.TTB_ID → cola_image.TTB_ID       (one-to-many)
cola_image.TTB_IMAGE_ID → cola_image_barcode.TTB_IMAGE_ID  (one-to-many)
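
The two one-to-many links above can be followed with a pair of lookups. The sketch below is one possible approach rather than an official loader: it takes three lists of row dicts (for example, from `csv.DictReader`) and nests images under each COLA and barcodes under each image. The helper names `group_by` and `nest_records` and the output keys `images` and `barcodes` are illustrative assumptions. Only `TTB_ID` and `TTB_IMAGE_ID` come from the files.

```python
from collections import defaultdict


def group_by(rows, key):
    """Index a list of row dicts by the value of one column."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def nest_records(colas, images, barcodes):
    """Attach images to each COLA and barcodes to each image.

    Follows cola.TTB_ID -> cola_image.TTB_ID, then
    cola_image.TTB_IMAGE_ID -> cola_image_barcode.TTB_IMAGE_ID.
    """
    barcodes_by_image = group_by(barcodes, "TTB_IMAGE_ID")
    images_by_cola = group_by(images, "TTB_ID")
    nested = []
    for cola in colas:
        cola_images = [
            {**image, "barcodes": barcodes_by_image.get(image["TTB_IMAGE_ID"], [])}
            for image in images_by_cola.get(cola["TTB_ID"], [])
        ]
        nested.append({**cola, "images": cola_images})
    return nested
```

COLAs without images and images without barcodes get empty lists, so every record keeps the same shape.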

Key columns

cola.csv includes 60+ columns. Highlights:
  • TTB_ID — unique identifier for each COLA approval
  • BRAND_NAME, PRODUCT_NAME — the product
  • PRODUCT_TYPE — Wine, Malt Beverage, or Distilled Spirits
  • LLM_CATEGORY, LLM_CATEGORY_PATH — AI-classified product taxonomy
  • LLM_PRODUCT_DESCRIPTION — natural language product description from label reading
  • OCR_ABV — alcohol by volume, extracted via OCR
  • BARCODE_VALUE, BARCODE_TYPE — primary barcode from the label
  • MAIN_TTB_IMAGE_ID — viewable at https://dyuie4zgfxmt6.cloudfront.net/{TTB_IMAGE_ID}.webp
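
To view a label image, substitute an image ID into the URL pattern above. A minimal sketch follows, assuming the `MAIN_TTB_IMAGE_ID` value can be dropped into the `{TTB_IMAGE_ID}` placeholder unchanged. The helper name `image_url` is hypothetical, and the sketch does not check that the image actually exists.

```python
IMAGE_URL_TEMPLATE = "https://dyuie4zgfxmt6.cloudfront.net/{TTB_IMAGE_ID}.webp"


def image_url(ttb_image_id):
    """Build the label image URL for an image ID, or None if the ID is blank."""
    image_id = str(ttb_image_id or "").strip()
    if not image_id:
        return None  # row has no main image recorded
    return IMAGE_URL_TEMPLATE.format(TTB_IMAGE_ID=image_id)
```

For a row read from cola.csv, `image_url(row["MAIN_TTB_IMAGE_ID"])` would return the link to that record's primary label image under this assumption.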

Full dataset

The sample represents a small slice of the full COLA Cloud dataset:
  • 2.9M+ COLA records (back to 2005)
  • 5M+ label images
  • 575K+ extracted barcodes
  • Updated daily (~2,500 new approvals per week)
For full access, see the REST API, Snowflake data share, or contact us.