Minimal Example

This example demonstrates the minimum required fields for a valid LLM Data Card. It’s ideal for simple datasets that don’t need extensive metadata.

When to Use This Pattern

Use this minimal structure when:

  • Your dataset is straightforward and publicly available
  • You want to quickly publish a data card
  • Your dataset has no personal data or complex licensing

Complete Example

{
  "schema_version": "llm-datacard/v1.0",
  "core": {
    "id": "example-minimal",
    "version": "1.0.0",
    "title": "Example Minimal Dataset",
    "summary": "A minimal example dataset demonstrating the required fields of the LLM Data Card schema.",
    "maintainer": "DataPass Team",
    "contact": "data@meetkai.ai"
  },
  "data": {
    "kind": "real",
    "modalities": ["text"],
    "languages": ["en"],
    "size": {
      "examples": 1000
    },
    "domains": ["general"],
    "record_format": "json-structured"
  },
  "rights": {
    "license": "CC0-1.0",
    "allows_commercial_use": true,
    "contains_personal_data": "none"
  },
  "provenance": {
    "source_types": ["official-open-data"]
  },
  "access": {
    "availability": "public-download",
    "url": "https://datapass.meetkai.ai/registry/example-minimal/1.0.0"
  }
}
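
Before publishing, you can check that a card contains every field shown in the example above by walking each dotted field path. The sketch below is a hypothetical helper written for illustration, not part of the DataPass tooling. It assumes that the required fields are exactly the ones in this minimal example, and it checks only that they are present, not that their values are valid.

```python
import json

# Required field paths, copied from the minimal example above.
# Assumption: this mirrors the llm-datacard/v1.0 required fields; the
# schema itself is the authoritative source.
REQUIRED_PATHS = [
    "schema_version",
    "core.id", "core.version", "core.title",
    "core.summary", "core.maintainer", "core.contact",
    "data.kind", "data.modalities", "data.languages",
    "data.size.examples", "data.domains", "data.record_format",
    "rights.license", "rights.allows_commercial_use",
    "rights.contains_personal_data",
    "provenance.source_types",
    "access.availability", "access.url",
]


def missing_fields(card: dict) -> list[str]:
    """Return the required dotted paths that are absent from `card`."""
    missing = []
    for path in REQUIRED_PATHS:
        node = card
        for key in path.split("."):
            # Stop at the first level that is not an object or lacks the key.
            if not isinstance(node, dict) or key not in node:
                missing.append(path)
                break
            node = node[key]
    return missing


def missing_fields_in_json(text: str) -> list[str]:
    """Parse a data card from JSON text and report missing required fields."""
    return missing_fields(json.loads(text))
```

An empty list means every listed field is present. Fields that exist but hold empty values are not flagged; the per-section sketches below cover value checks.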

Section Breakdown

Core Section

Every data card must include these core identifiers:

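Field | Purpose | Example Value
id | Unique dataset identifier | example-minimal
version | Semantic version (MAJOR.MINOR.PATCH) | 1.0.0
title | Human-readable name | Example Minimal Dataset
summary | Brief description, 1-3 sentences | A minimal example dataset demonstrating the required fields of the LLM Data Card schema.
maintainer | Responsible organization or person | DataPass Team
contact | Email address for inquiries | data@meetkai.ai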
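
You can check the constraints in this table mechanically. The following hypothetical helper is an illustration, not part of any DataPass tooling. It applies a simplified Semantic Versioning pattern, a deliberately loose email pattern, and a rough sentence count for the 1-3 sentence summary guideline. The schema itself may define stricter or different rules.

```python
import re

# Simplified Semantic Versioning: MAJOR.MINOR.PATCH without leading zeros,
# plus optional pre-release and build suffixes. Looser than semver.org's grammar.
SEMVER = re.compile(
    r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?"
)
# Deliberately loose email check: local@domain.tld with no spaces.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def core_problems(core: dict) -> list[str]:
    """Return human-readable problems found in a card's `core` section."""
    problems = []
    for field in ("id", "title", "maintainer"):
        if not str(core.get(field, "")).strip():
            problems.append(f"core.{field} is empty")
    if not SEMVER.fullmatch(str(core.get("version", ""))):
        problems.append("core.version is not MAJOR.MINOR.PATCH")
    if not EMAIL.fullmatch(str(core.get("contact", ""))):
        problems.append("core.contact does not look like an email address")
    # Rough heuristic for the 1-3 sentence guideline: count sentence endings.
    # Abbreviations such as "e.g. " will be over-counted.
    summary = str(core.get("summary", "")).strip()
    sentences = len(re.findall(r"[.!?](?:\s|$)", summary))
    if not 1 <= sentences <= 3:
        problems.append("core.summary should be 1-3 sentences")
    return problems
```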

Data Section

Describes what the dataset contains:

Field | Purpose | This Example
kind | Whether the data is real, synthetic, or hybrid | real
modalities | Types of content | ["text"]
languages | BCP 47 language tags | ["en"]
size.examples | Number of records | 1000
domains | Subject areas | ["general"]
record_format | Format of each record | json-structured
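
A check for this section might look like the sketch below. It is hypothetical. It assumes that the three `kind` values are spelled in lowercase (only "real" appears in the example), and its language-tag test is only a shape check. It does not consult the IANA subtag registry, so it accepts some tags that are not valid BCP 47.

```python
import re

# Assumption: lowercase spellings; only "real" is confirmed by the example.
ALLOWED_KINDS = {"real", "synthetic", "hybrid"}
# Loose BCP 47 shape: a 2-3 letter primary subtag plus optional subtags,
# e.g. "en", "pt-BR", "zh-Hant-TW". No registry lookup is performed.
BCP47_SHAPE = re.compile(r"[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*")


def data_problems(data: dict) -> list[str]:
    """Return human-readable problems found in a card's `data` section."""
    problems = []
    if data.get("kind") not in ALLOWED_KINDS:
        problems.append(f"data.kind must be one of {sorted(ALLOWED_KINDS)}")
    for field in ("modalities", "languages", "domains"):
        value = data.get(field)
        if not (isinstance(value, list) and value
                and all(isinstance(v, str) for v in value)):
            problems.append(f"data.{field} must be a non-empty list of strings")
    languages = data.get("languages")
    if isinstance(languages, list):
        for tag in languages:
            if isinstance(tag, str) and not BCP47_SHAPE.fullmatch(tag):
                problems.append(f"data.languages: {tag!r} is not a BCP 47-shaped tag")
    size = data.get("size")
    examples = size.get("examples") if isinstance(size, dict) else None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(examples, bool) or not isinstance(examples, int) or examples <= 0:
        problems.append("data.size.examples must be a positive integer")
    if not str(data.get("record_format", "")).strip():
        problems.append("data.record_format is empty")
    return problems
```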

Rights Section

Specifies licensing and data privacy:

Field | Purpose | This Example
license | SPDX license identifier | CC0-1.0
allows_commercial_use | Whether commercial use is permitted | true
contains_personal_data | Extent of personal data in the dataset | none
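
A useful check here is consistency between the license and the commercial-use flag. The sketch below is hypothetical. Its set of non-commercial SPDX identifiers is deliberately incomplete and is only for illustration. It also flags any `contains_personal_data` value other than "none", because this minimal pattern is intended for datasets without personal data. No other values of that field are assumed.

```python
# Hypothetical, deliberately incomplete set of SPDX identifiers whose terms
# forbid commercial use. Extend it for the licenses you actually encounter.
NON_COMMERCIAL_SPDX = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0", "CC-BY-NC-ND-4.0"}


def rights_problems(rights: dict) -> list[str]:
    """Return human-readable problems found in a card's `rights` section."""
    problems = []
    license_id = rights.get("license")
    commercial = rights.get("allows_commercial_use")
    if not isinstance(license_id, str) or not license_id.strip():
        problems.append("rights.license must be an SPDX identifier string")
    if not isinstance(commercial, bool):
        problems.append("rights.allows_commercial_use must be true or false")
    elif commercial and isinstance(license_id, str) and license_id in NON_COMMERCIAL_SPDX:
        problems.append(f"{license_id} does not permit commercial use")
    # The minimal pattern targets datasets with no personal data.
    if rights.get("contains_personal_data") != "none":
        problems.append("minimal pattern assumes contains_personal_data is 'none'")
    return problems
```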

Provenance Section

Documents data origins:

Field | Purpose | This Example
source_types | How the data was collected | ["official-open-data"]

Access Section

Tells users how to get the data:

Field | Purpose | This Example
availability | Access method | public-download
url | Download location | https://datapass.meetkai.ai/registry/example-minimal/1.0.0
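
For a `public-download` card, a basic check is that the URL is absolute. The sketch below uses `urllib.parse.urlparse` from the Python standard library. Requiring the `https` scheme is an assumption made for illustration, not a documented schema rule; relax it if your registry allows other schemes.

```python
from urllib.parse import urlparse


def access_problems(access: dict) -> list[str]:
    """Return human-readable problems found in a card's `access` section."""
    problems = []
    if not access.get("availability"):
        problems.append("access.availability is missing")
    url = access.get("url", "")
    parsed = urlparse(url) if isinstance(url, str) else None
    # Assumption: a public download should be an absolute HTTPS URL
    # (scheme "https" plus a host), not a bare host or relative path.
    if parsed is None or parsed.scheme != "https" or not parsed.netloc:
        problems.append("access.url must be an absolute https:// URL")
    return problems
```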

Optional Enhancements

Consider adding these fields as your dataset matures:

  • core.doi - Digital Object Identifier for citation
  • core.preferred_citation - How to cite the dataset
  • data.size.tokens - Token count for text datasets
  • provenance.collection_start_date / collection_end_date - Collection timeframe
  • use.intended_uses - Recommended applications
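
The sketch below shows one way to add these optional fields to an existing card and sanity-check them. Every value it writes, including the DOI, citation text, token count, dates, and intended use, is a placeholder. It also assumes that collection dates use ISO 8601 calendar format (YYYY-MM-DD). The DOI pattern is loosely modeled on the common `10.<registrant>/<suffix>` form.

```python
import copy
import re
from datetime import date

# Loose DOI shape: "10." + 4-9 digit registrant code + "/" + suffix.
DOI_SHAPE = re.compile(r"10\.\d{4,9}/\S+")


def with_optional_fields(card: dict) -> dict:
    """Return a copy of `card` with some optional fields filled in.

    All values are placeholders for illustration only.
    """
    enriched = copy.deepcopy(card)  # leave the caller's card untouched
    enriched["core"]["doi"] = "10.1234/example-minimal"  # placeholder DOI
    enriched["core"]["preferred_citation"] = (
        "DataPass Team. Example Minimal Dataset (v1.0.0)."  # placeholder
    )
    enriched["data"]["size"]["tokens"] = 250_000  # placeholder count
    # Assumption: ISO 8601 calendar dates (YYYY-MM-DD).
    enriched["provenance"]["collection_start_date"] = "2024-01-01"
    enriched["provenance"]["collection_end_date"] = "2024-03-31"
    enriched.setdefault("use", {})["intended_uses"] = ["benchmarking"]
    return enriched


def optional_problems(card: dict) -> list[str]:
    """Check the optional fields above when they are present."""
    problems = []
    doi = card.get("core", {}).get("doi")
    if doi is not None and not (isinstance(doi, str) and DOI_SHAPE.fullmatch(doi)):
        problems.append("core.doi does not look like 10.<registrant>/<suffix>")
    prov = card.get("provenance", {})
    start = prov.get("collection_start_date")
    end = prov.get("collection_end_date")
    # date.fromisoformat raises ValueError for malformed dates.
    if start and end and date.fromisoformat(start) > date.fromisoformat(end):
        problems.append("collection_start_date is after collection_end_date")
    return problems
```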

Try It