Quickstart
This guide will help you create a valid LLM Data Card v1.0 in just a few minutes.
Minimal Valid Card
Here’s the smallest valid data card you can create:
    {
      "schema_version": "llm-datacard/v1.0",
      "core": {
        "id": "my-dataset",
        "version": "1.0.0",
        "title": "My Dataset",
        "summary": "A brief description of what this dataset contains.",
        "maintainer": "Your Organization",
        "contact": "data@meetkai.ai"
      },
      "data": {
        "kind": "real",
        "modalities": ["text"],
        "languages": ["en"],
        "size": {
          "examples": 10000
        },
        "domains": ["news"],
        "record_format": "plain-text"
      },
      "rights": {
        "license": "CC-BY-4.0",
        "allows_commercial_use": true,
        "contains_personal_data": "none"
      },
      "provenance": {
        "source_types": ["web-scrape"]
      },
      "access": {
        "availability": "public-download",
        "url": "https://datapass.meetkai.ai/registry/my-dataset/1.0.0"
      }
    }
Step-by-Step Guide
- Set the schema version
  Always start with "schema_version": "llm-datacard/v1.0".
- Fill in core identity
  The core section identifies your dataset:
  - id: A machine-friendly slug (letters, numbers, dots, hyphens, underscores)
  - version: Your version label (e.g., “1.0.0” or “2025-01-15”)
  - title: Human-readable name
  - summary: 1-3 sentence description
  - maintainer: Who maintains this dataset
  - contact: Email or URL for questions
- Describe the data
  The data section describes what’s in your dataset:
  - kind: “real”, “synthetic”, or “hybrid”
  - modalities: Array of [“text”, “speech”, “audio”, “image”, “video”, “code”, “multimodal”]
  - languages: Array of BCP-47 tags (e.g., “en”, “ar”, “ha-Latn-NG”)
  - size.examples: Number of examples/records
  - domains: Content domains (e.g., “news”, “social-media”, “health”)
  - record_format: Structure of each example
- Specify rights
  The rights section covers licensing:
  - license: SPDX identifier preferred (e.g., “MIT”, “CC-BY-4.0”)
  - allows_commercial_use: Boolean
  - contains_personal_data: “none”, “de_minimis”, “pseudonymous”, or “direct”
- Document provenance
  The provenance section explains where data came from:
  - source_types: Array of source types
- Define access
  The access section explains how to get the dataset:
  - availability: “public-download”, “restricted”, “on-request”, or “not-available”
  - Include url or request_instructions as appropriate
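The six steps above can be sketched as a small JavaScript helper that assembles a card section by section. The names `isValidId`, `areWellFormedLanguageTags`, and `buildCard` are hypothetical, and the id regular expression is an assumption derived from the field description in this guide; the schema's own pattern may be stricter, so treat this as a pre-check rather than a replacement for schema validation.

```javascript
// Illustrative check: mirrors this guide's description of `id`
// (letters, numbers, dots, hyphens, underscores). The schema's own
// pattern is not shown here and may be stricter.
function isValidId(id) {
  return typeof id === "string" && /^[A-Za-z0-9._-]+$/.test(id);
}

// Illustrative check: Intl.getCanonicalLocales throws a RangeError for
// tags that are not structurally valid BCP-47, so this only tests
// well-formedness, not whether the schema accepts the tag.
function areWellFormedLanguageTags(tags) {
  try {
    Intl.getCanonicalLocales(tags);
    return true;
  } catch {
    return false;
  }
}

// Assembles a card in the same order as the six steps.
function buildCard({ core, data, rights, provenance, access }) {
  if (!isValidId(core.id)) {
    throw new Error(`core.id is not a valid slug: ${core.id}`);
  }
  if (!areWellFormedLanguageTags(data.languages)) {
    throw new Error("data.languages contains a malformed BCP-47 tag");
  }
  return {
    schema_version: "llm-datacard/v1.0", // step 1
    core, // step 2
    data, // step 3
    rights, // step 4
    provenance, // step 5
    access, // step 6
  };
}

// Usage with the values from the minimal card in this guide.
const card = buildCard({
  core: {
    id: "my-dataset",
    version: "1.0.0",
    title: "My Dataset",
    summary: "A brief description of what this dataset contains.",
    maintainer: "Your Organization",
    contact: "data@meetkai.ai",
  },
  data: {
    kind: "real",
    modalities: ["text"],
    languages: ["en"],
    size: { examples: 10000 },
    domains: ["news"],
    record_format: "plain-text",
  },
  rights: {
    license: "CC-BY-4.0",
    allows_commercial_use: true,
    contains_personal_data: "none",
  },
  provenance: { source_types: ["web-scrape"] },
  access: {
    availability: "public-download",
    url: "https://datapass.meetkai.ai/registry/my-dataset/1.0.0",
  },
});

console.log(JSON.stringify(card, null, 2));
```

Building the object in code rather than by hand keeps the section order stable and catches a malformed id or language tag before the card reaches the validator.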
Common Conditional Requirements
Some fields become required depending on the values of other fields. These are the most common conditional rules:
| When… | You must also provide… |
|---|---|
| data.kind is “synthetic” or “hybrid” | synthetic.generation_method and synthetic.share_of_dataset |
| rights.contains_personal_data is not “none” | rights.consent_mechanism |
| access.availability is “restricted” or “on-request” | access.request_instructions or access.url |
| access.availability is “not-available” | access.not_available_reason |
| data.has_human_annotations is true | data.label_types |
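The rules in the table can be mirrored in a few lines of JavaScript for a quick pre-flight check. This is a minimal sketch, not the schema's own logic: the function name `checkConditionalRequirements` is hypothetical, it assumes `synthetic` is a top-level section (as the dotted paths in the table suggest), and it only checks that the listed fields are present, not that their values are valid.

```javascript
// Returns the field paths that the conditional rules in the table require
// but the card does not provide. An empty array means all five rules pass.
function checkConditionalRequirements(card) {
  const missing = [];
  const data = card.data ?? {};
  const rights = card.rights ?? {};
  const access = card.access ?? {};
  const synthetic = card.synthetic ?? {}; // assumed top-level section
  const has = (obj, key) => obj[key] !== undefined && obj[key] !== null;

  // Rule 1: synthetic or hybrid data needs generation details.
  if (data.kind === "synthetic" || data.kind === "hybrid") {
    if (!has(synthetic, "generation_method")) missing.push("synthetic.generation_method");
    if (!has(synthetic, "share_of_dataset")) missing.push("synthetic.share_of_dataset");
  }

  // Rule 2: any personal data other than "none" needs a consent mechanism.
  if (has(rights, "contains_personal_data") && rights.contains_personal_data !== "none") {
    if (!has(rights, "consent_mechanism")) missing.push("rights.consent_mechanism");
  }

  // Rule 3: restricted or on-request access needs instructions or a URL.
  if (access.availability === "restricted" || access.availability === "on-request") {
    if (!has(access, "request_instructions") && !has(access, "url")) {
      missing.push("access.request_instructions or access.url");
    }
  }

  // Rule 4: unavailable datasets need a reason.
  if (access.availability === "not-available" && !has(access, "not_available_reason")) {
    missing.push("access.not_available_reason");
  }

  // Rule 5: human annotations need label types.
  if (data.has_human_annotations === true && !has(data, "label_types")) {
    missing.push("data.label_types");
  }

  return missing;
}

// Usage: a partial card that trips rules 1 and 3.
// The label_types value is a placeholder, not a value taken from the schema.
const missingFields = checkConditionalRequirements({
  data: { kind: "hybrid", has_human_annotations: true, label_types: ["placeholder"] },
  rights: { contains_personal_data: "none" },
  access: { availability: "on-request" },
});
console.log(missingFields);
```

For this input the function reports the two synthetic fields and the access rule. The schema validator remains the source of truth; a check like this is mainly useful for producing friendlier messages while a card is being drafted.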
Validate Your Card
Use our Validator tool to check your card against the schema, or run validation locally:
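    npm install ajv ajv-formats

Then compile the schema and validate your card. Run this as an ES module so that top-level await works:

    // ajv/dist/2020 is Ajv's build for JSON Schema draft 2020-12.
    import Ajv from "ajv/dist/2020";
    import addFormats from "ajv-formats";

    // allErrors: report every violation instead of stopping at the first.
    const ajv = new Ajv({ allErrors: true });
    addFormats(ajv);

    // Download the published schema and compile it into a validator function.
    const schema = await fetch("https://datapass.meetkai.ai/schemas/llm-datacard/v1.0/schema.json").then((r) => r.json());
    const validate = ajv.compile(schema);

    // yourDataCard is your card as a parsed JavaScript object.
    const valid = validate(yourDataCard);
    if (!valid) {
      console.log(validate.errors);
    }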
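The raw `validate.errors` array can be hard to scan. As a rough sketch, the helper below turns each error into one readable line using the `instancePath` and `message` properties that Ajv v8 sets on its error objects. The function name `formatErrors` is hypothetical, and the sample errors are hand-written to illustrate the shape, not output captured from the real schema.

```javascript
// Formats Ajv error objects as "<location>: <message>" lines.
// An empty instancePath means the error applies to the card's root object.
function formatErrors(errors) {
  return (errors ?? []).map((err) => {
    const where = err.instancePath || "(root)";
    return `${where}: ${err.message}`;
  });
}

// Hand-written sample in the shape of Ajv v8 errors (illustrative only).
const sampleErrors = [
  { instancePath: "/core", message: "must have required property 'contact'" },
  { instancePath: "", message: "must have required property 'rights'" },
];

for (const line of formatErrors(sampleErrors)) {
  console.error(line);
}
```

In the validation script above, the same helper would be called as `formatErrors(validate.errors)` inside the `if (!valid)` branch.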
Next Steps
- Read the Field Reference for detailed documentation
- See Examples for real-world data cards
- Learn about Validation Rules for conditional requirements