Documentation
Quick Links
Quickstart
Create your first data card in minutes. Learn the required fields and basic structure.
Field Reference
Detailed documentation for every field in the schema, organized by section.
Validation Rules
Understand the conditional validation rules and how to satisfy them.
Examples
Real-world examples of data cards for different types of datasets.
LLM Training Data Format
Use the training data format when you need to ship the actual records used for fine-tuning.
Training Data Overview
Learn how metadata.json and JSONL records work together.
Training Data Quickstart
Build a minimal SFT dataset in minutes.
Telco Quickstart
Telecom-ready guidance on what data to share and how.
Sample Packages
Realistic metadata.json and JSONL files you can copy.
Schema Overview
The LLM Data Card v1.0 schema has 5 required sections and 11 optional sections:
Required Sections
- core - Dataset identity and maintainer info
- data - Data contents, modalities, languages, size
- rights - Licensing and personal data status
- provenance - Where the data came from
- access - How to obtain the dataset
Optional Sections
- artifacts - File pointers with checksums for reproducibility
- processing - Normalization, filtering, deduplication steps
- quality - Quality measurements and known issues
- synthetic - Details about synthetic data generation
- use - Intended and out-of-scope uses
- governance - Review status and documentation
- safety - Content risk assessment
- community - Local/community involvement
- sources - Per-source breakdown
- stats - Numeric statistics
- extensions - Custom fields
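As a rough illustration of the section split above, the following Python sketch checks only the top-level keys of a card loaded as a dictionary. The section names come from the lists above; the helper name check_sections, the sample card, and its "notes" key are hypothetical, and the empty section bodies are placeholders rather than real fields. This is not the schema's own validation, which also applies conditional rules covered under Validation Rules.

```python
# Section names as listed in the Schema Overview above.
REQUIRED_SECTIONS = ("core", "data", "rights", "provenance", "access")
OPTIONAL_SECTIONS = (
    "artifacts", "processing", "quality", "synthetic", "use", "governance",
    "safety", "community", "sources", "stats", "extensions",
)


def check_sections(card: dict) -> dict:
    """Hypothetical helper: report missing required and unrecognized top-level sections."""
    present = set(card)
    known = set(REQUIRED_SECTIONS) | set(OPTIONAL_SECTIONS)
    return {
        "missing_required": [s for s in REQUIRED_SECTIONS if s not in present],
        "unrecognized": sorted(present - known),
    }


# Placeholder card: section bodies are left empty because the per-section
# fields are documented in the Field Reference, not here.
card = {
    "core": {},
    "data": {},
    "rights": {},
    "provenance": {},
    "quality": {},
    "notes": {},  # not a section name in the schema; flagged below
}

report = check_sections(card)
print(report)
```

A check like this only catches a missing or misspelled section name early; field-level requirements still need the full schema validation.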