Documentation

Quickstart

Create your first data card in minutes. Learn the required fields and basic structure.

Field Reference

Detailed documentation for every field in the schema, organized by section.

Validation Rules

Understand the conditional validation rules and how to satisfy them.

Examples

Real-world examples of data cards for different types of datasets.

LLM Training Data Format

Use the training data format when you need to ship the actual records used for fine-tuning.

Training Data Overview

Learn how metadata.json and JSONL records work together.

Training Data Quickstart

Build a minimal SFT dataset in minutes.

Telco Quickstart

Telecom-ready guidance on what data to share and how.

Sample Packages

Realistic metadata.json and JSONL samples you can copy.
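
To make the pairing of metadata.json and JSONL records concrete, here is a minimal Python sketch that reads both from one package directory. The file name records.jsonl and the helper load_package are assumptions made for illustration, and no particular record fields are assumed; the Training Data Overview is the place to check the actual layout.

```python
import json
from pathlib import Path


def load_package(package_dir, records_name="records.jsonl"):
    """Read metadata.json plus one JSONL file from a package directory.

    The records file name is a hypothetical default, not something the
    format is known to mandate.
    """
    root = Path(package_dir)

    # metadata.json is a single JSON document describing the dataset.
    metadata = json.loads((root / "metadata.json").read_text(encoding="utf-8"))

    # JSONL holds one JSON object per line; blank lines are skipped.
    records = []
    with (root / records_name).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))

    return metadata, records
```

Calling load_package("my_package") returns the parsed metadata and a list of record objects, which can then be checked against whatever the metadata declares.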

Schema Overview

The LLM Data Card v1.0 schema has 5 required sections and 11 optional sections:

Required Sections

  • core - Dataset identity and maintainer info
  • data - Data contents, modalities, languages, size
  • rights - Licensing and personal data status
  • provenance - Where the data came from
  • access - How to obtain the dataset

Optional Sections

  • artifacts - File pointers with checksums for reproducibility
  • processing - Normalization, filtering, deduplication steps
  • quality - Quality measurements and known issues
  • synthetic - Details about synthetic data generation
  • use - Intended and out-of-scope uses
  • governance - Review status and documentation
  • safety - Content risk assessment
  • community - Local/community involvement
  • sources - Per-source breakdown
  • stats - Numeric statistics
  • extensions - Custom fields
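
As a rough illustration of the required and optional split above, the following Python sketch checks which top-level sections a card contains. It assumes a card is loaded as a dictionary keyed by section name, which is an assumption rather than something stated here; check_sections is a hypothetical helper, not part of any official tooling, and it does not apply the conditional validation rules.

```python
# Section names taken from the lists above.
REQUIRED_SECTIONS = {"core", "data", "rights", "provenance", "access"}
OPTIONAL_SECTIONS = {
    "artifacts", "processing", "quality", "synthetic", "use", "governance",
    "safety", "community", "sources", "stats", "extensions",
}


def check_sections(card: dict) -> dict:
    """Report missing required sections and unrecognized top-level keys.

    Assumes each section is a top-level key of the card (an assumption).
    """
    present = set(card)
    return {
        "missing_required": sorted(REQUIRED_SECTIONS - present),
        "unknown": sorted(present - REQUIRED_SECTIONS - OPTIONAL_SECTIONS),
    }


# A card with four of the five required sections and one optional section.
card = {"core": {}, "data": {}, "rights": {}, "provenance": {}, "quality": {}}
print(check_sections(card))
# prints {'missing_required': ['access'], 'unknown': []}
```

A check like this only catches structural omissions; field-level requirements are covered in the Field Reference and Validation Rules.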