Extensions Section
The extensions section holds vendor-specific or project-specific metadata that doesn’t fit in the standard schema. Because the section accepts arbitrary properties, custom tooling can attach its own fields without breaking schema validation.
Schema
{
"extensions": {
"type": "object",
"description": "Optional vendor- or project-specific extensions.",
"additionalProperties": true
}
}
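Because additionalProperties is true, the only structural constraint is that extensions, when present, is a JSON object. The following is a minimal sketch of that check using only the Python standard library. The function name check_extensions is hypothetical and not part of any datacard tooling.

```python
import json


def check_extensions(card: dict) -> list[str]:
    """Return a list of problems found in the card's extensions section."""
    if "extensions" not in card:
        return []  # the section is optional
    if not isinstance(card["extensions"], dict):
        return ["extensions must be a JSON object"]
    return []  # any keys and values are allowed inside the object


card = json.loads('{"extensions": {"mlflow": {"run_id": "run-20250115-001"}}}')
problems = check_extensions(card)
```

A full validator would apply the complete schema. This sketch covers only the rule stated for this section.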
Example
{
"extensions": {
"huggingface": {
"dataset_id": "example-org/hausa-news-corpus",
"config_name": "default",
"features": {
"text": "string",
"source": "string",
"date": "string"
},
"splits": ["train", "validation", "test"]
},
"internal": {
"project_code": "LANG-2024-042",
"cost_center": "research-nlp",
"review_ticket": "REVIEW-1234"
},
"mlflow": {
"experiment_id": "exp-hausa-pretrain",
"run_id": "run-20250115-001"
}
}
}
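Since every namespace under extensions is optional, consumers may want to read them defensively. The sketch below shows one way to do that in Python. The helper get_extension is an illustrative name, not an existing API, and the card is a trimmed copy of the example above.

```python
def get_extension(card: dict, namespace: str) -> dict:
    """Return one extension namespace, or an empty dict if it is absent."""
    extensions = card.get("extensions")
    if not isinstance(extensions, dict):
        return {}
    value = extensions.get(namespace)
    return value if isinstance(value, dict) else {}


card = {
    "extensions": {
        "huggingface": {
            "dataset_id": "example-org/hausa-news-corpus",
            "splits": ["train", "validation", "test"],
        }
    }
}

hf = get_extension(card, "huggingface")
splits = hf.get("splits", [])
kaggle = get_extension(card, "kaggle")  # not present in this card
```

Returning an empty dict for a missing namespace lets callers chain .get() calls without checking for None at every level.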
Common Extension Patterns
Platform-Specific Metadata
{
"extensions": {
"huggingface": {
"dataset_id": "org/dataset-name",
"viewer_enabled": true,
"size_category": "10K<n<100K"
},
"kaggle": {
"competition_id": "hausa-nlp-challenge",
"kernel_count": 45
},
"paperswithcode": {
"dataset_url": "https://paperswithcode.com/dataset/hausa-news"
}
}
}
Internal Tracking
{
"extensions": {
"internal": {
"project_id": "PRJ-2025-001",
"budget_code": "R&D-NLP",
"approved_by": "data-governance-team",
"approval_date": "2025-01-10",
"retention_policy": "7-years",
"data_classification": "internal-use"
}
}
}
Experiment Tracking
{
"extensions": {
"wandb": {
"entity": "research-team",
"project": "multilingual-lm",
"artifact_name": "hausa-corpus:v2"
},
"mlflow": {
"tracking_uri": "https://mlflow.example.org",
"experiment_name": "hausa-pretrain",
"registered_model": "hausa-lm-base"
}
}
}
Custom Quality Metrics
{
"extensions": {
"quality_extended": {
"custom_scorer_version": "1.2.0",
"coherence_score": 0.85,
"factuality_sample_check": {
"sample_size": 100,
"accuracy": 0.92
},
"bias_audit": {
"performed": true,
"report_url": "https://example.org/bias-report.pdf"
}
}
}
}
Dataset Lineage
{
"extensions": {
"lineage": {
"parent_datasets": ["common-crawl:CC-MAIN-2024-05", "wikipedia:20240101"],
"processing_pipeline": "pipeline-v3.2",
"pipeline_commit": "abc123def",
"derived_datasets": ["hausa-news-cleaned:1.1", "hausa-news-translated:1.0"]
}
}
}
Regional Compliance
{
"extensions": {
"compliance": {
"gdpr": {
"applicable": false,
"reason": "No EU personal data"
},
"ccpa": {
"applicable": false,
"reason": "No California resident data"
},
"ndpa": {
"applicable": true,
"compliance_status": "compliant",
"review_date": "2025-01-15"
}
}
}
}
Best Practices
Use Namespaces
Group related extensions under descriptive keys:
{
"extensions": {
"huggingface": { "..." },
"internal": { "..." },
"quality": { "..." }
}
}
Avoid flat structures that encode the namespace in a key prefix:
{
"extensions": {
"hf_dataset_id": "...",
"internal_project_id": "...",
"quality_score": "..."
}
}
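An existing flat extensions object can be migrated to the namespaced form mechanically. The sketch below is one possible approach in Python. The function nest_flat_keys, the prefix-to-namespace mapping, and the sample values are all assumptions made for illustration.

```python
def nest_flat_keys(extensions: dict, prefixes: dict[str, str]) -> dict:
    """Move prefixed top-level keys into one object per namespace.

    `prefixes` maps a key prefix (for example "hf_") to a namespace name.
    Keys that match no prefix are copied through unchanged.
    """
    nested: dict = {}
    for key, value in extensions.items():
        for prefix, namespace in prefixes.items():
            if key.startswith(prefix):
                # Strip the prefix and file the value under its namespace.
                nested.setdefault(namespace, {})[key[len(prefix):]] = value
                break
        else:
            nested[key] = value
    return nested


# Illustrative values; the flat example above elides them.
flat = {
    "hf_dataset_id": "org/dataset-name",
    "internal_project_id": "PRJ-2025-001",
    "quality_score": 0.85,
}
# Hypothetical mapping chosen for this example.
prefixes = {"hf_": "huggingface", "internal_": "internal", "quality_": "quality"}
nested = nest_flat_keys(flat, prefixes)
```

After the call, nested holds three namespace objects (huggingface, internal, quality), each containing the key with its prefix removed.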
Document Your Extensions
If you use custom extensions, document them by linking to their schema and documentation:
{
"extensions": {
"_schema": "https://example.org/datacard-extensions/v1.json",
"_docs": "https://example.org/datacard-extensions/docs",
"custom_field": "value"
}
}
Version Your Extensions
Include a version field so that consumers can tell which revision of your custom schema a card uses:
{
"extensions": {
"acme_corp": {
"_version": "2.0",
"department": "research",
"cost_tracking_id": "CT-2025-001"
}
}
}
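A consumer can use the version field to decide whether it understands an extension. The sketch below assumes that _version is a dotted string whose first component is a major version, which the example suggests but this section does not specify. The names major_version and SUPPORTED_MAJOR are hypothetical.

```python
from typing import Optional


def major_version(extension: dict) -> Optional[int]:
    """Return the major part of an extension's `_version`, or None if unusable."""
    version = extension.get("_version")
    if not isinstance(version, str):
        return None
    head = version.split(".", 1)[0]
    try:
        return int(head)
    except ValueError:
        return None  # not a numeric major version


SUPPORTED_MAJOR = 2  # hypothetical: the major version this reader understands

acme = {"_version": "2.0", "department": "research"}
compatible = major_version(acme) == SUPPORTED_MAJOR
```

Treating a missing or malformed version as incompatible is the conservative choice. A reader could instead fall back to a default version if older cards omit the field.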
Interoperability
Don’t Duplicate Core Fields
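Put standard metadata in standard fields rather than repeating it under extensions. In the example below, internal_id only repeats core.id and should be omitted: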
{
"core": {
"id": "my-dataset"
},
"extensions": {
"internal_id": "my-dataset"
}
}
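Duplication like the example above can be flagged automatically. The following Python sketch uses a simple heuristic: it reports any string value under extensions that is identical to a string value in core. The function name is hypothetical, and a value match is only a hint, since two unrelated fields can legitimately share a value.

```python
def find_duplicated_core_values(card: dict) -> list[str]:
    """List extension paths whose string value repeats a value from `core`."""
    core_values = {v for v in card.get("core", {}).values() if isinstance(v, str)}
    duplicates: list[str] = []

    def walk(node: dict, path: str) -> None:
        # Recurse through nested namespaces, recording matching leaf values.
        for key, value in node.items():
            here = f"{path}.{key}"
            if isinstance(value, dict):
                walk(value, here)
            elif isinstance(value, str) and value in core_values:
                duplicates.append(here)

    walk(card.get("extensions", {}), "extensions")
    return duplicates


card = {"core": {"id": "my-dataset"}, "extensions": {"internal_id": "my-dataset"}}
duplicates = find_duplicated_core_values(card)
```

For this card the function reports the path extensions.internal_id, which is the field that should be removed.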
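Use Extensions for Platform-Specific Features
Keep generic access details in the standard fields, and put features that exist on only one platform under that platform’s namespace: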
{
"access": {
"availability": "public-download",
"url": "https://huggingface.co/datasets/org/dataset"
},
"extensions": {
"huggingface": {
"gated": false,
"viewer": true,
"library": "datasets"
}
}
}
See Also
- Stats Section - Numeric statistics
- Core Section - Standard identity fields
- Governance Section - Documentation links