Restricted Access Dataset

This example demonstrates how to document a dataset with restricted access, including request instructions and terms of use.

When to Use This Pattern

Use this structure when:

  • Your dataset requires approval before access
  • You have specific terms or agreements users must accept
  • The data contains sensitive information requiring controlled distribution

Complete Example

{
  "schema_version": "llm-datacard/v1.0",
  "core": {
    "id": "medical-imaging-corpus",
    "version": "2.0.0",
    "title": "Medical Imaging Research Corpus",
    "summary": "A curated collection of anonymized medical imaging data for AI research, including CT scans and MRI images with diagnostic annotations from board-certified radiologists.",
    "maintainer": "Healthcare AI Institute",
    "contact": "data-requests@healthcareai.org",
    "doi": "10.5281/zenodo.example123"
  },
  "data": {
    "kind": "real",
    "modalities": ["image"],
    "languages": ["en"],
    "size": {
      "examples": 50000,
      "images": 50000,
      "bytes": 524288000000
    },
    "domains": ["medical"],
    "structures": ["classification-examples"],
    "task_types": ["supervised-finetuning"],
    "record_format": "other",
    "record_format_notes": "DICOM format with accompanying JSON metadata files",
    "has_human_annotations": true,
    "label_types": ["classification-labels", "bounding-boxes"]
  },
  "rights": {
    "license": "custom",
    "license_url": "https://healthcareai.org/data-license",
    "attribution_required": true,
    "allows_commercial_use": false,
    "contains_personal_data": "pseudonymous",
    "consent_mechanism": "All data collected under IRB-approved protocols with patient informed consent. Data has been de-identified per HIPAA Safe Harbor guidelines.",
    "restricted_uses": [
      "Re-identification of patients",
      "Commercial diagnostic applications without regulatory approval",
      "Training models for clinical deployment without additional validation"
    ]
  },
  "provenance": {
    "source_types": ["partner-license"],
    "geography": { "scope": "multi-regional", "regions": ["North America", "Europe"] },
    "collection_start_date": "2018-01-01",
    "collection_end_date": "2023-12-31",
    "collection_notes": "Images collected from partner hospitals under data sharing agreements. All annotations performed by board-certified radiologists with minimum 5 years experience."
  },
  "access": {
    "availability": "on-request",
    "terms_url": "https://healthcareai.org/data-use-agreement",
    "request_instructions": "1. Complete the online application at healthcareai.org/apply\n2. Provide institutional affiliation and IRB approval (if applicable)\n3. Sign the Data Use Agreement\n4. Allow 2-4 weeks for review\n\nApproved researchers receive secure download credentials valid for 90 days."
  },
  "use": {
    "intended_uses": [
      "Medical imaging AI research",
      "Algorithm development and benchmarking",
      "Educational purposes in radiology training"
    ],
    "out_of_scope_uses": [
      "Direct clinical diagnosis without physician oversight",
      "Commercial products without separate licensing",
      "Any attempt to re-identify patients"
    ]
  },
  "governance": {
    "review_status": "audited",
    "last_reviewed": "2024-06-15",
    "documentation_url": "https://healthcareai.org/corpus-documentation"
  },
  "safety": {
    "content_risk_level": "medium",
    "known_risky_categories": ["personal-information"],
    "mitigations": "All images de-identified using HIPAA Safe Harbor method. Facial features removed from head imaging. Metadata scrubbed of identifying information. Annual re-audit of de-identification procedures."
  }
}

Access Configuration

On-Request Availability

For datasets requiring approval:

"access": {
  "availability": "on-request",
  "terms_url": "https://datapass.meetkai.ai/legal/terms",
  "request_instructions": "Detailed instructions..."
}

Access Types Compared

Availability     Use Case               Required Fields
public-download  Open data              url
restricted       Approved users only    request_instructions or url
on-request       Case-by-case approval  request_instructions or url
not-available    Cannot be accessed     not_available_reason
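
The required-field rules in the table above can be checked mechanically before publishing a card. The following is a minimal sketch, not part of any official tooling: the function name check_access and the REQUIRED_ACCESS_FIELDS mapping are hypothetical, and the sketch assumes a field counts as present only when it is non-empty.

```python
# Required access fields per availability value, transcribed from the table above.
# Each tuple lists alternatives: at least one must be present and non-empty.
# (Hypothetical helper for illustration; not an official validator.)
REQUIRED_ACCESS_FIELDS = {
    "public-download": ("url",),
    "restricted": ("request_instructions", "url"),
    "on-request": ("request_instructions", "url"),
    "not-available": ("not_available_reason",),
}


def check_access(access: dict) -> list:
    """Return a list of problems found in an "access" block (empty if none)."""
    availability = access.get("availability")
    if availability not in REQUIRED_ACCESS_FIELDS:
        return [f"unknown availability: {availability!r}"]
    alternatives = REQUIRED_ACCESS_FIELDS[availability]
    # Treat missing and empty values the same way (an assumption of this sketch).
    if not any(access.get(field) for field in alternatives):
        return [f"{availability} requires one of: {', '.join(alternatives)}"]
    return []


access = {
    "availability": "on-request",
    "terms_url": "https://healthcareai.org/data-use-agreement",
    "request_instructions": "Detailed instructions...",
}
print(check_access(access))
print(check_access({"availability": "not-available"}))
```

The first call reports no problems; the second flags the missing not_available_reason.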

Request Instructions

Provide clear, step-by-step instructions:

"request_instructions": "1. Complete the online application at healthcareai.org/apply\n2. Provide institutional affiliation and IRB approval (if applicable)\n3. Sign the Data Use Agreement\n4. Allow 2-4 weeks for review\n\nApproved researchers receive secure download credentials valid for 90 days."

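Writing the numbered steps and the \n escapes by hand is error-prone. As a rough sketch, the string can be assembled from a list of steps and serialized with Python's standard json module, which escapes the newlines. The helper name build_request_instructions is hypothetical.

```python
import json

steps = [
    "Complete the online application at healthcareai.org/apply",
    "Provide institutional affiliation and IRB approval (if applicable)",
    "Sign the Data Use Agreement",
    "Allow 2-4 weeks for review",
]
closing_note = "Approved researchers receive secure download credentials valid for 90 days."


def build_request_instructions(steps, closing_note=""):
    """Number the steps, one per line, and append an optional closing note."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return f"{numbered}\n\n{closing_note}" if closing_note else numbered


instructions = build_request_instructions(steps, closing_note)
# json.dumps escapes each newline as \n, matching the snippet above.
print(json.dumps({"request_instructions": instructions}, indent=2))
```
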
Personal Data Handling

This dataset contains pseudonymous personal data:

"contains_personal_data": "pseudonymous",
"consent_mechanism": "All data collected under IRB-approved protocols..."

Personal Data Levels

Level         Description                                          Requires Consent?
none          No personal data                                     No
de_minimis    Minimal, incidental                                  No
pseudonymous  Identifiers replaced; re-identification is possible  Yes
direct        Directly identifying                                 Yes
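
The "Requires Consent?" column in the table above translates into a simple check on the "rights" block. This is a sketch under stated assumptions: check_consent and REQUIRES_CONSENT are hypothetical names, and the sketch treats a blank consent_mechanism as missing.

```python
# Whether each personal data level requires a consent_mechanism,
# transcribed from the Personal Data Levels table.
# (Hypothetical helper for illustration; not an official validator.)
REQUIRES_CONSENT = {
    "none": False,
    "de_minimis": False,
    "pseudonymous": True,
    "direct": True,
}


def check_consent(rights: dict) -> list:
    """Return a list of problems found in a "rights" block (empty if none)."""
    level = rights.get("contains_personal_data")
    if level not in REQUIRES_CONSENT:
        return [f"unknown contains_personal_data level: {level!r}"]
    consent = (rights.get("consent_mechanism") or "").strip()
    if REQUIRES_CONSENT[level] and not consent:
        return [f"{level} data requires a consent_mechanism"]
    return []


rights = {
    "contains_personal_data": "pseudonymous",
    "consent_mechanism": "All data collected under IRB-approved protocols...",
}
print(check_consent(rights))
print(check_consent({"contains_personal_data": "direct"}))
```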

Custom File Formats

For non-standard formats, set record_format to "other" and describe the format in record_format_notes:

"record_format": "other",
"record_format_notes": "DICOM format with accompanying JSON metadata files"

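A card that sets record_format to "other" without an explanation leaves readers guessing at the file layout. The sketch below flags that case. The name check_record_format is hypothetical, and treating empty notes as an error is an assumption of this sketch rather than a documented schema rule.

```python
def check_record_format(data: dict) -> list:
    """Return a list of problems with record_format in a "data" block."""
    if data.get("record_format") == "other":
        notes = (data.get("record_format_notes") or "").strip()
        # Assumption: "other" is only useful when accompanied by an explanation.
        if not notes:
            return ['record_format "other" needs record_format_notes']
    return []


data = {
    "record_format": "other",
    "record_format_notes": "DICOM format with accompanying JSON metadata files",
}
print(check_record_format(data))
print(check_record_format({"record_format": "other"}))
```
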
Safety Considerations

For sensitive datasets, document risks and mitigations:

"safety": {
  "content_risk_level": "medium",
  "known_risky_categories": ["personal-information"],
  "mitigations": "All images de-identified using HIPAA Safe Harbor method..."
}
