Access Section

The access section describes how users can obtain the dataset. Clear access information lets users determine whether they can actually obtain and use the dataset before investing time in evaluating it.

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| availability | enum | High-level availability status |

Optional Fields

| Field | Type | Description |
| --- | --- | --- |
| url | string (uri) | Landing page or primary download URL |
| terms_url | string (uri) | URL to dataset-specific terms of use |
| request_instructions | string | How to request access (for restricted datasets) |
| not_available_reason | string | Why the dataset is not available |

Enum Values

availability

| Value | Description |
| --- | --- |
| public-download | Freely downloadable without authentication |
| restricted | Available with restrictions (e.g., registration, affiliation) |
| on-request | Available upon request and approval |
| not-available | Not currently available for distribution |

Conditional Rules

| When… | You must provide… |
| --- | --- |
| availability is “restricted” or “on-request” | Either request_instructions or url |
| availability is “not-available” | not_available_reason |
| availability is “public-download” | Either access.url, artifacts.base_uri, or artifacts.files[].uri |
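
The conditional rules above can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `validate_access` and the error messages are hypothetical, and it assumes the record is a plain dictionary with optional top-level `access` and `artifacts` keys.

```python
def validate_access(record: dict) -> list[str]:
    """Return conditional-rule violations for a record (empty list if none).

    Hypothetical helper: it only mirrors the three rules in the table above.
    """
    access = record.get("access", {})
    artifacts = record.get("artifacts", {})
    availability = access.get("availability")
    errors = []

    if availability in ("restricted", "on-request"):
        # Either request_instructions or url must be present.
        if not (access.get("request_instructions") or access.get("url")):
            errors.append(f'"{availability}" requires request_instructions or url')
    elif availability == "not-available":
        if not access.get("not_available_reason"):
            errors.append('"not-available" requires not_available_reason')
    elif availability == "public-download":
        # Any one of access.url, artifacts.base_uri, artifacts.files[].uri.
        has_file_uri = any(f.get("uri") for f in artifacts.get("files", []))
        if not (access.get("url") or artifacts.get("base_uri") or has_file_uri):
            errors.append(
                '"public-download" requires access.url, artifacts.base_uri, '
                "or artifacts.files[].uri"
            )
    else:
        # availability is the only required field and must be a known value.
        errors.append(f"unknown or missing availability: {availability!r}")

    return errors
```

An empty list means the record satisfies the conditional rules; the sketch does not check types or URL formats.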

Examples

Public Download

{
  "access": {
    "availability": "public-download",
    "url": "https://huggingface.co/datasets/example/hausa-news",
    "terms_url": "https://example.org/data-terms"
  }
}

Restricted Access

{
  "access": {
    "availability": "restricted",
    "url": "https://example.org/datasets/apply",
    "request_instructions": "Submit application form with institutional affiliation. Access granted to verified researchers within 5 business days.",
    "terms_url": "https://example.org/research-agreement"
  }
}

On-Request

{
  "access": {
    "availability": "on-request",
    "request_instructions": "Email data-access@example.org with research proposal and IRB approval. Review takes 2-4 weeks."
  }
}

Not Available

{
  "access": {
    "availability": "not-available",
    "not_available_reason": "Dataset under community review. Expected availability Q2 2025."
  }
}

Field Details

availability

Choose the most accurate status:

  • public-download: Anyone can download immediately
  • restricted: Access requires meeting criteria (registration, affiliation, agreement)
  • on-request: Manual approval process required
  • not-available: Dataset exists but is not currently being distributed

url

The primary URL should point to:

  1. A landing page with download links, OR
  2. A direct download link (for single-file datasets), OR
  3. A data repository page (Hugging Face, Zenodo, etc.)

Supported URL patterns:

  • https://huggingface.co/datasets/...
  • https://zenodo.org/records/...
  • https://data.example.org/...
  • s3://bucket/path/... (for cloud storage)
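
A quick sanity check on the url field might look like the sketch below. It assumes that only the schemes appearing in the list above (https and s3) are acceptable; the list does not say whether other schemes are allowed, so treat the `ALLOWED_SCHEMES` set and the function name as hypothetical.

```python
from urllib.parse import urlparse

# Assumption: only the schemes shown in the pattern list are accepted.
ALLOWED_SCHEMES = {"https", "s3"}


def looks_like_supported_url(url: str) -> bool:
    """Check that the URL has an allowed scheme and a host or bucket name."""
    parsed = urlparse(url)
    return parsed.scheme in ALLOWED_SCHEMES and bool(parsed.netloc)
```

This only inspects the shape of the URL; it does not confirm that the location exists or is downloadable.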

request_instructions

Make instructions actionable. Include:

  1. Contact method - Email, form URL, or portal
  2. Required information - What applicants must provide
  3. Eligibility criteria - Who can request access
  4. Timeline - Expected response time

An example covering all four points:
{
  "request_instructions": "1. Complete form at https://example.org/data-request\n2. Provide institutional email and research proposal\n3. Eligible: Academic researchers and nonprofit organizations\n4. Response within 10 business days"
}
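
If instructions are assembled programmatically, the four elements can be joined into the numbered, newline-separated string used in the example above. This is a sketch only: the helper name and its parameters are hypothetical, and the schema itself only requires a string.

```python
def build_request_instructions(
    contact: str, required_info: str, eligibility: str, timeline: str
) -> str:
    """Join the four recommended elements into one numbered string."""
    parts = [contact, required_info, f"Eligible: {eligibility}", timeline]
    return "\n".join(f"{i}. {part}" for i, part in enumerate(parts, start=1))


instructions = build_request_instructions(
    contact="Complete form at https://example.org/data-request",
    required_info="Provide institutional email and research proposal",
    eligibility="Academic researchers and nonprofit organizations",
    timeline="Response within 10 business days",
)
```

Keeping the elements as separate arguments makes it harder to omit one of them by accident.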

not_available_reason

Be specific about why the dataset is unavailable and, if known, when it is expected to become available:

  • "Under legal review for public release"
  • "Community consultation in progress"
  • "Contains sensitive content requiring additional safeguards"
  • "Deprecated - superseded by v2.0"

Connecting to Artifacts

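For datasets with availability: "public-download", the conditional rules require a download location. Provide it in at least one of these ways:

  1. Set access.url to the download location, OR
  2. Populate the artifacts section with file-level details (artifacts.base_uri or artifacts.files[].uri)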

The artifacts section provides checksums and file metadata for reproducible downloads:

{
  "access": {
    "availability": "public-download"
  },
  "artifacts": {
    "base_uri": "https://cdn.example.org/datasets/hausa-news/v1/",
    "files": [
      {
        "path": "train.jsonl",
        "sha256": "a1b2c3...",
        "size_bytes": 524288000,
        "split": "train"
      }
    ]
  }
}
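
A consumer of this metadata might resolve each file's location and verify its checksum roughly as follows. This is a sketch under stated assumptions: the text above does not specify how base_uri and path combine, so joining them with `urljoin` is an assumption, and both function names are hypothetical.

```python
import hashlib
from urllib.parse import urljoin


def file_uri(artifacts: dict, entry: dict) -> str:
    """Prefer an explicit per-file uri; otherwise join base_uri and path.

    Assumption: path is relative to base_uri.
    """
    if entry.get("uri"):
        return entry["uri"]
    return urljoin(artifacts["base_uri"], entry["path"])


def sha256_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA-256 digest of downloaded bytes with the recorded value."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()
```

Note that `urljoin` replaces the last path segment when base_uri lacks a trailing slash, so a base_uri written as in the example above (ending in `/`) is the safe form. For large files, hashing in chunks rather than loading everything into memory would be more practical.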

See Also