Multilingual QA Dataset | Dataset Registry

A hybrid question-answering dataset combining real-world questions with synthetic answers, covering 10 languages with emphasis on factual knowledge and reading comprehension.

hybrid

multilingual-qa v1.0.0 | Maintained by Global AI Research Consortium

Examples
250,000
Languages
10
License
CC-BY-NC-SA-4.0
Availability
restricted

Core Information

ID
multilingual-qa
Version
1.0.0
Maintainer
Global AI Research Consortium
Contact
datasets@globalairc.org

Access Information

Availability
restricted
Request Instructions
Submit a data access request form at the dataset URL. Academic researchers are typically approved within 48 hours. Commercial entities require a separate license agreement.

Provenance

Source Types
crowdsourced, synthetic-generation
Geography
global
Collection Period
2024-01-01 - 2024-10-15
Notes
Questions were crowdsourced from native speakers. A subset of the answers was generated using GPT-4 and validated by human annotators.
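
Because the dataset mixes crowdsourced questions with partly synthetic answers, users may want to separate records by answer origin before training or evaluation. The sketch below shows one way to do that in Python. It is illustrative only: the field names "lang" and "answer_source", their values, and the toy records are all assumptions, since this page does not document the dataset's schema.

```python
from collections import Counter

# Toy records. Field names ("lang", "answer_source") and their values are
# hypothetical; the dataset's real schema is not documented on this page.
SAMPLE_RECORDS = [
    {"lang": "en", "question": "Q1", "answer": "A1", "answer_source": "human"},
    {"lang": "ar", "question": "Q2", "answer": "A2", "answer_source": "synthetic"},
    {"lang": "en", "question": "Q3", "answer": "A3", "answer_source": "synthetic"},
]


def split_by_answer_source(records):
    """Group records by how their answer was produced."""
    groups = {}
    for record in records:
        groups.setdefault(record["answer_source"], []).append(record)
    return groups


def source_counts_per_language(records):
    """Count (language, answer_source) pairs to spot imbalances."""
    return Counter((r["lang"], r["answer_source"]) for r in records)


groups = split_by_answer_source(SAMPLE_RECORDS)
counts = source_counts_per_language(SAMPLE_RECORDS)
```

Keeping the per-language counts alongside the split makes it easier to notice if synthetic answers are concentrated in a few languages, which could skew cross-lingual comparisons.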

Intended Use

Intended Uses
  • Multilingual QA system training
  • Reading comprehension research
  • Cross-lingual transfer learning
Out of Scope
  • Medical or legal advice systems
  • Production systems without human oversight