graviti logoProductOpen DatasetsAbout
Request DemoSign in
153
0
0
General
Discussion
Code
Activities
c77be014-8cd1-11eb-88ae-0e1f58d5e9a9
9935bf4·
Jun 20, 2021 2:26 PM
·1Commits

Overview

C3 is the first free-form multiple-Choice Chinese machine reading Comprehension dataset, containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second language examinations.

Data Format

data/c3-{m,d}-{train,dev,test}.json: the dataset files, where m and d represent "mixed-genre" and "dialogue", respectively. The data format is as follows.

[
  [
    [
      document 1
    ],
    [
      {
        "question": document 1 / question 1,
        "choice": [
          document 1 / question 1 / answer option 1,
          document 1 / question 1 / answer option 2,
          ...
        ],
        "answer": document 1 / question 1 / correct answer option
      },
      {
        "question": document 1 / question 2,
        "choice": [
          document 1 / question 2 / answer option 1,
          document 1 / question 2 / answer option 2,
          ...
        ],
        "answer": document 1 / question 2 / correct answer option
      },
      ...
    ],
    document 1 / id
  ],
  [
    [
      document 2
    ],
    [
      {
        "question": document 2 / question 1,
        "choice": [
          document 2 / question 1 / answer option 1,
          document 2 / question 1 / answer option 2,
          ...
        ],
        "answer": document 2 / question 1 / correct answer option
      },
      {
        "question": document 2 / question 2,
        "choice": [
          document 2 / question 2 / answer option 1,
          document 2 / question 2 / answer option 2,
          ...
        ],
        "answer": document 2 / question 2 / correct answer option
      },
      ...
    ],
    document 2 / id
  ],
  ...
]

Citation

@article{sun2019investigating,
  title={Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension},
  author={Sun, Kai and Yu, Dian and Yu, Dong and Cardie, Claire},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  url={https://arxiv.org/abs/1904.09679v3}
}
🎉Many thanks to Graviti Open Datasets for contributing the dataset
Basic Information
Application ScenariosNot Available
AnnotationsNot Available
TasksNot Available
LicenseCustom
Updated on2021-01-20 03:27:11
Official Websitehttps://dataset.org/c3/
Metadata
Data TypeNot Available
Data Volume19.58K
Annotation Amount0
File Size0B
Copyright Owner
dataset.org
Annotator
Unknown
More Support Options
Start building your AI now
Get StartedContact