OK-VQA is a new dataset for visual question answering that requires methods that can draw upon outside knowledge to answer questions.
The questions are stored in JSON format with the following data structure:
{
  "info": info,
  "task_type": str,
  "data_type": str,
  "data_subtype": str,
  "questions": [question],
  "license": license
}

info {
  "year": int,
  "version": str,
  "description": str,
  "contributor": str,
  "url": str,
  "date_created": datetime
}

license {
  "name": str,
  "url": str
}

question {
  "question_id": int,
  "image_id": int,
  "question": str
}
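
As a quick illustration, the Python sketch below loads a questions file and prints the fields described above. The file name is hypothetical; substitute the path of the questions file you downloaded.

import json

# Hypothetical file name; replace with your downloaded questions file.
questions_path = "OpenEnded_mscoco_train2014_questions.json"

with open(questions_path) as f:
    qdata = json.load(f)

print(qdata["info"]["description"], "version", qdata["info"]["version"])
print("task_type:", qdata["task_type"])
print("data:", qdata["data_type"], qdata["data_subtype"])

# Each question record ties a question to its image via image_id.
q = qdata["questions"][0]
print(q["question_id"], q["image_id"], q["question"])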
The annotations are stored in JSON format with the following data structure:
{
  "info": info,
  "data_type": str,
  "data_subtype": str,
  "annotations": [annotation],
  "license": license
}

info {
  "year": int,
  "version": str,
  "description": str,
  "contributor": str,
  "url": str,
  "date_created": datetime
}

license {
  "name": str,
  "url": str
}

annotation {
  "question_id": int,
  "image_id": int,
  "question_type": str,
  "answer_type": str,
  "answers": [answer],
  "multiple_choice_answer": str
}

answer {
  "answer_id": int,
  "answer": str,
  "answer_confidence": str
}
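
Annotations can be joined back to the questions through question_id. A minimal sketch, again with hypothetical file names:

import json

# Hypothetical file names; replace with your downloaded files.
with open("OpenEnded_mscoco_train2014_questions.json") as f:
    questions = {q["question_id"]: q for q in json.load(f)["questions"]}
with open("mscoco_train2014_annotations.json") as f:
    annotations = json.load(f)["annotations"]

# Print a few question/answer pairs matched on question_id.
for ann in annotations[:3]:
    q = questions[ann["question_id"]]
    answers = [a["answer"] for a in ann["answers"]]
    print(q["question"])
    print("  most frequent answer:", ann["multiple_choice_answer"])
    print("  all answers:", answers)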
data_type: source of the images (mscoco or abstract_v002).
data_subtype: subtype of the data (e.g., train2014/val2014/test2015 for mscoco; train2015/val2015 for abstract_v002).
question_type: type of the question, determined by the first few words of the question. For details, please see the README.
answer_type: type of the answer. Currently one of "yes/no", "number", and "other".
multiple_choice_answer: the most frequent ground-truth answer.
answer_confidence: the subject's confidence in answering the question. For details, please see Antol et al., ICCV 2015.
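
Since multiple_choice_answer is defined as the most frequent entry in answers, it can be recomputed from an annotation. The sketch below does so (tie-breaking here is by first occurrence and may differ from the released files) and also tallies the answer_type distribution; it assumes the annotations list loaded in the previous sketch.

from collections import Counter

def most_frequent_answer(ann):
    # Mirrors the definition of multiple_choice_answer above; ties are
    # broken by first occurrence, which may differ from the released files.
    counts = Counter(a["answer"] for a in ann["answers"])
    return counts.most_common(1)[0][0]

# Distribution of answer types ("yes/no", "number", "other").
type_counts = Counter(ann["answer_type"] for ann in annotations)
print(type_counts)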