

The Taskmaster-2 dataset consists of 17,289 dialogs in the seven domains below. Dialogs for each domain can be found in the seven json files located in this directory's "data" folder, i.e. Taskmaster/TM-2-2-20/data/.

  • restaurants (3276)
  • food ordering (1050)
  • movies (3047)
  • hotels (2355)
  • flights (2481)
  • music (1602)
  • sports (3478)
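
The per-domain counts above can be checked with a short script. This is a minimal sketch, assuming each file in the "data" folder is a single JSON array with one object per dialog (the file layout is not spelled out here, so treat that as an assumption). The helper name `count_dialogs` is hypothetical.

```python
import json
from pathlib import Path


def count_dialogs(data_dir):
    """Return {file stem: number of dialogs} for every .json file in data_dir.

    Assumption: each file holds one JSON array whose items are dialogs.
    """
    counts = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            dialogs = json.load(f)
        counts[path.stem] = len(dialogs)
    return counts
```

Calling `count_dialogs("Taskmaster/TM-2-2-20/data")` and summing the values should give 17,289 if the assumption about the file layout holds.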

Data Collection

Unlike Taskmaster-1, which includes both written "self-dialogs" and spoken two-person dialogs, Taskmaster-2 consists entirely of spoken two-person dialogs. In addition, while Taskmaster-1 is almost exclusively task-based, Taskmaster-2 contains a good number of search- and recommendation-oriented dialogs, as seen for example in the restaurants, flights, hotels, and movies verticals. The music browsing and sports conversations are almost exclusively search- and recommendation-based. All dialogs in this release were created using a Wizard of Oz (WOz) methodology in which crowdsourced workers played the role of a 'user' and trained call center operators played the role of the 'assistant'. In this way, users were led to believe they were interacting with an automated system that “spoke” using text-to-speech (TTS) even though it was in fact a human behind the scenes. As a result, users could express themselves however they chose in the context of an automated interface.

Data Format

Each conversation in the data file has the following structure:

  • conversation_id: A universally unique identifier with the prefix 'dlg-'. The ID has no meaning.
  • utterances: An array of utterances that make up the conversation.
  • instruction_id: A reference to the file(s) containing the user (and, if applicable, agent) instructions for this conversation.

Each utterance has the following fields:

  • index: A 0-based index indicating the order of the utterances in the conversation.
  • speaker: Either USER or ASSISTANT, indicating which role generated this utterance.
  • text: The raw text of the utterance. 'ASSISTANT' turns are originally written (then played to the user via TTS) and 'USER' turns are transcribed from the spoken recordings of crowdsourced workers.
  • segments: An array of various text spans with semantic annotations.

Each segment has the following fields:

  • start_index: The position of the start of the annotation in the utterance text.
  • end_index: The position of the end of the annotation in the utterance text.
  • text: The raw text that has been annotated.
  • annotations: An array of annotation details for this segment.

Each annotation has a single field:

  • name: The annotation name.
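
The nesting described above (conversation, utterances, segments, annotations) can be walked as in the following sketch. The sample conversation is hand-written for illustration: its ID, text, and annotation name are invented, not taken from the dataset. Two further assumptions are labeled in the code: utterances without annotations may omit "segments", and "end_index" is exclusive, so that `text[start_index:end_index]` reproduces the segment text.

```python
# Hand-written sample that follows the field layout described above.
# All values (ID, text, annotation name) are invented for illustration.
sample = {
    "conversation_id": "dlg-00000000-0000-0000-0000-000000000000",
    "instruction_id": "restaurant-1",
    "utterances": [
        {
            "index": 0,
            "speaker": "USER",
            "text": "I want Italian food.",
            "segments": [
                {
                    "start_index": 7,
                    "end_index": 14,
                    "text": "Italian",
                    "annotations": [{"name": "example.cuisine"}],
                }
            ],
        },
        {"index": 1, "speaker": "ASSISTANT", "text": "Sure, in which city?"},
    ],
}


def iter_annotations(conversation):
    """Yield (utterance index, speaker, span text, annotation name) tuples."""
    for utt in conversation["utterances"]:
        # Assumption: utterances without annotations may omit "segments".
        for seg in utt.get("segments", []):
            for ann in seg.get("annotations", []):
                yield utt["index"], utt["speaker"], seg["text"], ann["name"]


def span_matches(utterance, segment):
    """Check a segment against its utterance, assuming end_index is exclusive."""
    start, end = segment["start_index"], segment["end_index"]
    return utterance["text"][start:end] == segment["text"]
```

Running `span_matches` over real dialogs is a quick way to confirm or reject the exclusive-end assumption before relying on the indices.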


As with Taskmaster-1, crowdsourced workers and agents alike were given instructions prior to being connected in order to set up the role-play scenario and to explain additional details. In most cases, users and assistants alike were given free rein to make the conversation as realistic and typical as possible by basing their ideas and results on real flight data from the Internet instead of being restricted to a small knowledge base. However, in some cases certain variables such as location were restricted to a handful of choices to make it easier for the worker playing 'assistant' to return results in a reasonable time frame. In other words, knowing ahead of time which cities are in play for a given search makes it easier to anticipate and search for results.

Note that, even though dialogs for each domain are consolidated in just one json file, there are actually many versions of each type of dialog, which correspond to slight variations in the instructions. You can identify each set within a given domain by searching for "instruction_id". Below we give additional details about the instructions for each domain:

  • Restaurants: Most dialogs in this set involve the user asking for recommendations for a particular type of cuisine in a given city. Users were instructed to request 2-3 choices up front and then compare them by asking the assistant questions about price, atmosphere, menu items and the like.

  • Food ordering: Users pretended they were ordering take-out using an automated assistant for a particular cuisine choice. (One cuisine was randomly generated for each set of instructions to ensure variety.) They discussed dishes or items, size, toppings, etc.

  • Hotels: As with flights below, users chose from a list of cities to anchor their search. They were encouraged to compare several different hotels using typical preferences such as ratings, price, availability, and other amenities.

  • Flights: Users typically chose from a list of cities or continents for round-trip or multi-city flights and based their final choice on a number of additional preferences such as dates, flight time, price, layovers, seating class, airline, etc.

  • Movies: This domain consists mostly of recommendation dialogs where users are trying to find a movie to watch in theaters or using a streaming service at home.

    • In theaters: instruction_id = movie-{2-7, 9-12, 14-15, 17, 20-22, 24e, 25-26, 30, 31e, 33e}
    • At home: instruction_id = movie-{8, 18, 19, 27-29}
  • Music: For the music domain, users were asked to browse (i.e. listen to) several tracks based on their choice of artist, track, album, or genre, and then comment on each one. To recommend tracks, agents were able to send users YouTube videos, which users could then control once launched. The dialogs do not include the links, however.

  • Sports: Sports dialogs are informational conversations discussing facts and stats about players, teams, games, etc. in the English Premier League (EPL), Major League Baseball, Major League Soccer, National Basketball Association, and National Football League. Dialogs for each league can be selected by the following prefixes: epl, mlb, mls, nba, nfl.
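
The instruction sets listed above can be used to split a domain file into subsets. The sketch below expands the brace notation used for the movies domain and groups dialogs by "instruction_id". It assumes that the IDs in the data are written exactly as "movie-8", "movie-24e", and so on, and it reads "9-12 14-15" in the in-theaters list as two separate ranges. All function names are hypothetical.

```python
from collections import defaultdict


def expand_ids(prefix, spec):
    """Expand a spec such as "8, 18, 19, 27-29" into a set of full IDs.

    Numeric ranges "a-b" are inclusive; other tokens (e.g. "24e") are
    kept as written.
    """
    ids = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        lo, sep, hi = token.partition("-")
        if sep and lo.isdigit() and hi.isdigit():
            ids.update(f"{prefix}{n}" for n in range(int(lo), int(hi) + 1))
        else:
            ids.add(f"{prefix}{token}")
    return ids


# Assumption: instruction_id values look like "movie-8" or "movie-24e".
AT_HOME = expand_ids("movie-", "8, 18, 19, 27-29")
IN_THEATERS = expand_ids(
    "movie-", "2-7, 9-12, 14-15, 17, 20-22, 24e, 25-26, 30, 31e, 33e"
)


def movie_setting(dialog):
    """Return "at home", "in theaters", or None for unlisted instruction IDs."""
    instruction_id = dialog.get("instruction_id")
    if instruction_id in AT_HOME:
        return "at home"
    if instruction_id in IN_THEATERS:
        return "in theaters"
    return None


def group_by_instruction(dialogs):
    """Group a list of dialogs into {instruction_id: [dialogs]}."""
    groups = defaultdict(list)
    for dialog in dialogs:
        groups[dialog["instruction_id"]].append(dialog)
    return dict(groups)
```

Printing the keys of `group_by_instruction(dialogs)` for a loaded domain file is a simple way to see which instruction variants actually occur before filtering on them.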


Please use the following citation when referencing the dataset:

    title = {Taskmaster-1: Toward a Realistic and Diverse Dialog Dataset},
    author = {Bill Byrne and Karthik Krishnamoorthi and Chinnadhurai Sankar and Arvind Neelakantan and Daniel Duckworth and Semih Yavuz and Ben Goodrich and Amit Dubey and Kyu-Young Kim and Andy Cedilnik},
    booktitle = {2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing},
    address = {Hong Kong},
    year = {2019}

Basic Information

  • License: CC BY 4.0
  • Updated on: 2022-02-10 09:11:54
  • Data Volume: 17,289
  • Copyright Owner: Google Research