Jun 20, 2021 10:26 PM


Appearance-based gaze estimation is believed to work well in real-world settings, but existing datasets have been collected under controlled laboratory conditions and methods have not been evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset, which contains 213,659 images that we collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing ones with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks that significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own.

Data Collection

We implemented custom software running as a background service on participants’ laptops. Every 10 minutes the software automatically asked participants to look at a random sequence of 20 on-screen positions (a recording session), visualized as a grey circle shrinking in size and with a white dot in the middle. Participants were asked to fixate on these dots and confirm each by pressing the spacebar once the circle was about to disappear. This was to ensure participants concentrated on the task and fixated exactly at the intended on-screen positions. No other instructions were given to them, in particular no constraints as to how and where to use their laptops.


Data Annotation

We collected a total of 213,659 images from 15 participants. The number of images collected from each participant varied from 1,498 to 34,745.
The following figure shows the distribution of the collected samples across different factors: the percentage of images with different mean grey-scale intensities within the face region (top left), with different horizontal differences in mean grey-scale intensity between the left and right halves of the face region (top right), collected at different times of the day (bottom left), and collected from each participant (bottom right). Representative samples are shown at the top.


Data Format

The dataset contains three parts: "Data", "Evaluation Subset" and "Annotation Subset".
The "Data" folder includes "Original" and "Normalized" folders for all 15 participants. It also contains the 6-point face model we used in this dataset.
The "Original" folders contain the cropped eye rectangle images together with the detection results from the face detector and the facial landmark detector. For each participant, the images and annotations are organized by day. Each day's folder contains the images collected from that participant and the corresponding "annotation.txt" file. The annotations include:

  • Dimension 1~24: Detected eye landmark positions, in pixels, in whole-image coordinates
  • Dimension 25~26: On-screen gaze target position in screen coordinates
  • Dimension 27~29: 3D gaze target position relative to the camera
  • Dimension 30~35: The estimated 3D head pose (rotation and translation), based on the 6-point 3D face model
  • Dimension 36~38: The estimated 3D right eye center in the camera coordinate system
  • Dimension 39~41: The estimated 3D left eye center in the camera coordinate system
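
The dimension layout above can be turned into a small parser. The following is a minimal sketch, not the authors' loader: it assumes each line of "annotation.txt" holds 41 whitespace-separated numbers, and that the head pose stores rotation before translation. The field names are hypothetical labels chosen for illustration.

```python
from typing import Dict, List, Tuple

# 1-based, inclusive dimension ranges taken from the list above.
# The field names are illustrative, not part of the dataset.
FIELDS: Dict[str, Tuple[int, int]] = {
    "eye_landmarks_px": (1, 24),
    "screen_target": (25, 26),
    "gaze_target_3d": (27, 29),
    "head_pose": (30, 35),  # assumed order: rotation (3), then translation (3)
    "right_eye_center_3d": (36, 38),
    "left_eye_center_3d": (39, 41),
}


def parse_annotation_line(line: str) -> Dict[str, List[float]]:
    """Split one annotation line into named groups of floats."""
    values = [float(token) for token in line.split()]
    if len(values) != 41:
        raise ValueError(f"expected 41 values, got {len(values)}")
    # Convert each 1-based inclusive range into a 0-based Python slice.
    return {name: values[start - 1:end] for name, (start, end) in FIELDS.items()}
```

If the files turn out to use a different delimiter, only the `line.split()` call would need to change.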

In addition, there is a "Calibration" folder for each participant, which contains:

  • Camera.mat: the intrinsic parameters of the laptop camera. "cameraMatrix": the projection matrix of the camera. "distCoeffs": camera distortion coefficients. "retval": root mean square (RMS) re-projection error. "rvecs": the rotation vectors. "tvecs": the translation vectors.
  • monitorPose.mat: the position of the image plane in the camera coordinate system. "rvecs": the rotation vectors. "tvecs": the translation vectors.
  • screenSize.mat: the laptop screen size. "height_pixel": the screen height in pixels. "width_pixel": the screen width in pixels. "height_mm": the screen height in millimeters. "width_mm": the screen width in millimeters.
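
The screen-size fields make it possible to express an on-screen gaze target in millimeters rather than pixels. The sketch below is only an illustration under assumptions: it takes the four values as plain numbers that have already been read from the ".mat" file (for example with `scipy.io.loadmat`, not shown here), and the function name and the sample values in the usage note are hypothetical.

```python
from typing import Tuple


def pixel_to_mm(
    x_px: float,
    y_px: float,
    width_pixel: float,
    height_pixel: float,
    width_mm: float,
    height_mm: float,
) -> Tuple[float, float]:
    """Scale a screen position from pixels to millimeters on the screen plane."""
    if width_pixel <= 0 or height_pixel <= 0:
        raise ValueError("screen size in pixels must be positive")
    # Each axis is scaled independently by its own mm-per-pixel ratio.
    return (x_px * width_mm / width_pixel, y_px * height_mm / height_pixel)
```

For a made-up 1280 x 800 pixel screen measuring 320 x 200 mm, `pixel_to_mm(640, 400, 1280, 800, 320.0, 200.0)` gives the screen center in millimeters. Mapping that point into camera coordinates would additionally need the rotation and translation from monitorPose.mat, which this sketch does not cover.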

The "Normalized" folders contain the eye patch images after normalization, which cancels scaling and rotation via a perspective transformation as described in Sugano et al. As in the "Original" folders, the data are organized by day for each participant, and the file format is ".mat". The annotation includes:

  • 3D head pose and 3D gaze direction. The conversion from the 2D on-screen gaze target to this 3D gaze direction is described in our paper.
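
As a rough illustration of what a 3D gaze direction is, one common construction is the unit vector pointing from a 3D eye center to the 3D gaze target, both expressed in the camera coordinate system (dimensions 27~29 and 36~41 of "annotation.txt"). The sketch below follows that construction and adds the angle between two directions, a frequently used way to compare gaze vectors. It is an assumption-laden sketch and does not reproduce the normalization procedure from the paper; both function names are hypothetical.

```python
import math
from typing import List, Sequence


def unit_gaze_direction(eye_center: Sequence[float], gaze_target: Sequence[float]) -> List[float]:
    """Unit vector from the 3D eye center towards the 3D gaze target."""
    diff = [t - e for e, t in zip(eye_center, gaze_target)]
    norm = math.sqrt(sum(c * c for c in diff))
    if norm == 0.0:
        raise ValueError("eye center and gaze target coincide")
    return [c / norm for c in diff]


def angular_error_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length vector")
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))
```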

The folder "Evaluation Subset" contains:

  • The image list that indicates the samples selected for the evaluation subset in our paper. We performed evaluations on this subset of our MPIIGaze dataset, which includes an equal number of samples for each participant.
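
The samples actually used in the paper are fixed by the provided image list. Purely to illustrate what "an equal number of samples for each participant" means, the following hypothetical sketch draws a balanced subset from a mapping of participant IDs to image names. The function name, the participant IDs and the seeding scheme are all assumptions and are not taken from the dataset.

```python
import random
from typing import Dict, List


def draw_balanced_subset(
    images_by_participant: Dict[str, List[str]],
    per_participant: int,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Pick the same number of images from every participant, reproducibly."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    subset: Dict[str, List[str]] = {}
    for participant in sorted(images_by_participant):
        images = images_by_participant[participant]
        if len(images) < per_participant:
            raise ValueError(f"{participant} has only {len(images)} images")
        subset[participant] = sorted(rng.sample(images, per_participant))
    return subset
```

The per-participant count cannot exceed the number of images of the participant with the fewest samples.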

The folder "Annotation Subset" contains:

  • The image list that indicates the 10,848 samples that we manually annotated.

  • The annotations for each of these images: the (x, y) positions of 6 facial landmarks (four eye corners, two mouth corners) and the (x, y) positions of the two pupil centers.

The comparison of the original eye rectangle and the normalized eye patch is shown in the following figure (left: original eye rectangle image (720 x 1280 pixels); right: normalized eye patch image (36 x 60 pixels)).



  title={Appearance-based gaze estimation in the wild},
  author={Zhang, Xucong and Sugano, Yusuke and Fritz, Mario and Bulling, Andreas},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
🎉Many thanks to Graviti Open Datasets for contributing the dataset
Basic Information
Application Scenarios: Not Available
Annotations: Not Available
Tasks: Not Available
License: CC BY-NC-SA 4.0
Updated on: 2021-01-20 04:33:52
Data Type: Not Available
Data Volume: 0
Annotation Amount: 0
File Size: 0.00B
Copyright Owner: Max Planck Institute for Informatics