Existing datasets for evaluating Visual-Inertial Odometry (VIO) have advanced research on autonomous agents, but they do not meet the needs of the rapidly growing research on Augmented Reality (AR) and Mixed Reality (MR): they are not collected in real AR scenes and do not account for factors specific to mobile devices. This paper presents the NEAR dataset, an AR-oriented visual-inertial dataset collected with commodity handheld phones and accompanied by ground truth. The dataset contains 113 sequences in 49 carefully designed collection cases in two typical indoor scenes, the living area and the table area. It also covers many setting adjustments for comparison, including different texture levels, illuminations, motion patterns and camera settings, as well as the difference between rolling shutter and global shutter.
To enable evaluation of VIO on the NEAR dataset, we provide the calibration parameters along with the dataset. Here we briefly describe the calibration procedure for the intrinsics and extrinsics.
For the camera intrinsics, the calibration sequences are recorded with the phone camera facing a 12×8 chessboard with a grid size of 50mm×50mm. A sufficient number of high-quality images are then selected empirically and fed to the MATLAB calibration toolbox with the 4-parameter radial and tangential distortion model. The intrinsics of all phone cameras and the MYNT camera are calibrated once, each with its own calibration sequence, since autofocus is turned off during the entire data collection except for the comparison cases. For the IMU intrinsics, we calibrate the variances of the noise, bias and random walk of both the gyroscope and the accelerometer of each phone with the Allan Variance Tool⁵.
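To illustrate what the calibrated camera intrinsics encode, the following Python sketch projects 3D points with a pinhole model and a four-coefficient distortion. It assumes that the "4 parametric" model means two radial coefficients (k1, k2) and two tangential coefficients (p1, p2) in the common Brown-Conrady form. The function names and the numeric values are hypothetical and are not taken from the dataset tools.

```python
import numpy as np

def distort(xy, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion
    to normalized image points of shape (N, 2)."""
    xy = np.asarray(xy, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return np.stack([x_d, y_d], axis=1)

def project(points_cam, fx, fy, cx, cy, dist):
    """Project 3D points given in the camera frame (N, 3) to pixel coordinates (N, 2)."""
    pts = np.asarray(points_cam, dtype=float)
    xy = pts[:, :2] / pts[:, 2:3]          # perspective division
    xy_d = distort(xy, *dist)              # dist = (k1, k2, p1, p2)
    return np.stack([fx * xy_d[:, 0] + cx, fy * xy_d[:, 1] + cy], axis=1)

# Hypothetical intrinsics, for illustration only.
pixels = project([[0.1, 0.2, 1.0]], 500.0, 500.0, 320.0, 240.0, (0.1, 0.0, 0.0, 0.0))
```

With all four coefficients set to zero the function reduces to the plain pinhole projection, which is a convenient sanity check when loading calibration files.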
Three moving coordinate frames are involved in data collection, and they are rigidly fixed relative to one another: the phone camera frame C, the phone IMU frame B and the rig frame R. The two fixed frames are the frame W of the ChArUco board⁶ and the global frame G of the motion capture system. The whole frame system is shown in Figure 3. First, the IMU-camera extrinsic $T_{bc}$ and the time offset are calibrated with Kalibr⁷. The camera-rig transformation is also needed, since the motion capture system provides ${}^{G}_{R}T$ but the VIO can only provide ${}^{W}_{C}T$. The two are related by the transformation identity ${}^{G}_{W}T\,{}^{W}_{C}T = {}^{G}_{R}T\,{}^{R}_{C}T$.
We design a calibration sequence to estimate these transformations. First, the 6×8 ChArUco board (square length 48mm, marker length 36mm) is placed in the test field. Then we move the rig slowly with the camera facing the board while simultaneously recording images and the ground truth ${}^{G}_{R}T_j$. Afterward, the chessboard corners in every camera image are detected and the camera poses ${}^{W}_{C}T_i$ are computed with a PnP solver. Finally, we formulate the estimation as the least-squares problem
where j(i) is the index corresponding to i after timestamp alignment. We solve this optimization problem by alternating iteration. Every iteration has two steps. First, we solve for the best ${}^{G}_{W}T$ with ${}^{R}_{C}T$ fixed to its value from the previous iteration (or to the identity of SE(3) as the prior in the first iteration), using the Umeyama method. Second, we fix this 'best' ${}^{G}_{W}T$ and solve for ${}^{R}_{C}T$ with an approach similar to the first step. The iteration terminates when the error converges.
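The alternating scheme can be sketched in Python with NumPy as follows. This is a minimal illustration under assumptions that the text does not state: the residual is measured on four probe points (the camera origin and the tips of its unit axes) mapped through both sides of the frame chain, and each step uses a scale-free SVD rigid alignment in the spirit of the Umeyama closed form. All function names are hypothetical, and the inputs are assumed to be time-aligned lists of 4×4 poses.

```python
import numpy as np

# Probe points as homogeneous columns: origin and unit axis tips (an assumption
# that sets the relative weight of rotation and translation errors).
PROBES = np.array([[0, 1, 0, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1],
                   [1, 1, 1, 1]], dtype=float)

def probes(T):
    """Probe points transformed by the 4x4 pose T, shape (4, 3)."""
    return (T @ PROBES)[:3].T

def rigid_align(src, dst):
    """Least-squares rigid transform (no scale) mapping src (N, 3) onto dst (N, 3)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (dst - mu_d).T @ (src - mu_s)               # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # avoid reflections
    T = np.eye(4)
    T[:3, :3] = U @ S @ Vt
    T[:3, 3] = mu_d - T[:3, :3] @ mu_s
    return T

def solve_G_W(Y, cam_poses, rig_poses):
    """Step 1: with Y = R_C T fixed, find X = G_W T such that X A_i ~ B_i Y."""
    src = np.vstack([probes(A) for A in cam_poses])
    dst = np.vstack([probes(B @ Y) for B in rig_poses])
    return rigid_align(src, dst)

def solve_R_C(X, cam_poses, rig_poses):
    """Step 2: with X = G_W T fixed, find Y = R_C T such that Y ~ B_i^-1 X A_i."""
    src = np.vstack([probes(np.eye(4)) for _ in cam_poses])
    dst = np.vstack([probes(np.linalg.inv(B) @ X @ A)
                     for A, B in zip(cam_poses, rig_poses)])
    return rigid_align(src, dst)

def residual(X, Y, cam_poses, rig_poses):
    """Mean squared probe distance between X A_i and B_i Y."""
    d = np.vstack([probes(X @ A) - probes(B @ Y)
                   for A, B in zip(cam_poses, rig_poses)])
    return float(np.mean(np.sum(d ** 2, axis=1)))

def calibrate(cam_poses, rig_poses, iters=100, tol=1e-12):
    """Alternate the two steps, starting from the identity prior for R_C T."""
    Y, prev = np.eye(4), np.inf
    for _ in range(iters):
        X = solve_G_W(Y, cam_poses, rig_poses)
        Y = solve_R_C(X, cam_poses, rig_poses)
        err = residual(X, Y, cam_poses, rig_poses)
        if prev - err < tol:                        # error has converged
            break
        prev = err
    return X, Y
```

Because both steps minimize the same probe-point residual, the error cannot increase from one iteration to the next. Like any alternating scheme, the sketch is not guaranteed to reach the global optimum, so the result is best checked against the final residual.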
The average precision (mAP) and F-score are used for evaluation. A detection is counted as correct only if the intersection over union (IoU) between the detected bounding box and any ground-truth box of the same class is larger than 0.33. The F-score is calculated as F-score = 2PR/(P+R), where P and R are the precision and recall rates. Note that the F-score is threshold-sensitive, which means you can adjust the score threshold to obtain a better result. Although the F-score is not as fair as the mAP criterion, it is more practical, since a threshold must always be chosen when deploying a model and not all algorithms produce a score for each target. Thus, both F-score and mAP are considered in the benchmarks.
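The IoU rule and the F-score formula above can be sketched in Python as follows. This is an illustration rather than the benchmark's evaluation script: it assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), follows the "any ground-truth box of the same class" rule literally instead of one-to-one matching, and uses hypothetical function names and data layouts.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def f_score(detections, ground_truth, score_thr, iou_thr=0.33):
    """detections: list of (box, cls, score); ground_truth: list of (box, cls).
    Returns (precision, recall, F-score) at the given score threshold."""
    kept = [d for d in detections if d[2] >= score_thr]
    # A detection is correct if it overlaps any same-class ground-truth box.
    correct = sum(
        1 for box, cls, _ in kept
        if any(c == cls and iou(box, g) > iou_thr for g, c in ground_truth)
    )
    # A ground-truth box is recalled if any kept detection matches it.
    recalled = sum(
        1 for g, c in ground_truth
        if any(cls == c and iou(box, g) > iou_thr for box, cls, _ in kept)
    )
    p = correct / len(kept) if kept else 0.0
    r = recalled / len(ground_truth) if ground_truth else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f

# Toy data (hypothetical) showing the threshold sensitivity of the F-score.
gt = [((0, 0, 10, 10), "a"), ((20, 20, 30, 30), "b")]
dets = [((0, 0, 10, 10), "a", 0.9),
        ((50, 50, 60, 60), "a", 0.8),
        ((21, 21, 31, 31), "b", 0.2)]
high = f_score(dets, gt, score_thr=0.5)   # low-score detection of "b" is dropped
low = f_score(dets, gt, score_thr=0.1)    # all three detections are kept
```

On this toy data, lowering the score threshold keeps the low-confidence but correct detection, which raises both precision and recall. This is the threshold sensitivity that the paragraph above describes.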
The evaluation scripts for mAP and F-score are adapted from the ICDAR 2015 evaluation scripts with small modifications (you may need to register an account first). We provide the modified evaluation scripts and the ground-truth file gt.zip of the test set in the evaluation/ directory. You can evaluate your own method by following these instructions: