Integrating Computer Vision and Inertial Navigation for Pedestrian Navigation - GPS News
Integrating Computer Vision and Inertial Navigation for Pedestrian Navigation

Chris Hide, Terry Moore and Marcus Andreotti, IESSG, University of Nottingham
Tom Botterill, University of Canterbury, New Zealand


Low cost MEMS gyros and accelerometers are often considered as a potential solution for indoor navigation, but the reality is that such sensors are only sufficient to provide useful positioning for very short periods of time. This is due to sensor errors such as biases and scale factor errors that are not necessarily constant over time. Low cost MEMS sensors have recently been demonstrated to provide useful levels of performance through innovative ideas such as mounting an Inertial Measurement Unit (IMU) on a user’s foot and using zero velocity updates every time a user takes a step. Another form of frequent measurements to correct the IMU comes from using computer vision. Cameras can be used to provide measurements such as translation and rotation between frames by tracking features contained in the image. This article builds on work from Hide, C., Botterill, T., & Andreotti, M., 2009, and looks at the use of aiding measurements from a camera attached to an IMU where the user is walking with the device held out in front of them with the camera pointing towards the ground. Some smartphones already contain these sensors so this method could be a viable form of navigation in the future.

Computer Vision

A camera is used to capture a sequence of images where the camera is approximately looking at the ground as if the user were holding a smartphone. For each image, the relative position and orientation of the camera is estimated relative to the previous image. An assumption is made that the camera is looking at an approximate planar surface, although the images will also contain features such as the pedestrian’s moving legs, feet and shadow that needs to be excluded from the estimation. The algorithm consists of the following steps:

  1. An image is captured and approximately 300 point features are found, with a small patch of size 9×9 pixels being extracted around each point.
  2. Each patch feature is matched to the most similar patch feature in the previous image by computing the sum-of-squared differences between all pairs of patch features and choosing the closest match to each.
  3. A perspective homography matrix is computed assuming that the point locations lie on a plane using a Discrete Linear Transform (DLT). The RANSAC framework is used to remove outliers caused by shadows or features being out of plane such as the user’s legs.  The BaySAC framework is used to optimise the algorithm if multiple correspondences are found.
  4. Once a homography and inlier set are found, the set is refined by using the DLT to fit a new homography to all of the inliers found and re-computing which points are compatible with the new model.
  5. The homography matrix is decomposed to compute the translation and rotation between images. The translation can be scaled if the height of the camera above ground is known to generate a three-dimensional velocity measurement in the camera frame.

Computer Vision and IMU Integration

An Inertial Measurement Unit (IMU) is integrated with the computer vision component to compute the position and orientation of the unit. The IMU measurements are numerically integrated to generate estimates of position, velocity and attitude. A Kalman filter is used to estimate the navigation and IMU errors over time using measurements from other sensors including GPS when it is available. The state vector estimates the position, velocity, attitude and sensor biases using a linearized inertial navigation model such as that described in Titterton, D. H.,  & Weston, J. L., 1997. The Kalman filter is used in feedback form so that errors are kept small.

In order to use the computer vision measurements to correct the drift of the IMU, it is necessary to develop observation equations that relate the computer vision measurements to the INS error states modeled in the Kalman filter. The camera height error is estimated in the Kalman filter as a scale factor. The camera velocity measurement is rotated to the navigation frame using the attitude of the IMU and is differenced with the navigation frame velocity from the INS. The error model is given as:

where  is the direction cosine matrix from the navigation frame to the body frame;  is the differenced INS and camera velocity measurements in the navigation frame; v is the velocity (in navigation or body frames);  is the attitude error, is the scale factor error for the camera height and is the measurement noise.

Field Trial

A trial was conducted at the University of Nottingham, UK in September 2010. A Microstrain 3DM-GX3 IMU was used with a gyro bias stability of approximately ±0.2deg/s and accelerometer biases of 0.01g. The IMU was fixed to a consumer Canon IXUS 750 digital camera. The IMU and camera were held in the user’s hand to provide the equivalent to a user walking holding a smartphone. Pseudorange and carrier phase measurements from a u-blox ANTARIS 4 GPS receiver were used to compute a carrier phase GPS position solution. A dataset was collected around a field at the University of Nottingham in clear open sky environments so that the GPS measurements could be used to assess the position accuracy. The data was processed using software created by the IESSG and University of Canterbury.


Figure 1 shows the trajectory from the integrated computer vision/IMU/GPS solution. The figure shows the reference trajectory in blue generated from integrating all available GPS position and velocity measurements with the Microstrain IMU. This is compared to the red line which shows a trajectory intialized using GPS measurements for the first 3 minutes 20 seconds. After this, the GPS measurements are removed and the integrated IMU and computer vision algorithm continues until GPS measurements are used again 3 minutes later. The total trajectory lasts 9 minutes.

The figure shows that the computer vision aided low cost IMU is able to bridge the 3-minute gap relatively well. TABLE I quantifies the results obtained using IMU-only and IMU with computer vision during the outage. Here we see that there is a substantial position accuracy improvement from an unusable 3740m after 3 minutes using IMU-only, to a useful 13m using computer vision aiding.

Figure 1. Comparison of integrated GPS/IMU/Vision solution to reference trajectory.

Table 1. Comparison of horizontal position errors with and without computer vision aiding during GPS outage.

Figure 2 shows an example of the computer vision algorithm in operation where the red dots indicate detected features; blue lines show inlier correspondences and red lines show the outlier correspondences. The inliers are used to compute the relative motion vector of the camera. The rotational information is not currently used.

Figure 2. Features (red dots), inlier correspondences (blue lines) and outliers (red lines) marked on example frame


The results shown are very promising for providing accurate positioning through integrating an IMU, computer vision and GPS when available. However, there are some issues that must be addressed before this approach can be used for indoor navigation:

  1. The position, velocity and attitude must be initialized, the main problems here are that position is required (typically from GPS) and heading is required (typically from a magnetometer although this is usually unreliable).
  2. The computer vision algorithm is computer intensive. Some modifications in the algorithm may significantly improve this, specifically through using the IMU information in the computer vision algorithm. In particular, the IMU can be used for outlier detection.
  3. The main limitation is that using the camera in low light conditions causes significant motion blur since the relative motion of the camera with respect to the ground is relatively fast. Future improvements in camera technology are required to make this approach viable. Further research will investigate the use of different cameras.


From the results presented, its clear that computer vision measurements are able to significantly reduce the drift of the low cost IMU when GPS measurements are unavailable. The algorithm has the significant advantage that it uses existing sensors from modern smartphones which is desirable since there must be a strong business model for any other technologies to be added such devices. The algorithm was tested with real-world data and has been shown to be robust in outdoor situations, however its recognized that developments in imaging sensor technology are required for this to be a viable solution indoors. If this issue can be addressed, it is clear that this approach has significant potential as a solution alongside existing technologies such as GPS and Wi-Fi.

This article is based on Hide, C., Botterill, T., Andreotti, M., 2010. Low cost vision-aided IMU for pedestrian navigation. In Proceedings of UPINLBS 2010, Kirkkonummi, Finland.


Hide, C., Botterill, T., Andreotti, M., 2009. An integrated IMU, GNSS and image recognition sensor for pedestrian navigation. In Proceedings of the Institute of Navigation GNSS 2009 Conference. Fort Worth, Texas.

Hide, C., Botterill, T., Andreotti, M., 2010. Vision-aided IMU for handheld pedestrian navigation. In Proceedings of the Institute of Navigation GNSS 2010 Conference. Portland, Oregon.

Hartley, R., Zisserman, A., 2003. Multiple View Geometry in Computer Vision 2nd Edition. Cambridge University Press.

Botterill, T., Mills, S., Green, R., 2009. New Conditional Sampling Strategies for Speeded-Up RANSAC. In Proceedings of the British Machine Vision Conference.

Titterton, D. H., Weston, J. L., 1997. Strapdown Inertial Navigation Technology. Institution of Electrical Engineers.
Hide, C., Botterill, T., Andreotti, M., 2010. Low cost vision-aided IMU for pedestrian navigation. In Proceedings of UPINLBS 2010, Kirkkonummi, Finland.