Xiaohan Fei, Henry Wang, Xiangyu Zeng, Lin-Lee Cheong, Meng Wang, Joseph Tighe

Amazon Web Services

We propose a fully automated system that simultaneously estimates the camera intrinsics, the ground plane, and the physical distances between people from a single RGB image or video captured by a camera viewing a 3-D scene from a fixed vantage point. To automate camera calibration and distance estimation, we leverage priors about human pose and develop a novel direct formulation for pose-based auto-calibration and distance estimation, which achieves state-of-the-art performance on publicly available datasets. The proposed approach enables existing camera systems to measure physical distances without a dedicated calibration process or range sensors, and is applicable to a broad range of use cases such as social distancing and workplace safety. Furthermore, to enable evaluation and drive research in this area, we augment the publicly available MEVA dataset with distance annotations, resulting in MEVADA, an evaluation benchmark for the pose-based auto-calibration and distance estimation problem.
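
To make the core idea concrete, below is a minimal, hypothetical sketch of how a human-height prior turns pixel measurements into metric distances. It assumes a roll- and tilt-free pinhole camera with a known focal length and a fixed average standing height, whereas the paper's direct formulation jointly estimates the intrinsics and the ground plane; all names and values here are illustrative, not the paper's implementation.

```python
import numpy as np

AVG_HEIGHT_M = 1.7  # assumed prior: average standing height in meters

def person_ground_position(head_xy, foot_xy, f, cx):
    """Back-project one standing person's foot point to metric ground
    coordinates under a simplified pinhole model (zero roll and tilt).

    head_xy, foot_xy: (u, v) pixel coordinates of head top and foot point.
    f: focal length in pixels (assumed known here; the paper estimates it).
    Returns (X, Z): lateral offset and depth on the ground plane, in meters.
    """
    pixel_height = foot_xy[1] - head_xy[1]
    if pixel_height <= 0:
        raise ValueError("foot must lie below head in image coordinates")
    Z = f * AVG_HEIGHT_M / pixel_height   # similar triangles: metric depth
    X = (foot_xy[0] - cx) * Z / f         # lateral ground-plane position
    return np.array([X, Z])

def pairwise_distances(detections, f, cx):
    """Physical distance (meters) between every pair of detected people."""
    pts = [person_ground_position(head, foot, f, cx) for head, foot in detections]
    return {(i, j): float(np.linalg.norm(pts[i] - pts[j]))
            for i in range(len(pts)) for j in range(i + 1, len(pts))}

# Toy usage: two people in a 1280x720 frame, focal length ~1000 px.
detections = [((600, 200), (610, 500)),   # (head, foot) pixel pairs
              ((900, 250), (905, 480))]
print(pairwise_distances(detections, f=1000.0, cx=640.0))
```

In this toy model, depth follows from similar triangles (Z = f * H / pixel_height), which is exactly the constraint a height prior contributes; the paper's system generalizes it to unknown intrinsics and an estimated ground plane.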


  • The pre-print version of this paper can be found here
  • The dataset MEVADA will be released after legal review.