FEI, Xiaohan


About me

I’m a Principal Applied Scientist working on multi-modal foundation models (video generation models in particular) in Amazon’s Artificial General Intelligence (AGI) organization. Previously, I was with AWS AI Labs and Meta Reality Labs, where I worked on several 3-D computer vision initiatives.

I defended my Ph.D. thesis on Inertial-aided Visual Perception of Geometry and Semantics in August 2019. From Sept. 2014 to Sept. 2019, I worked at the UCLA Vision Lab with Prof. Stefano Soatto.

My research interests include computer vision, robotics, and machine learning. Specifically, I’m interested in developing models (including, but not limited to, deep learning models) and engineering systems to solve real-world problems in multi-sensor (and multi-modal) settings.

Our paper Geo-Supervised Visual Depth Prediction, which leverages visual-inertial sensor packages and gravity-induced shape priors to improve depth prediction, won the Best Paper Award in Robot Vision at ICRA 2019, out of 2900 submissions.

Before joining the Vision Lab, I obtained my B.Eng. from Zhejiang University, Hangzhou, China, in 2014. As an undergraduate, I majored in Information and Electronic Engineering, and it was my great pleasure to be a member of the Advanced Honor Class of Engineering Education (ACEE), where I practiced a lot in mathematical modeling contests and enjoyed exchanging ideas with other engineering majors.

A recent CV is available here.

The video below shows XIVO – our open-sourced VIO.

Awards & Distinctions

Best Paper Award in Robot Vision, ICRA 2019.
Meritorious Winner of Mathematical Contest in Modeling, 2013.
National Scholarship, Ministry of Education, China.

What’s new

December 2024. We launched the Amazon Nova family of models, Amazon’s family of multi-modal foundation models. See the AWS News Blog post and the technical report.
July 2021. Our paper, Single View Physical Distance Estimation using Human Pose (pre-print), which addresses the social distancing problem, was accepted by ICCV 2021!
February 2021. Our paper, An Adaptive Framework for Learning Unsupervised Depth Completion, was accepted by ICRA 2021.
February 2020. Our paper, Unsupervised Depth Completion from Visual-Inertial Odometry, was accepted by ICRA 2020.
September 2019. We open-sourced our VIO implementation; you can find the code here.

Software

XIVO (X Inertial-aided Visual Odometry), or yet another visual-inertial odometry. [code]
VISMA dataset and utilities for our ECCV paper on Visual-Inertial Object Detection and Mapping. [code]
GeoSup code for our ICRA paper on Geo-Supervised Visual Depth Prediction. [code]
A minimal implementation of \(SE(3)\) (actually \(SO(3)\times \mathbb{R}^3\)) in TensorFlow for geometric learning. [code]
A collection of PnP (Perspective-n-Point) RANSAC solvers. [code]

Demo

Visual-Inertial Navigation and Semantic Mapping System @ CVPR 2016.
[video]-[poster]
Visual-Inertial Navigation, Mapping and Loop Closure @ SCR 2016.
[video]-[poster]
Re-localization and Failure Recovery for SLAM.
[video]

Thesis

Ph.D. Thesis: Inertial-aided Visual Perception of Geometry and Semantics
[manuscript]-[slides]

B.Eng. Thesis: Robust Wide-baseline Feature Matching for Panoramic Images

Publication

The Amazon Nova Family of Models: Technical Report and Model Card
[arXiv], 2024
Towards Visual Foundational Models of Physical Scenes
Chethan Parameshwara, Alessandro Achille, Matthew Trager, Xiaolong Li, Jiawei Mo, Ashwin Swaminathan, CJ Taylor, Dheera Venkatraman, Xiaohan Fei, Stefano Soatto
[arXiv], 2023
A Quantitative Evaluation of Score Distillation Sampling Based Text-to-3D
Xiaohan Fei, Chethan Parameshwara, Jiawei Mo, Xiaolong Li, Ashwin Swaminathan, CJ Taylor, Paolo Favaro, Stefano Soatto
[arXiv], 2023
Fast Sparse View Guided NeRF Update for Object Reconfigurations
Ziqi Lu, Jianbo Ye, Xiaohan Fei, Xiaolong Li, Jiawei Mo, Ashwin Swaminathan, Stefano Soatto
[arXiv], 2023
Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model
Xiaolong Li, Jiawei Mo, Ying Wang, Chethan Parameshwara, Xiaohan Fei, Ashwin Swaminathan, CJ Taylor, Zhuowen Tu, Paolo Favaro, Stefano Soatto
[arXiv], 2023
Single View Physical Distance Estimation using Human Pose
Xiaohan Fei, Henry Wang, Xiangyu Zeng, Lin Lee Cheong, Meng Wang, Joseph Tighe
In International Conference on Computer Vision (ICCV), 2021
[pre-print]
An Adaptive Framework for Learning Unsupervised Depth Completion
Alex Wong, Xiaohan Fei, Byung-Woo Hong, and Stefano Soatto
In International Conference on Robotics and Automation, 2021
Also in IEEE Robotics and Automation Letters (RA-L), 2021
[paper]
Unsupervised Depth Completion from Visual-Inertial Odometry
Alex Wong*, Xiaohan Fei*, Stephanie Tsuei, and Stefano Soatto
In International Conference on Robotics and Automation, 2020
Also in IEEE Robotics and Automation Letters (RA-L), 2020
[paper]-[code]-[data]-[void benchmark]
Geo-Supervised Visual Depth Prediction
Xiaohan Fei, Alex Wong, and Stefano Soatto
In International Conference on Robotics and Automation, 2019
(Best Paper Award in Robot Vision)
Also in IEEE Robotics and Automation Letters (RA-L), 2019
[paper]-[poster]-[slides]-[code]
Visual-Inertial Object Detection and Mapping
Xiaohan Fei and Stefano Soatto
In Proceedings of European Conference on Computer Vision, 2018
[paper]-[poster]-[video]-[data]-[supmat]
Visual-Inertial-Semantic Scene Representation for 3D Object Detection
Jingming Dong*, Xiaohan Fei*, and Stefano Soatto
In Proceedings of Computer Vision and Pattern Recognition, 2017
[paper]-[poster]-[video]
A Simple Hierarchical Pooling Data Structure for Loop Closure
Xiaohan Fei, Konstantine Tsotsos, and Stefano Soatto
In Proceedings of European Conference on Computer Vision, 2016
[paper]-[poster]

Professional Services

Reviewer for major vision (CVPR, ICCV, and ECCV) and robotics (ICRA, IROS) conferences.

Talk & Workshop

Inertial-aided Visual Perception for Localization, Mapping, and Detection, at Facebook Reality Labs, Microsoft Research, and Magic Leap, 2019.
Visual-Inertial-Semantic Scene Representation, at Bridges to 3D Workshop, CVPR 2017.

Teaching

Spring 2018. CS M152A, Introductory Digital Design Laboratory.
As an undergraduate, I was a TA for the graduate class on Spectral Analysis of Signals taught by Prof. Xianyi Gong at Zhejiang University, mostly solving problem sets from the Linear Estimation book and leading discussions.