RGB Predicted Depth Simultaneous Localization and Mapping (SLAM) for Outdoor Environment

Date
2024-04-18
Abstract
This thesis focuses on visual simultaneous localization and mapping (V-SLAM) for outdoor applications such as autonomous driving. While most V-SLAM methods have been tested in small-scale settings such as mobile robots, applying them in expansive outdoor spaces introduces additional complexities: the larger scale of the environment, dynamic obstacles, and the depth-perception limitations of visual sensors all pose challenges for V-SLAM methods.

The first contribution introduces a dynamic V-SLAM approach. A novel front-end motion-tracking method recovers multiple motions from image frames, treating key-points observed after map initialization as dynamic, with time-varying locations. The proposed approach searches for key-point clusters based on their motion and classifies the associated motions probabilistically. A bundle adjustment (BA) jointly optimizes the local map, camera trajectory, and key-point motions within a unified V-SLAM system. BA maintains the geometric relationships between dynamic key-points and camera poses in the co-visibility graph, enhancing the overall robustness and accuracy of V-SLAM in populated environments.

The second contribution of this thesis centers on a deep-learning-based depth prediction approach, which proves effective for estimating metric-scale maps with a monocular camera. An unsupervised depth prediction method is proposed using a novel convolutional vision transformer (CViT) architecture to infer depth from monocular images. The proposed encoder features a dual CViT block (DCViT): one block generates self-attention solely from the spatial context of the input feature vectors, while the other learns to generate attention from the scene's geometry. Contrastive learning of visual representations is applied to the DCViT, where the model takes its own depth predictions, fed back through a feedback path, as a supervisory signal for training.
Integration with residual blocks enables the learning of local and global receptive fields, producing predicted disparity maps at a higher level of detail and accuracy. Experimental results demonstrate significant improvements over state-of-the-art methods across multiple depth datasets.

The third contribution of this thesis is a comprehensive investigation into the use of predicted depth within monocular SLAM, aimed at improving the accuracy of metric-scale map estimation. Most existing approaches struggle with the non-Gaussian, heavy-tailed noise produced by depth prediction models. The proposed monocular SLAM approach adopts a t-distribution noise model for ego-motion, with its parameters obtained through maximum likelihood (ML) estimation via the expectation-maximization (EM) algorithm. Experiments on real data show that the proposed t-distribution model renders the monocular SLAM algorithm inherently robust to outliers and the heavy-tailed noise produced by depth prediction models.
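The front-end motion clustering described in the first contribution can be illustrated, in simplified form, as grouping 2-D key-point displacement vectors by motion. The following is a minimal sketch that substitutes plain k-means for the thesis's probabilistic motion classification; the function name and the toy data are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np

def cluster_keypoint_motions(flow, k=2, iters=20):
    """Group 2-D key-point displacement vectors into k motion clusters.

    A simplified stand-in (plain k-means) for the probabilistic motion
    classification described in the abstract; not the thesis's method.
    """
    # Deterministic farthest-point initialization: start from the most
    # static key-point, then pick the point farthest from chosen centers.
    centers = [flow[np.argmin(np.linalg.norm(flow, axis=1))]]
    for _ in range(k - 1):
        dists = np.min([np.linalg.norm(flow - c, axis=1) for c in centers],
                       axis=0)
        centers.append(flow[np.argmax(dists)])
    centers = np.array(centers)

    for _ in range(iters):
        # Assign each key-point to the nearest motion center.
        d = np.linalg.norm(flow[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Re-estimate each cluster's mean motion.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = flow[labels == j].mean(axis=0)
    return labels, centers

# Toy frame-to-frame flow: a near-static background plus one moving object.
rng = np.random.default_rng(0)
background = rng.normal(0.0, 0.1, size=(50, 2))   # ~zero displacement
vehicle = rng.normal(5.0, 0.1, size=(20, 2))      # large coherent motion
labels, centers = cluster_keypoint_motions(np.vstack([background, vehicle]))
```

In the full system, each cluster's motion would instead be classified probabilistically and carried into the bundle adjustment alongside the camera trajectory.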
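The ML-via-EM fitting of a t-distribution noise model in the third contribution can be illustrated in one dimension. Below is a minimal sketch, not the thesis's formulation: the degrees of freedom `nu` are held at an assumed fixed value and only the scale is estimated, showing how the EM weights naturally down-weight heavy-tailed outliers.

```python
import numpy as np

def fit_t_scale(residuals, nu=4.0, iters=50):
    """EM estimate of the scale of a zero-mean Student-t noise model.

    A 1-D simplification of ML estimation via EM for heavy-tailed
    residuals; `nu` is an assumed fixed value, only the scale is fit.
    """
    r = np.asarray(residuals, dtype=float)
    sigma2 = np.mean(r ** 2)  # Gaussian initialization, inflated by outliers
    for _ in range(iters):
        # E-step: latent precision weights; large residuals get small weight.
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        # M-step: re-estimate the scale from weighted squared residuals.
        sigma2 = np.mean(w * r ** 2)
    return np.sqrt(sigma2), w

# Toy depth-prediction residuals: mostly well-behaved, plus gross outliers.
rng = np.random.default_rng(0)
residuals = np.concatenate([rng.normal(0.0, 1.0, 1000),
                            rng.normal(0.0, 20.0, 30)])
sigma_t, weights = fit_t_scale(residuals)
# The robust scale stays near the inlier scale, while the naive sample
# standard deviation is inflated by the heavy tail.
```

The same per-residual weights, generalized to the full measurement model, are what make the t-distribution formulation inherently robust inside the SLAM optimization.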
Keywords
SLAM, 3D Perception, Deep Learning, Monocular, Depth Prediction
Citation
Brahmanage, G. S. (2024). RGB predicted depth simultaneous localization and mapping (SLAM) for outdoor environment (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.