RGB Predicted Depth Simultaneous Localization and Mapping (SLAM) for Outdoor Environment

dc.contributor.advisor: Leung, Henry
dc.contributor.author: Brahmanage, Gayan Sampath
dc.contributor.committeemember: Wang, Yingxu
dc.contributor.committeemember: Hu, Yaoping
dc.contributor.committeemember: Bisheban, Mahdis
dc.contributor.committeemember: Gu, Jason
dc.date: 2024-05
dc.date.accessioned: 2024-04-22T18:42:18Z
dc.date.available: 2024-04-22T18:42:18Z
dc.date.issued: 2024-04-18
dc.description.abstract: This thesis focuses on visual simultaneous localization and mapping (V-SLAM) for outdoor applications such as autonomous driving. While most V-SLAM methods have been tested in small-scale settings such as mobile robots, applying them in expansive outdoor spaces introduces additional complexities: the larger scale of the environment, dynamic obstacles, and the depth-perception limitations of visual sensors all pose challenges for V-SLAM methods.

The first contribution introduces a dynamic V-SLAM approach. A novel front-end motion-tracking method recovers multiple motions from image frames, treating key-points observed after map initialization as dynamic, with time-varying locations. The proposed approach searches for key-point clusters based on their motion and classifies the associated motions probabilistically. A bundle adjustment (BA) optimizes the local map, camera trajectory, and key-point motion within a unified V-SLAM system. BA maintains the geometric relationships between dynamic key-points and camera poses in the co-visibility graph, enhancing the overall robustness and accuracy of V-SLAM in populated environments.

The second contribution centers on a deep-learning-based depth prediction approach, which proves effective for estimating metric-scale maps with a monocular camera. An unsupervised depth prediction method is proposed using a novel convolution vision transformer (CViT) architecture to infer depth from monocular images. The proposed encoder features a dual CViT block (DCViT): one block generates self-attention solely from the spatial context of the input feature vectors, and the other learns to generate attention from the scene's geometry. Contrastive learning of visual representations is applied to the DCViT, where depth predictions from the same model, fed back through a feedback path, serve as the supervisory signal for training. Integration with residual blocks enables the learning of local and global receptive fields, producing predicted disparity maps with a higher level of detail and accuracy. Experimental results demonstrate significant improvements over state-of-the-art methods across multiple depth datasets.

The third contribution is a comprehensive investigation into the use of predicted depth within monocular SLAM, with the aim of improving the accuracy of metric-scale map estimation. Most existing approaches struggle with the non-Gaussian, heavy-tailed noise produced by depth prediction models. The proposed monocular SLAM approach uses a t-distribution to model the noise in ego-motion estimation, with its parameters obtained by maximum likelihood (ML) estimation using the expectation-maximization (EM) algorithm. Experiments on real data show that the t-distribution renders the monocular SLAM algorithm inherently robust to outliers and the heavy-tailed noise produced by depth prediction models. (A minimal numerical sketch of this EM re-weighting appears after the metadata record below.)
dc.identifier.citation: Brahmanage, G. S. (2024). RGB predicted depth simultaneous localization and mapping (SLAM) for outdoor environment (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri: https://hdl.handle.net/1880/118463
dc.language.iso: en
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: SLAM
dc.subject: 3D Perception
dc.subject: Deep Learning
dc.subject: Monocular
dc.subject: Depth Prediction
dc.subject.classification: Robotics
dc.subject.classification: Artificial Intelligence
dc.title: RGB Predicted Depth Simultaneous Localization and Mapping (SLAM) for Outdoor Environment
dc.type: doctoral thesis
thesis.degree.discipline: Engineering – Electrical & Computer
thesis.degree.grantor: University of Calgary
thesis.degree.name: Doctor of Philosophy (PhD)
ucalgary.thesis.accesssetbystudent: I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.
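
The third contribution described in the abstract relies on EM-based maximum-likelihood fitting of a t-distribution so that the heavy-tailed errors from a depth prediction model are down-weighted rather than allowed to corrupt ego-motion estimation. The following is a minimal, self-contained sketch of that re-weighting idea under stated assumptions, not the thesis implementation: the residuals are synthetic, the degrees of freedom `nu` are assumed fixed, and the helper name `fit_student_t_scale` is hypothetical.

```python
# Minimal sketch (not the thesis implementation): maximum-likelihood fitting of a
# zero-mean Student-t scale to heavy-tailed residuals via EM. Synthetic data,
# fixed degrees of freedom `nu`; only numpy is required.
import numpy as np

def fit_student_t_scale(residuals, nu=3.0, iters=50):
    """Estimate sigma for r ~ t(0, sigma, nu) via EM.

    E-step: latent precision weights  w_i = (nu + 1) / (nu + (r_i / sigma)^2)
    M-step: sigma^2 = mean(w_i * r_i^2)
    Large residuals receive small weights, so outliers barely influence sigma.
    """
    r = np.asarray(residuals, dtype=float)
    sigma2 = np.var(r)  # Gaussian initialization (inflated by outliers)
    for _ in range(iters):
        w = (nu + 1.0) / (nu + r**2 / sigma2)  # E-step
        sigma2 = np.mean(w * r**2)             # M-step
    return np.sqrt(sigma2), w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(0.0, 0.5, size=950)   # well-behaved depth errors
    outliers = rng.normal(0.0, 8.0, size=50)   # heavy-tailed prediction failures
    r = np.concatenate([inliers, outliers])

    sigma_t, w = fit_student_t_scale(r)
    print(f"Gaussian std estimate : {np.std(r):.3f}")  # pulled up by outliers
    print(f"Student-t sigma (EM)  : {sigma_t:.3f}")    # close to the inlier spread
    print(f"mean weight, outliers: {np.mean(w[-50:]):.3f}")  # strongly down-weighted
```

In the thesis, this kind of re-weighting would act inside the ego-motion optimization rather than on a standalone residual vector, but the effect illustrated here is the same: heavy-tailed errors from the depth predictor have vanishing influence on the ML estimate.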
Files

Original bundle (1 of 1)
Name: ucalgary_2024_brahmanage_gayan.pdf
Size: 22.64 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission