RGB Predicted Depth Simultaneous Localization and Mapping (SLAM) for Outdoor Environment

dc.contributor.advisor: Leung, Henry
dc.contributor.author: Brahmanage, Gayan Sampath
dc.contributor.committeemember: Wang, Yingxu
dc.contributor.committeemember: Hu, Yaoping
dc.contributor.committeemember: Bisheban, Mahdis
dc.contributor.committeemember: Gu, Jason
dc.date: 2024-05
dc.date.accessioned: 2024-04-22T18:42:18Z
dc.date.available: 2024-04-22T18:42:18Z
dc.date.issued: 2024-04-18
dc.description.abstract: This thesis focuses on visual simultaneous localization and mapping (V-SLAM) for outdoor applications such as autonomous driving. While most V-SLAM methods have been tested in small-scale settings such as mobile robots, applying them in expansive outdoor spaces introduces additional complexities: the larger scale of the environment, dynamic obstacles, and the depth-perception limitations of visual sensors all pose challenges for V-SLAM methods.

The first contribution introduces a dynamic V-SLAM approach. A novel front-end motion-tracking method recovers multiple motions from image frames, treating key-points observed after map initialization as dynamic, with time-varying locations. The proposed approach searches for key-point clusters based on their motion and classifies the associated motions probabilistically. A bundle adjustment (BA) optimizes the local map, camera trajectory, and key-point motion within a unified V-SLAM system. BA maintains the geometric relationships between dynamic key-points and camera poses in the co-visibility graph, enhancing the overall robustness and accuracy of V-SLAM in populated environments.

The second contribution centers on a deep-learning-based depth prediction approach, which proves effective for estimating metric-scale maps with a monocular camera. An unsupervised depth prediction method is proposed using a novel convolution vision transformer (CViT) architecture to infer depth from monocular images. The proposed encoder features a dual CViT block (DCViT): one block generates self-attention solely from the spatial context of the input feature vectors, and the other learns to generate attention from the scene's geometry. Contrastive learning of visual representations is applied to the DCViT, where depth predictions from the same model, fed back through a feedback path, serve as the supervisory signal for training. Integration with residual blocks enables the learning of local and global receptive fields, producing predicted disparity maps with a higher level of detail and accuracy. Experimental results demonstrate significant improvements over state-of-the-art methods across multiple depth datasets.

The third contribution is a comprehensive investigation into the use of predicted depth within monocular SLAM, with the aim of improving the accuracy of metric-scale map estimation. Most existing approaches struggle with the non-Gaussian, heavy-tailed noise produced by depth prediction models. The proposed monocular SLAM approach uses a t-distribution to model the noise in ego-motion estimation, with its parameters obtained by maximum likelihood (ML) estimation using the expectation-maximization (EM) algorithm. Experiments on real data show that the t-distribution renders the monocular SLAM algorithm inherently robust to outliers and the heavy-tailed noise produced by depth prediction models. (A minimal numerical sketch of this EM re-weighting appears after the metadata record below.)
dc.identifier.citation: Brahmanage, G. S. (2024). RGB predicted depth simultaneous localization and mapping (SLAM) for outdoor environment (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri: https://hdl.handle.net/1880/118463
dc.language.iso: en
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: SLAM
dc.subject: 3D Perception
dc.subject: Deep Learning
dc.subject: Monocular
dc.subject: Depth Prediction
dc.subject.classification: Robotics
dc.subject.classification: Artificial Intelligence
dc.title: RGB Predicted Depth Simultaneous Localization and Mapping (SLAM) for Outdoor Environment
dc.type: doctoral thesis
thesis.degree.discipline: Engineering – Electrical & Computer
thesis.degree.grantor: University of Calgary
thesis.degree.name: Doctor of Philosophy (PhD)
ucalgary.thesis.accesssetbystudent: I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.
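
The third contribution described in the abstract relies on EM-based maximum-likelihood fitting of a t-distribution so that the heavy-tailed errors from a depth prediction model are down-weighted rather than allowed to corrupt ego-motion estimation. The following is a minimal, self-contained sketch of that re-weighting idea under stated assumptions, not the thesis implementation: the residuals are synthetic, the degrees of freedom `nu` are assumed fixed, and the helper name `fit_student_t_scale` is hypothetical.

```python
# Minimal sketch (not the thesis implementation): maximum-likelihood fitting of a
# zero-mean Student-t scale to heavy-tailed residuals via EM. Synthetic data,
# fixed degrees of freedom `nu`; only numpy is required.
import numpy as np

def fit_student_t_scale(residuals, nu=3.0, iters=50):
    """Estimate sigma for r ~ t(0, sigma, nu) via EM.

    E-step: latent precision weights  w_i = (nu + 1) / (nu + (r_i / sigma)^2)
    M-step: sigma^2 = mean(w_i * r_i^2)
    Large residuals receive small weights, so outliers barely influence sigma.
    """
    r = np.asarray(residuals, dtype=float)
    sigma2 = np.var(r)  # Gaussian initialization (inflated by outliers)
    for _ in range(iters):
        w = (nu + 1.0) / (nu + r**2 / sigma2)  # E-step
        sigma2 = np.mean(w * r**2)             # M-step
    return np.sqrt(sigma2), w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inliers = rng.normal(0.0, 0.5, size=950)   # well-behaved depth errors
    outliers = rng.normal(0.0, 8.0, size=50)   # heavy-tailed prediction failures
    r = np.concatenate([inliers, outliers])

    sigma_t, w = fit_student_t_scale(r)
    print(f"Gaussian std estimate : {np.std(r):.3f}")  # pulled up by outliers
    print(f"Student-t sigma (EM)  : {sigma_t:.3f}")    # close to the inlier spread
    print(f"mean weight, outliers: {np.mean(w[-50:]):.3f}")  # strongly down-weighted
```

In the thesis, this kind of re-weighting would act inside the ego-motion optimization rather than on a standalone residual vector, but the effect illustrated here is the same: heavy-tailed errors from the depth predictor have vanishing influence on the ML estimate.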
Files

Original bundle (1 of 1)
Name: ucalgary_2024_brahmanage_gayan.pdf
Size: 22.64 MB
Format: Adobe Portable Document Format

License bundle (1 of 1)
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission