Scene parsing is the task of densely labeling every pixel in an image with a semantic category. In this thesis, we present a scene parsing framework that works on both images and point clouds. To this end, we develop two separate pipelines, one for images and one for point clouds. For point clouds, a coarse segmentation is performed to obtain an initial distribution for the objects. For images, superpixel segmentation is performed and StructureTransfer, a model that finds similar regions across scenes, is applied. The two pipelines converge at the inference step, where several novel potentials, representing point cloud constraints and StructureTransfer scores, are introduced into a traditional Markov Random Field (MRF). On images alone, the parsing accuracy of the proposed method is close to that of state-of-the-art algorithms. When point clouds are also available, the accuracy improves significantly. The proposed framework shows promise for real-world applications.