Li, Zongpeng
Lin, Yuhui
2016-11-24
2016-11-24
2016
2016
http://hdl.handle.net/11023/3458
As the number of Web pages increases sharply, Web page classification becomes more important in fields such as Web mining and information retrieval. However, traditional textual classifiers usually rely on many hand-crafted features and do not produce satisfactory results. We introduce a relatively deep residual neural network for the Web page classification problem, based on a simplified version of the target HTML document. Combining several advanced deep learning techniques, the optimal model has 20 neural layers with parameters and is end-to-end differentiable. We also present a top RNN classifier that utilizes the class information from related Web pages. Two large-scale datasets are constructed to show that our ResNet-20 and top RNN design achieve the best or promising results compared to several baseline methods.
eng
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
Artificial Intelligence
Computer Science
Residual Neural Networks
Web Page Classification
RNN-Enhanced Deep Residual Neural Networks for Web Page Classification
master thesis
10.11575/PRISM/27671