RNN-Enhanced Deep Residual Neural Networks for Web Page Classification

Date
2016
Abstract
As the number of Web pages grows sharply, Web page classification becomes increasingly important in fields such as Web mining and information retrieval. However, traditional text classifiers usually rely on many hand-crafted features and do not produce satisfactory results. We introduce a relatively deep residual neural network for the Web page classification problem that operates on a simplified version of the target HTML document. Combining several advanced deep learning techniques, the best-performing model has 20 parameterized layers and is end-to-end differentiable. We also present a top RNN classifier that uses class information from related Web pages. We construct two large-scale datasets and show that our ResNet-20 and top RNN design can achieve the best, or otherwise promising, results compared with several baseline methods.
Keywords
Artificial Intelligence, Computer Science
Citation
Lin, Y. (2016). RNN-Enhanced Deep Residual Neural Networks for Web Page Classification (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/27671