Cancer biomarker extraction from gene expression microarray data

dc.contributor.advisor: Alhajj, Reda
dc.contributor.author: Alshalalfa, Mohammed
dc.date.accessioned: 2017-12-18T21:44:48Z
dc.date.available: 2017-12-18T21:44:48Z
dc.date.issued: 2008
dc.description: Bibliography: p. 115-124 [en]
dc.description: some pages are in colour [en]
dc.description.abstract: Bioinformatics is a new field of science that mainly integrates computer science, mathematics, statistics and biology, with the aim of discovering knowledge hidden within biological data. Gene expression microarray data is among the most widely investigated types of biological data. Because a microarray can accommodate the whole genome, global gene expression patterns in different tissues or samples can be profiled in a few days, whereas traditional methods may take months. However, analyzing microarray data is challenging because the number of features (genes) is very large relative to the number of samples. Microarrays have nevertheless been used successfully to study gene expression, which has allowed researchers to investigate different diseases, including cancer. Microarray-based cancer diagnosis has proven efficient and reliable, but the large number of genes makes the data noisy and difficult to handle. Consequently, identifying relevant genes has received considerable attention. In this thesis, we combine biological knowledge with machine learning techniques to propose three methods for extracting the most informative genes for cancer classification. The first method is based on double clustering: we first filter the data with a statistical test and then cluster the data iteratively to find the best number of clusters. The genes closest to the centroids of the resulting clusters proved to have high potential as significant features for sample classification. These genes (one per centroid) are used as input for building a classification model. The second method is based on an iterative t-test that eliminates noise from the data. The third method is a hybrid approach that combines statistical tests with entropy-based tests, using the t-test and Singular Value Decomposition (SVD) based entropy. It proved effective because it considers both the feature itself and its effect on the data entropy. This approach is the first to combine entropy and statistical significance for gene ranking. We have also developed an SVD-based gene extraction method for multi-class data; it is only introduced at a high level in this thesis, and the details are left as future work. The reported test results demonstrate the applicability and effectiveness of the three proposed approaches.
Index Terms: Classification, clustering, t-test, singular value decomposition, support vector machine, microarray data, gene expression data, over-expression, underexpression.
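The hybrid idea in the abstract (a t-test combined with SVD-based entropy) can be illustrated with a minimal Python sketch. This is not the thesis implementation, which this record does not give. The sketch assumes a two-class data set, uses a Welch t-statistic as the statistical filter, and scores each surviving gene by a leave-one-out contribution to the entropy of the normalized squared singular values. The function names `svd_entropy`, `welch_t` and `rank_genes` and the `keep` parameter are hypothetical.

```python
import numpy as np


def svd_entropy(matrix):
    """Entropy of the normalized squared singular values, scaled to [0, 1]."""
    s = np.linalg.svd(matrix, compute_uv=False)
    if len(s) < 2:
        return 0.0
    v = s**2 / np.sum(s**2)
    v = v[v > 0]  # skip exact zeros so log() stays finite
    return float(-np.sum(v * np.log(v)) / np.log(len(s)))


def welch_t(a, b):
    """Per-gene Welch t-statistic; a and b are (genes x samples) arrays."""
    mean_a, mean_b = a.mean(axis=1), b.mean(axis=1)
    var_a, var_b = a.var(axis=1, ddof=1), b.var(axis=1, ddof=1)
    return (mean_a - mean_b) / np.sqrt(var_a / a.shape[1] + var_b / b.shape[1])


def rank_genes(expr, labels, keep):
    """Assumed two-stage ranking: t-test filter, then entropy contribution.

    expr   : (genes x samples) expression matrix
    labels : 0/1 class label per sample
    keep   : number of genes retained by the t-test filter (hypothetical knob)
    """
    labels = np.asarray(labels)
    t_abs = np.abs(welch_t(expr[:, labels == 0], expr[:, labels == 1]))
    candidates = np.argsort(-t_abs)[:keep]  # strongest |t| first
    subset = expr[candidates]
    base = svd_entropy(subset)
    # Contribution of a gene = entropy with the gene minus entropy without it.
    contrib = np.array([
        base - svd_entropy(np.delete(subset, i, axis=0))
        for i in range(len(candidates))
    ])
    order = np.argsort(-contrib)  # largest contribution first
    return candidates[order], contrib[order]
```

Calling `rank_genes(expr, labels, keep=50)` returns the gene indices that pass the filter, ordered by entropy contribution, together with their scores. How the thesis actually weights the two criteria is not stated in the abstract, so the strict filter-then-rank order here is only one possible reading.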
dc.format.extent: xiii, 127 leaves : ill. ; 30 cm. [en]
dc.identifier.citation: Alshalalfa, M. (2008). Cancer biomarker extraction from gene expression microarray data (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/2304 [en_US]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/2304
dc.identifier.uri: http://hdl.handle.net/1880/103305
dc.language.iso: eng
dc.publisher.institution: University of Calgary [en]
dc.publisher.place: Calgary [en]
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.title: Cancer biomarker extraction from gene expression microarray data
dc.type: master thesis
thesis.degree.discipline: Computer Science
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.item.requestcopy: true
ucalgary.thesis.accession: Theses Collection 58.002:Box 1768 520708931
ucalgary.thesis.notes: UARC [en]
ucalgary.thesis.uarcrelease: y [en]
Files
Original bundle
Name: thesis_Alshalalfa_2008.pdf
Size: 54.67 MB
Format: Adobe Portable Document Format