Community Structure, Inference and Network-Based Markers

Gao, Shang

Files

ucalgary_2014_Gao_Shang.pdf(3.67 MB)

Date

2014-07-10

Authors

Gao, Shang

Abstract

At the core of systems biology is the belief that molecules within the cell act collaboratively and in an organized manner. Researchers study these interactions, concentrating mainly on identifying malfunctioning molecules as potential disease biomarkers. Networks have therefore become an important means of representing biological systems, and network approaches have shown substantial promise owing to their simple data representation and rich analytical apparatus. Generally speaking, the workflow of a computational systems biology study consists of (1) investigating certain elements of biological networks and their interactions, chosen according to the purpose of the study, and (2) collecting experimental high-throughput and genome-wide data and integrating computational methods to analyze the data and validate the findings. In this thesis, we frame each investigation by first asking a systems biology question and then providing computational means to answer it. The thesis consists of three major interrelated components, as the title suggests. We first study network structure through a novel strategy of bridging social and biological networks, based on our argument that there exists a strong analogy between humans and molecules. Although social network analysis is gaining popularity in modeling real-world problems, applying social network concepts and notions to biological data remains one of the most attractive open research problems. We design computational means to find community structures and efficient algorithms to dynamically analyze gene boundaries using geometric convexity. Our approach contributes to the new branch of applying social network mechanisms in biological data analysis, leading to new data mining strategies suggested by observing social behaviors in gene expression analysis.
Extending the topological study of biological networks, we investigate the relationship between the multi-scalability of community structures in metabolic networks and the distributional effect of network motifs, i.e., the inference problem. Studying three organisms, we observe several patterns, including the effect of network directionality, the homogeneity of motif-enriched communities, and motif type-specific distributions across scales. We also provide methods to quantify motif influence in the community context. Overall, our work suggests that the theoretical evolvability of modularity correlates tightly with the motif distributional effect, and vice versa. To support this analysis, we design computational tools to analyze the community structure of very large networks of arbitrary types. The Multi-scale Community Finder (MCF) is the first tool in this area.
Finally, we arrive at the question of how to design efficient biomarkers for complex diseases such as cancer. First, it is important to understand the complexity of cancer. We believe that understanding individualized gene behavior across patients requires considering the relational status of genes, because a complex disease phenotype is often caused by cascading failures of genetic interactions in cancer cells. We implement a framework to quantify the molecular heterogeneity of tumors from a gene-gene relational perspective using co-expression networks and interactome data. Next, we present a method to reverse engineer integrative gene networks. The main advantage of our method is that it integrates different quantitative and qualitative data sets to reconstruct a multiplex network without necessarily imposing data constraints, such as requiring every genomic data set to contain the same number of entities. Another advantage is that predictions can be made from the integrated networks by propagating beliefs from seed nodes that represent existing knowledge.
Thus, we combine data integration and network-based prediction into a single framework. We demonstrate our method through case studies using breast cancer data. Our approaches yield promising results and offer new ways of thinking about and mining complex genomic data sets. Overall, this thesis presents a comprehensive study of biological networks and a novel application of computational means to the biomarker detection problem in the era of big genomic data. Finally, it is important to highlight that our study addresses the challenges posed by data heterogeneity and by the diversity of the sources producing the data.

Keywords

Computer Science

Citation

Gao, S. (2014). Community Structure, Inference and Network-Based Markers (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25365

URI

http://hdl.handle.net/11023/1617

Collections

Open Theses and Dissertations