A core premise of systems biology is that molecules within the cell act collaboratively
in an organized manner. Researchers study these interactions and concentrate mainly
on identifying malfunctioning molecules as potential disease biomarkers. Thus, a network has
become an important means to represent biological systems, and network approaches have
shown substantial promise due to the simplicity in data representation and associated rich
analytical apparatus. Generally speaking, the workflow of a computational systems biology
study involves two steps: (1) investigating certain elements of biological networks and their
interactions, chosen according to the purpose of the study; and (2) collecting high-throughput,
genome-wide experimental data and applying computational methods to analyze the data and
validate findings. In this thesis, we frame each investigation by first asking a systems biology
question and then providing computational means to answer it.
This thesis consists of three major interrelated components, as the title suggests. We first
study network structure through a novel strategy of bridging social and biological
networks, based on our argument that there exists a strong analogy between humans and
molecules. As social network analysis is gaining popularity in modeling real world problems,
the task of applying the social network model concepts and notions to biological data is
still one of the most attractive research problems to be addressed. We design computational
means to find community structures and design efficient algorithms to dynamically analyze
gene boundaries using geometric convexity. Our approach contributes to the emerging branch of
applying social network mechanisms to biological data analysis, leading to new data mining
strategies motivated by social behaviors observed in gene expression data.
Extending the topological study of biological networks, we investigate the relationship
between the multi-scalability of community structures of metabolic networks and the distributional effect of network motifs, i.e., the inference problem. We observe several patterns
through studying three organisms, including the effect of directionality of networks, homogeneity of motif-enriched communities, and motif type-specific distributions across scales.
We also provide methods to quantify motif influence under the community context. Overall,
our work suggests that the theoretic evolvability of modularity tightly correlates with motif
distributional effect and vice versa. In this regard, we design computational tools to analyze
community structure of very large networks of arbitrary types. The Multi-scale Community
Finder (MCF) is the first tool in this area.
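
To make the notion of scale concrete, the following is a minimal sketch of modularity with a resolution parameter, one common way to score a partition at different scales. It is not the MCF implementation; the function name `modularity`, the parameter `gamma`, and the toy graph are assumptions introduced only for illustration.

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Modularity of a partition of an undirected, unweighted graph.

    Computes Q = sum over communities c of
        L_c / m - gamma * (d_c / (2 * m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c. `gamma` is the
    resolution parameter (an illustrative assumption, not from the thesis).
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)
    for node, k in degree.items():
        total_degree[community[node]] += k
    return sum(
        internal[c] / m - gamma * (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
fine = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}  # two communities
coarse = {n: "all" for n in range(6)}                    # one community

for gamma in (0.2, 1.0):
    print(gamma, modularity(edges, fine, gamma), modularity(edges, coarse, gamma))
```

In this toy case the two-triangle partition scores higher at `gamma = 1.0`, while the single merged community scores higher at `gamma = 0.2`, which illustrates why the preferred community structure can depend on the scale at which a network is examined.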
Finally, we arrive at the question of how to design efficient biomarkers for complex
diseases, e.g., cancer. First, it is important to understand the complexity of cancer. We
believe that to understand individualized gene behavior across patients, the relational status
of genes needs to be considered, because complex disease phenotypes are often caused by
cascaded failures of genetic interactions in cancer cells. We implement a framework to quantify
the molecular heterogeneity of tumors from a gene-gene relational perspective using co-expression
networks and interactome data. Next, we present a method to reverse engineer integrative
gene networks. The main advantage of our method is that it integrates different quantitative
and qualitative data sets to reconstruct a multiplex network without necessarily
imposing data constraints, such as requiring every genomic data set to contain the same number
of entities. Another advantage is that predictions can be made from the integrated networks
by propagating beliefs from seed nodes that represent prior knowledge. Thus,
we combine data integration and network-based prediction into a single framework. We
demonstrate our method through case studies using breast cancer data. Our approaches
yield promising results and suggest new ways of thinking about and mining complex genomic data sets.
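
As an illustration of propagating information from seed nodes over a network, the sketch below uses a random walk with restart, a standard technique swapped in here for concreteness; the thesis's own propagation scheme may differ. The function name `propagate`, the `restart` value, and the four-node toy graph are all hypothetical, and the sketch assumes every node has at least one neighbor.

```python
def propagate(adjacency, seeds, restart=0.3, iterations=100):
    """Random walk with restart from a set of seed nodes.

    adjacency: dict mapping each node to a list of its neighbors
               (assumed undirected, with no isolated nodes).
    seeds:     set of nodes carrying prior knowledge.
    Returns a dict of node -> score; scores sum to approximately 1.
    """
    nodes = list(adjacency)
    # Initial distribution: uniform over the seeds, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(iterations):
        # With probability `restart`, the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbor.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adjacency[u])
            for v in adjacency[u]:
                nxt[v] += share
        p = nxt
    return p


# Hypothetical toy network: a path A - B - C - D with A as the only seed.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = propagate(adjacency, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Nodes closer to the seed receive higher scores, so the ranking can be read as a simple prioritization of candidates relative to what is already known.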
Overall, this thesis presents a comprehensive study of biological networks and a novel
application of computational means to address the biomarker detection problem in the
era of big genomic data. Throughout, our study accounts for the challenges posed by
data heterogeneity and by the diversity of the sources producing the data.