cLD: a novel statistic capturing linkage disequilibrium between rare variants
Abstract
In genetics and genomics, linkage disequilibrium (LD) is a fundamental concept that plays critical roles in association mapping and molecular evolution. However, LD is only appropriate for assessing the association between common genetic variants, leaving rare variants, which account for 90% of human variants, unattended. This is because the low allele frequency of rare variants leads to very high uncertainty in the estimate, so LD values computed between them are unstable. To bridge this gap, we propose a novel statistic, cumulative linkage disequilibrium (cLD), which is based on rare variants and captures associations between genes. To verify that the new statistic can reveal interactions between genes, we calculated cLD on the 1000 Genomes Project dataset and designed a rigorous statistical test to analyze its properties. Notably, by calculating LD between common variants in the same data, other scholars reported a negative result: the genetic LD map does not overlap with the 3D contact map assessed by Hi-C experiments [20]. By reanalyzing the same data using cLD, we obtained a positive result, showing that 3D chromatin interactions do leave genetic footprints. Moreover, the cLD of interacting gene pairs (as reported by various interaction databases) is significantly higher than the cLD of non-interacting gene pairs. To explain why cLD is stable even when allele frequencies are extremely low, we further investigated its theoretical properties by deriving closed-form variances of cLD and LD. We proved that var(cLD) is smaller than var(LD) when the genetic variants are rare.
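The contrast between per-variant LD and a gene-level cumulative statistic can be illustrated with a minimal Python sketch. It is not the definition of cLD used in this work, which the abstract does not spell out. The sketch assumes the conventional r² measure of LD on 0/1 haplotype indicators. The `collapse` rule (a haplotype counts as a carrier if it has any rare allele in the gene) and the `simulate` helper, including its parameters, are hypothetical choices made only for illustration. The simulated genes are independent, so the sketch only compares the sampling variance of the two estimates.

```python
import random
import statistics


def r_squared(x, y):
    """Conventional LD r^2 between two 0/1 haplotype indicator vectors."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    if p_a in (0.0, 1.0) or p_b in (0.0, 1.0):
        return 0.0  # r^2 is undefined at monomorphic sites; reported as 0 here
    p_ab = sum(a & b for a, b in zip(x, y)) / n
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def collapse(variants):
    """Hypothetical gene-level indicator (an assumption, not the paper's rule):
    1 if the haplotype carries a rare allele at any variant in the gene."""
    return [int(any(column)) for column in zip(*variants)]


def simulate(n_hap=1000, n_var=10, freq=0.005, reps=100, seed=0):
    """Sampling variance of single-variant r^2 versus collapsed r^2
    for two independent simulated genes (all parameters are illustrative)."""
    rng = random.Random(seed)

    def gene():
        # n_var rare variants, each a list of n_hap 0/1 alleles
        return [[int(rng.random() < freq) for _ in range(n_hap)]
                for _ in range(n_var)]

    single, cumulative = [], []
    for _ in range(reps):
        g1, g2 = gene(), gene()
        single.append(r_squared(g1[0], g2[0]))
        cumulative.append(r_squared(collapse(g1), collapse(g2)))
    return statistics.pvariance(single), statistics.pvariance(cumulative)


if __name__ == "__main__":
    var_single, var_cumulative = simulate()
    print("variance of single-variant r^2:", var_single)
    print("variance of collapsed r^2:", var_cumulative)
```

Collapsing raises the effective allele frequency of the unit being compared, which is the intuition behind aggregating rare variants per gene. The closed-form variances derived in this work, not this simulation, are the basis for the stated result.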