Evaluating patterns of linkage disequilibrium (LD) is important for association mapping studies as well as for studying
the architecture of the human genome (e.g., haplotype block structures). Commonly used bi-allelic pairwise measures for
assessing LD between two loci, such as r² and D′, may not make full and efficient use of modern multilocus data. Though
extended to multilocus scenarios, their performance
is still questionable. Meanwhile, most existing measures for an entire multilocus region, such as normalized entropy difference,
do not account for LD heterogeneity across the region under investigation. Additionally, these existing multilocus
measures cannot handle distant regions where long-range LD patterns may exist. In this study, we proposed a novel multilocus
LD measure based on mutual information. The proposed measure describes the LD pattern between two chromosome
regions, each of which may consist of multiple loci (including multi-allelic loci). As such, it can better
characterize LD patterns between two arbitrary regions. As potential applications, we developed algorithms based on the proposed
measure for partitioning haplotype blocks and for selecting haplotype tagging SNPs (htSNPs), which are useful for follow-up
association tests. Results on both simulated and empirical data showed that our LD measure had distinct advantages over
pairwise and other multilocus measures. First, our measure was more robust and captured LD information more comprehensively
between neighboring as well as disjoint regions. Second, haplotype blocks were better described by our proposed measure.
Furthermore, association tests with htSNPs from the proposed algorithm had improved power over tests on single markers and
on haplotypes.
Keywords Entropy - Mutual information - Linkage disequilibrium (LD) - Multilocus LD - Haplotype block - Haplotype tagging SNPs (htSNPs)
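
To illustrate the idea of measuring LD between two multilocus regions with mutual information, the following is a minimal sketch and not the measure proposed in this study. It computes the plain mutual information I(A;B) = H(A) + H(B) - H(A,B) from empirical haplotype frequencies, treating the alleles within each region as a single multi-allelic haplotype. The function names, the input format (phased haplotypes as strings of allele codes), and the absence of any normalization are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


def region_mutual_information(haplotypes, region_a, region_b):
    """Mutual information I(A;B) between two regions of phased haplotypes.

    `haplotypes` is a non-empty sequence of equal-length allele sequences
    (hypothetical input format); `region_a` and `region_b` are lists of
    locus indices. Alleles may take any hashable value, so multi-allelic
    loci are handled as well.
    """
    # Collapse each region into one sub-haplotype per chromosome.
    a = [tuple(h[i] for i in region_a) for h in haplotypes]
    b = [tuple(h[i] for i in region_b) for h in haplotypes]
    joint = list(zip(a, b))
    return entropy(a) + entropy(b) - entropy(joint)


if __name__ == "__main__":
    # Region A (loci 0-1) fully determines region B (loci 2-3).
    linked = ["0011", "0011", "1100", "1100"]
    print(region_mutual_information(linked, [0, 1], [2, 3]))

    # Two loci with all four haplotypes equally frequent: no LD.
    unlinked = ["00", "01", "10", "11"]
    print(region_mutual_information(unlinked, [0], [1]))
```

Because the two regions are specified only by index lists, they need not be adjacent, so the same calculation applies to neighboring and to distant regions. In practice a normalization step and a correction for finite-sample bias would likely be needed; neither is shown here.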