In a previous post, I explained that linkage disequilibrium (LD) occurs when alleles at different loci co-occur more or less often than expected by chance.

Now the question is how we measure this in a way that is useful for GWAS.

Let’s consider two SNPs:
SNP1: alleles A / a
SNP2: alleles B / b
We define:
pꭺ: frequency of allele A
pᏼ: frequency of allele B
pꭺᏼ(obs): observed frequency of the haplotype AB

If the two alleles combine randomly, the expected haplotype frequency is:
pꭺᏼ(exp)=pꭺ x pᏼ

👉 LD exists when the observed frequency deviates from the expected one. To quantify this deviation, we calculate the difference D:
D= pꭺᏼ(obs) – pꭺᏼ(exp)
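
As a minimal sketch of the calculation above, the Python function below computes D from two allele frequencies and an observed haplotype frequency. The function name `ld_d` and the example frequencies are illustrative assumptions, not values from a real dataset. The sketch also checks that the observed haplotype frequency is feasible: AB cannot be more frequent than either allele, nor rarer than pꭺ + pᏼ - 1.

```python
def ld_d(p_a: float, p_b: float, p_ab_obs: float) -> float:
    """Return D = observed AB haplotype frequency minus the expected one."""
    # Feasibility bounds for a haplotype frequency given the allele frequencies.
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    if not lower <= p_ab_obs <= upper:
        raise ValueError(f"p_ab_obs must lie in [{lower}, {upper}]")
    p_ab_exp = p_a * p_b  # expected frequency under random combination
    return p_ab_obs - p_ab_exp


# Hypothetical frequencies, chosen only for illustration.
d = ld_d(p_a=0.6, p_b=0.3, p_ab_obs=0.25)
print(f"D = {d:.2f}")  # D = 0.07
```

A positive D means the AB haplotype is more common than random combination predicts; a negative D means it is rarer.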

But the range of values that D can take depends on the allele frequencies. As a result, D is not standardized and cannot be reliably used to compare LD between different pairs of loci. Let’s check two numerical examples:
Case 1: Common alleles
pꭺ=0.5
pᏼ=0.5
pꭺᏼ(exp)=0.5 x 0.5= 0.25, pꭺᏼ(obs)=0.35
D=0.35−0.25=0.10

Case 2: Rare alleles
pꭺ=0.1
pᏼ=0.1
pꭺᏼ(exp)=0.1 x 0.1= 0.01, pꭺᏼ(obs)=0.10
D=0.10−0.01=0.09

In both cases, the value of D is almost the same (0.10 vs 0.09). However, the biological interpretation is completely different. In the case of rare alleles, D is at its maximum possible value: the AB haplotype cannot be more frequent than allele A itself (0.1), so here every A haplotype also carries B, and AB occurs 10X more frequently than expected under random combination.

Because the range of D depends on allele frequencies and D is not directly comparable across loci, it is standardized by dividing it by its maximum possible value given those frequencies (Dmax). The result is D′, whose absolute value ranges between 0 and 1. However, even after this normalization, D′ still depends on allele frequencies and can be inflated for rare alleles, especially in small samples.
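
The normalization can be sketched in Python as follows. The post does not spell out Dmax, so this sketch assumes the standard definition, in which Dmax depends on the sign of D. The function name `d_prime` is hypothetical, and both loci are assumed to be polymorphic (otherwise Dmax is zero).

```python
def d_prime(p_a: float, p_b: float, p_ab_obs: float) -> float:
    """Return |D'| = |D| / Dmax for two biallelic, polymorphic loci."""
    d = p_ab_obs - p_a * p_b
    if d >= 0:
        # Largest positive D that the allele frequencies allow.
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        # Largest magnitude of a negative D that the allele frequencies allow.
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max


# Common-allele values from Case 1 of this post.
print(f"{d_prime(0.5, 0.5, 0.35):.2f}")  # 0.40
```

A value of 1 means that at least one of the four possible haplotypes is absent from the sample, which is why rare alleles reach it so easily.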

To avoid this limitation, we use r², the 𝘀𝗾𝘂𝗮𝗿𝗲𝗱 𝗰𝗼𝗿𝗿𝗲𝗹𝗮𝘁𝗶𝗼𝗻 between the two SNPs. The formula is:
r² = D² / [pꭺ(1-pꭺ) x pᏼ(1-pᏼ)]

If r² is close to zero, there is little association between the two SNPs. If r² is close to one, the association is strong and one SNP can predict the other.
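
The r² formula above can be sketched in Python as follows. The function name `r_squared` is hypothetical. The first call uses the common-allele values from Case 1; the second uses illustrative rare-allele values in which every A haplotype also carries B.

```python
def r_squared(p_a: float, p_b: float, p_ab_obs: float) -> float:
    """Return r^2 = D^2 / (pA(1-pA) * pB(1-pB)) for two polymorphic loci."""
    d = p_ab_obs - p_a * p_b
    return d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Common alleles (Case 1 of this post).
print(f"{r_squared(0.5, 0.5, 0.35):.2f}")  # 0.16

# Rare alleles where every A haplotype also carries B (illustrative values).
print(f"{r_squared(0.1, 0.1, 0.10):.2f}")  # 1.00
```

The two D values are almost identical (0.10 and 0.09), yet r² differs by a factor of about six, because the denominator shrinks as the alleles become rarer.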

Let’s see how this impacts the two previous cases. In the case of common alleles, r² = 0.10² / (0.25 x 0.25) = 0.16, indicating only weak-to-moderate linkage disequilibrium. The SNPs are partially correlated. In a GWAS context, this means limited efficiency in tagging causal variants.

In the case of rare alleles, a D value of similar size translates into r² at or close to 1, meaning near-perfect correlation. Here, knowing the genotype at one SNP essentially tells you the genotype at the other, resulting in excellent tagging and high GWAS power even without genotyping the causal variant.

👉 So GWAS often does not detect causal variants directly: it detects them through genotyped SNPs that are in high r² with them.
