Cell In Silico


Molecules - Sequence - Disease

Linkage Dysequillibrium and Recombination

March 13, 2021 | 3 Minute Read

Terms

  Linkage Equillibrium Linkage Dysequillibrum
  Two locus are independent Two locus are not independent
  Extensive recombination Some or no recombination
  |D| = 0 |D|>0

Measure Linkage Dysequillibirum by D

Suppose there are two loci, A and B. Both have 2 alleles. Denoted as 0 or 1.

If there’s intensive recombination happening between A and B. Getting 1 in A does not imply anything about the genotype is locus B.

Therefore we expect $P_{00} = P_{0\star} \times P_{\star0}$.

$D$ measures the discrepency between real data $P_{00}$ and what we expect under independence $P(0\star) \times P(\star0)$

$D = P_{00} - P(0\star) \times P(\star0)$

When Linkage Equillibrium (intensive recombine leads to independence between two loci), D = 0.

There are many equal ways of expressing D:

$P_{00} = P_{0\star} - P_{01}$

$P_{\star0} = 1-P_{\star1}$

plug the above two into $D = P_{00} - P(0\star) \times P(\star0)$

$D = (P_{0\star} - P_{01}) - P_{0\star}(1-P_{\star1}) = - (P_{01} - P_{0\star} \times P_{\star1})$

Using similar tricks we can get

$D = P_{11} - P(1\star) \times P(\star1)$

D prime

$D^{\prime} = \frac{ D }{D_{max}}$

is just to make the measure always between 0 and 1

Linkage Dysequillibrium (LD) Decay: How does D change over time?

Denote $D^{(t)}$ as $D$ at time $t$. $P_{00}^{(t)}$ as probability of observing 0 in both loci at time $t$.

At time $t = 0$, mutations are created.

Recombination rate $c$ means number of crossover per b.p. per generation.

When two loci are $L$ b.p. apart, we can expect $Lc$ crossovers.

Let’s establish the relationship between $D^{(t)}$ and $D^{(t-1)}$:

The original D formula: $D^{(t)} = P_{00}^{(t)} - P_{0\star}^{(t)} P_{\star0}^{(t)}$

  1. As Hardy Weinberg Eqilliburm specifies, allele frequency don’t change over time. Thus $P_{0\star}^{(t)} = P_{0\star}^{(t-1)}$. So does $P_{\star0}^{(t)} = P_{\star0}^{(t-1)}$.

  2. Two possibility of resulting in 00 at time $t$:

    • No crossover happens between the two loci, the probability is $1-Lc$. In this case, 00 are preserved.
    • Crossover occur with probability $Lc$. In this case 00 comes from recombination of 0* and *0.
    • This leads us to $P_{00}^{(t)} = (1-Lc)P_{00}^{(t-1)} + (Lc)(P_{0\star}^{(t-1)} P_{\star0}^{(t-1)})$

Pluggin both back to the original formula:

$D^{(t)} = P_{00}^{(t)} - P_{0\star}^{(t)} P_{\star0}^{(t)}$

$= (1-Lc)P_{00}^{(t-1)} + (Lc)(P_{0\star}^{(t-1)} P_{\star0}^{(t-1)}) - P_{0\star}^{(t-1)} P_{\star0}^{(t-1)}$

$=(1-Lc)D^{(t-1)}$

More recombination ($Lc$), $D$ drops more per generation and with distance.

Compare to time $t=0$:

$D^{(t)} = (1-Lc)^{t}D^{0}$

When $x$ is small, $(1-x)^{k} \approx \exp{-xk} \approx 1-xk$

Therefore, $D^{(t)} = \exp{-Lct}D^{0}$

D decays exponentially witg time $t$ and distance $L$.

Image of Yaktocat

r measure: Discrepency to independence

Assuming from linkae equillibirum (independence between loci):

In a population of size $N$

Expected genotype counts(E):

  0 1
0 $N P_{0\star} P_{\star 0}$ $N P_{0\star} P_{\star 1}$
1 $N P_{1\star} P_{\star 0}$ $N P_{1\star} P_{\star 1}$

Observed genotype counts(O):

  0 1
0 $N P_{00}$ $N P_{01}$
1 $N P_{10}$ $N P_{11}$

Test statistics:

$r^{2} = \sum \frac{(O-E)^{2}}{E}$

Application

Given 2 SNP matrix, 1 from Africa, 1 from Europe, but we don’t know which belongs to which population. How can you distinguish which matrix belongs to which?

Plot $D$ with respect to $L$. At the same distance $L$, African population will have lower D (more independent) since they have been recombining for a longer amount of time.