Linkage Disequilibrium and Recombination
Terms
Linkage Equilibrium | Linkage Disequilibrium |
---|---|
The two loci are independent | The two loci are not independent |
Extensive recombination | Little or no recombination |
$D = 0$ | $D \neq 0$ |
Measuring Linkage Disequilibrium by D
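Suppose there are two loci, A and B, each with two alleles denoted 0 and 1. Let $P_{00}$ be the frequency of haplotypes carrying 0 at both loci, $P_{0\star}$ the frequency of allele 0 at locus A, and $P_{\star0}$ the frequency of allele 0 at locus B.
If there is extensive recombination between A and B, observing a 1 at locus A tells us nothing about the allele at locus B.
Therefore we expect $P_{00} = P_{0\star} \times P_{\star0}$.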
$D$ measures the discrepancy between the observed frequency $P_{00}$ and what we expect under independence, $P_{0\star} \times P_{\star0}$:
$D = P_{00} - P_{0\star} \times P_{\star0}$
Under linkage equilibrium (extensive recombination makes the two loci independent), $D = 0$.
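As a rough illustration of the definition, the sketch below estimates $D$ from a list of phased haplotypes. The function name `ld_d` and the representation of each haplotype as an `(a, b)` tuple of 0/1 alleles are assumptions made for this example, not part of the notes.

```python
from collections import Counter

def ld_d(haplotypes):
    """Return D = P00 - P0* x P*0 for a list of (a, b) haplotypes with 0/1 alleles."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p00 = counts[(0, 0)] / n
    p0_star = sum(1 for a, _ in haplotypes if a == 0) / n  # allele 0 at locus A
    p_star0 = sum(1 for _, b in haplotypes if b == 0) / n  # allele 0 at locus B
    return p00 - p0_star * p_star0

# Perfectly coupled loci: only 00 and 11 haplotypes occur.
linked = [(0, 0)] * 5 + [(1, 1)] * 5
# Independent loci: all four haplotypes are equally frequent.
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5

print(ld_d(linked))       # 0.25
print(ld_d(independent))  # 0.0
```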
There are several equivalent ways of expressing D:
$P_{00} = P_{0\star} - P_{01}$
$P_{\star0} = 1-P_{\star1}$
Plugging these two into $D = P_{00} - P_{0\star} \times P_{\star0}$ gives
$D = (P_{0\star} - P_{01}) - P_{0\star}(1-P_{\star1}) = - (P_{01} - P_{0\star} \times P_{\star1})$
Using similar tricks we can get
$D = P_{11} - P_{1\star} \times P_{\star1}$
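The equivalence of these expressions can be checked numerically. The sketch below computes $D$ from each of the four haplotype frequencies; the function name `d_four_ways` and the example frequencies are made up for illustration, and the $P_{10}$ form is obtained by the same substitution trick rather than stated in the notes.

```python
def d_four_ways(p00, p01, p10, p11):
    """Compute D from each haplotype frequency; all four values should agree."""
    p0s, p1s = p00 + p01, p10 + p11  # allele frequencies at locus A
    ps0, ps1 = p00 + p10, p01 + p11  # allele frequencies at locus B
    return (
        p00 - p0s * ps0,
        -(p01 - p0s * ps1),
        -(p10 - p1s * ps0),
        p11 - p1s * ps1,
    )

# Hypothetical haplotype frequencies that sum to 1.
values = d_four_ways(0.4, 0.1, 0.2, 0.3)
print(values)  # four values, all approximately 0.1
```

The "matching" haplotypes (00 and 11) enter with a plus sign and the "mismatching" ones (01 and 10) with a minus sign.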
D prime
$D^{\prime} = \frac{ | D | }{D_{max}}$
Here $D_{max}$ is the largest value $|D|$ can take given the allele frequencies, so $D^{\prime}$ always lies between 0 and 1.
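The notes do not spell out $D_{max}$. A minimal sketch, assuming the usual bound that follows from requiring every haplotype frequency to stay non-negative: for $D > 0$, $D_{max} = \min(P_{0\star}P_{\star1}, P_{1\star}P_{\star0})$, and for $D < 0$, $D_{max} = \min(P_{0\star}P_{\star0}, P_{1\star}P_{\star1})$. The function name `d_prime` and the choice to return 0.0 when a locus is monomorphic are assumptions of this example.

```python
def d_prime(p00, p01, p10, p11):
    """Return |D| / D_max for the given haplotype frequencies."""
    p0s, p1s = p00 + p01, p10 + p11  # allele frequencies at locus A
    ps0, ps1 = p00 + p10, p01 + p11  # allele frequencies at locus B
    d = p00 - p0s * ps0
    if d >= 0:
        # P01 and P10 must stay >= 0, which caps how large D can be.
        d_max = min(p0s * ps1, p1s * ps0)
    else:
        # P00 and P11 must stay >= 0, which caps how negative D can be.
        d_max = min(p0s * ps0, p1s * ps1)
    if d_max == 0:
        return 0.0  # a locus is monomorphic; returning 0.0 is a convention chosen here
    return abs(d) / d_max

print(d_prime(0.5, 0.0, 0.0, 0.5))  # complete association
print(d_prime(0.4, 0.1, 0.2, 0.3))  # partial association
```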
Linkage Disequilibrium (LD) Decay: How does D change over time?
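Denote by $D^{(t)}$ the value of $D$ at time $t$, and by $P_{00}^{(t)}$ the probability of observing 0 at both loci at time $t$.
At time $t = 0$, the mutations arise.
The recombination rate $c$ is the expected number of crossovers per base pair (bp) per generation.
When two loci are $L$ bp apart, we expect $Lc$ crossovers between them per generation. For small $Lc$, this is approximately the probability that a crossover separates the two loci.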
Let’s establish the relationship between $D^{(t)}$ and $D^{(t-1)}$:
The original D formula: $D^{(t)} = P_{00}^{(t)} - P_{0\star}^{(t)} P_{\star0}^{(t)}$
- Under Hardy-Weinberg assumptions, allele frequencies do not change over time. Thus $P_{0\star}^{(t)} = P_{0\star}^{(t-1)}$ and $P_{\star0}^{(t)} = P_{\star0}^{(t-1)}$.
- There are two ways to obtain a 00 haplotype at time $t$:
  - No crossover happens between the two loci, with probability $1-Lc$. In this case, an existing 00 haplotype is preserved.
  - A crossover occurs, with probability $Lc$. In this case, 00 arises from recombining a 0* haplotype with a *0 haplotype.
  - This leads us to $P_{00}^{(t)} = (1-Lc)P_{00}^{(t-1)} + (Lc)(P_{0\star}^{(t-1)} P_{\star0}^{(t-1)})$
Plugging both back into the original formula:
$D^{(t)} = P_{00}^{(t)} - P_{0\star}^{(t)} P_{\star0}^{(t)}$
$= (1-Lc)P_{00}^{(t-1)} + (Lc)(P_{0\star}^{(t-1)} P_{\star0}^{(t-1)}) - P_{0\star}^{(t-1)} P_{\star0}^{(t-1)}$
$= (1-Lc)\left(P_{00}^{(t-1)} - P_{0\star}^{(t-1)} P_{\star0}^{(t-1)}\right)$
$=(1-Lc)D^{(t-1)}$
The larger $Lc$ is (a higher recombination rate or a greater distance between the loci), the more $D$ drops per generation.
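The recursion can be checked by iterating the haplotype frequencies directly. The sketch below applies the same "no crossover / crossover" update to all four haplotypes (the notes only write it out for 00; extending it to the others by symmetry is an assumption here) and prints $D$ for a few generations. The names `next_generation` and `d_of` and the starting values are hypothetical.

```python
def next_generation(freqs, lc):
    """One generation of recombination.

    freqs = (p00, p01, p10, p11); lc = probability of a crossover between the loci.
    """
    p00, p01, p10, p11 = freqs
    p0s, p1s = p00 + p01, p10 + p11  # allele frequencies at locus A
    ps0, ps1 = p00 + p10, p01 + p11  # allele frequencies at locus B

    def mix(p_hap, p_a, p_b):
        # Preserved without a crossover, or rebuilt from two independent alleles.
        return (1 - lc) * p_hap + lc * p_a * p_b

    return (mix(p00, p0s, ps0), mix(p01, p0s, ps1),
            mix(p10, p1s, ps0), mix(p11, p1s, ps1))

def d_of(freqs):
    p00, p01, p10, p11 = freqs
    return p00 - (p00 + p01) * (p00 + p10)

freqs = (0.5, 0.0, 0.0, 0.5)  # start in complete disequilibrium
lc = 0.1                      # hypothetical crossover probability
for t in range(4):
    print(t, round(d_of(freqs), 6))
    freqs = next_generation(freqs, lc)
```

Each printed value should be $(1-Lc) = 0.9$ times the previous one, while the allele frequencies stay fixed at 0.5.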
Unrolling the recursion back to time $t=0$:
$D^{(t)} = (1-Lc)^{t}D^{(0)}$
When $x$ is small, $(1-x)^{k} \approx \exp(-xk)$ (and if $xk$ is also small, this is further approximated by $1-xk$).
Therefore, $D^{(t)} \approx \exp(-Lct)D^{(0)}$
D decays exponentially with time $t$ and distance $L$.
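To get a feel for how close the exponential form is to the exact power, the sketch below evaluates both for a few distances and times. The recombination rate `c = 1e-8` per bp per generation and the starting value `d0 = 0.25` are illustrative assumptions, not values from the notes.

```python
import math

def d_exact(d0, lc, t):
    """Exact decay: D(t) = (1 - Lc)^t * D(0)."""
    return d0 * (1 - lc) ** t

def d_approx(d0, lc, t):
    """Exponential approximation: D(t) ~ exp(-Lc * t) * D(0)."""
    return d0 * math.exp(-lc * t)

d0 = 0.25  # hypothetical starting disequilibrium
c = 1e-8   # hypothetical crossovers per bp per generation
for L in (1_000, 100_000):
    for t in (100, 10_000):
        lc = L * c
        print(L, t, d_exact(d0, lc, t), d_approx(d0, lc, t))
```

The two columns should track each other closely as long as $Lc$ is small, and both shrink as either $L$ or $t$ grows.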
r measure: Discrepancy from independence
Assume linkage equilibrium (independence between loci) as the null hypothesis.
In a population of size $N$:
Expected genotype counts (E):
A \ B | B = 0 | B = 1 |
---|---|---|
A = 0 | $N P_{0\star} P_{\star 0}$ | $N P_{0\star} P_{\star 1}$ |
A = 1 | $N P_{1\star} P_{\star 0}$ | $N P_{1\star} P_{\star 1}$ |
Observed genotype counts (O):
A \ B | B = 0 | B = 1 |
---|---|---|
A = 0 | $N P_{00}$ | $N P_{01}$ |
A = 1 | $N P_{10}$ | $N P_{11}$ |
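Test statistic:
$\chi^{2} = \sum \frac{(O-E)^{2}}{E}$
Dividing by the sample size gives $r^{2} = \frac{\chi^{2}}{N} = \frac{D^{2}}{P_{0\star} P_{1\star} P_{\star0} P_{\star1}}$, which lies between 0 and 1.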
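As a sanity check on the statistic, the sketch below computes the sum of $(O-E)^2/E$ over the four cells of a 2x2 count table and compares it with $N \cdot D^{2} / (P_{0\star} P_{1\star} P_{\star0} P_{\star1})$; for a 2x2 table the two should coincide. The function name and the example counts are made up, and the code assumes both loci are polymorphic so that no expected count is zero.

```python
def chi_square_and_r2(n00, n01, n10, n11):
    """Return (sum of (O-E)^2/E, D^2 / (P0* P1* P*0 P*1)) for a 2x2 count table."""
    n = n00 + n01 + n10 + n11
    row = (n00 + n01, n10 + n11)  # counts of allele 0 / 1 at locus A
    col = (n00 + n10, n01 + n11)  # counts of allele 0 / 1 at locus B
    observed = ((n00, n01), (n10, n11))

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # N * P_i* * P_*j under independence
            chi2 += (observed[i][j] - expected) ** 2 / expected

    p0s, ps0 = row[0] / n, col[0] / n
    d = n00 / n - p0s * ps0
    r2 = d ** 2 / (p0s * (1 - p0s) * ps0 * (1 - ps0))
    return chi2, r2

# Hypothetical counts for 100 sampled haplotypes.
chi2, r2 = chi_square_and_r2(40, 10, 20, 30)
print(chi2, r2, chi2 / 100)
```

Because the sum grows with the sample size, dividing by $N$ makes values comparable across samples of different sizes.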
Application
Given two SNP matrices, one from an African population and one from a European population, with the labels lost, how can you tell which matrix belongs to which population?
Plot $D$ against the distance $L$ for each matrix. At the same distance $L$, the African population is expected to show lower $|D|$ (loci closer to independence), since its haplotypes have been recombining for a longer time.
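One possible way to build the curve is to average $|D|$ over all SNP pairs within each distance bin. The sketch below assumes each matrix is a list of phased haplotypes (rows) with 0/1 alleles at each SNP (columns), plus a list of SNP positions in bp; the function name, the data layout, and the toy data are all assumptions for illustration. It uses the $P_{11}$ form of $D$.

```python
from collections import defaultdict

def mean_abs_d_by_distance(haplotypes, positions, bin_size):
    """Average |D| over SNP pairs, grouped by distance bin (keyed by bin start in bp)."""
    n = len(haplotypes)
    m = len(positions)
    # Frequency of allele 1 at each SNP.
    freq1 = [sum(h[j] for h in haplotypes) / n for j in range(m)]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(m):
        for j in range(i + 1, m):
            p11 = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
            d = p11 - freq1[i] * freq1[j]  # D = P11 - P1* x P*1
            b = abs(positions[j] - positions[i]) // bin_size
            sums[b] += abs(d)
            counts[b] += 1
    return {b * bin_size: sums[b] / counts[b] for b in sorted(sums)}

# Toy data: SNPs 0 and 1 are close and fully linked; SNP 2 is far away and independent.
toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
curve = mean_abs_d_by_distance(toy, positions=[0, 100, 5000], bin_size=1000)
print(curve)  # {0: 0.25, 4000: 0.0, 5000: 0.0}
```

Running this on both matrices with the same `bin_size` gives two curves to compare; following the argument above, the matrix whose curve sits lower at matching distances would be the candidate for the African sample.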