Cell In Silico


Molecules - Sequence - Disease

How to calculate isotope profile?

May 09, 2020 | 4 Minute Read

The Question

Given a chemical formula $CO_2$, return the distribution of mass (with different combinations of isotope C12, C13; O16, O18)

An Example

Say C12 makes up 99%; C13 make 1%; O16 makes 98%, O18 makes 2%.

List possible combination of atoms and mass

  • C12 + O16 + O16 will be most probable, mass = 44
  • C12 + O16 + O18, mass = 46
  • C12 + O18 + O18, mass = 48
  • C13 + O16 + O16, mass = 45
  • C13 + O16 + O18, mass = 47
  • C13 + O18 + O18, mass = 49

Calculate probability for all of them

  • probability getting C12 is $0.99$; C13 is $0.01$.
  • probability getting 2 O16 is $0.98 \times 0.98$; one of each is $2\times0.98\times0.02$; both O18 is $0.02\times0.02$
  • Therefore getting mass 44 (C12 + O16 + O16), the probability is $0.99\times0.98\times0.98$

All other probability can be calculated using the same method.

Introducing characteristic polynomial

What is you have a huge molecule $C_{52}O_{76}H_{98}N_{29}S_{92837}$ (It does not exist.)

Calculating the probability with the previous method will just drive you crazy!

To solve this problem, mathematicians invent characteristic polynomial to encode the “probability distribution of mass” in a polynomial for each atom. By multiplying the polynomial of all atoms in a molecule, the resulting gigantic polynomial takes care of the combination of different atomic mass for us! What we need to do then is to expand the big polynomial and extract the coefficient. The coefficients happens to be the probability of mass.

Sounds like magic!

Let us begin with the same example

How to make the Characteristic polynomial with CO2?

  1. Write down relative mass for each kind of atom:

Assign C12, the smallest mass carbon to relative mass $0$; C13 is $1$.

Similarly, O16 becomes has relative mass $0$; O18 has relative mass $2$.

The reason to do relative mass is because we want to know how much more each isotope contribute to the molecular mass. We only care whether is has a $2 Da$ increase.

  1. Write down characteristic polynomial for each atom base on relative mass

Formula here: $\sum_{all isotope} probability * x^{relative mass}$

  • C: $0.99 x^0 + 0.01 x^1$ encoding (0.99 C12, 0.01 C13)
  • O: $0.98 x^0 + 0.02 x^2$ encoding (0.98 O16, 0.02 O18)
  1. Multiply all atoms’s polynomial together:

Molecule’s polynomial = C’s * O’s * O’s

= $(0.99 x^0 + 0.01 x^1) (0.98 x^0 + 0.02 x^2)^2$

  1. Expand the molecule’s polynomial (do it by hand only here):

$(0.99 x^0 + 0.01 x^1) (0.98 x^0 + 0.02 x^2)^2 = 0.000004x^5+0.000396x^4+0.000392x^3+0.038808x^2+0.009604x+0.950796x^0$

Each coefficient corresponds to the probability of molecule. Look at the term $0.950796x^0$ Notice 0.95 probability of “C12 + O16 + O16”, mass = 44 (sum of relative mass 0). term $x^{k}$ carries the probability for molecule with relative mass $k$ in its coefficient.

Why it works?

When expanding $(ax + b)(cx + d)$, the term $x$ has coefficient from $ax, d$ and $b, cx$. When we multiply $ax$ with $dx^0$, the relative mass is summed in x’s power ($x^0 * x^1 = x^{0+1}$); the combination is recorded in the coefficient $ad + bc$ (either one of the atom contribute relative mass 1, and the other contribute relative mass 0).

The way we expand polynomial helps us to do complex combinations. As long as you can record the probability of each atom in its polynomial, you can combine as many atoms as you wish, and expand to get the probability distribution.

I don’t do mass spect and I don’t care about isotopes. Anything generalizable here?

The above problem is just an instance of probability generating function (pgf). Let $Y$ be an integer valued random variable, it’s pgf is defined as $pgf_Y(t)\sum_{y} P_Y(y) t^y$

Sounds fancy right? In fact, in the previous example, let’s make the relative mass of carbon a random variable C. What we do to write down the characteristic polynomial is just, pgf! (we just switch $t$ to $x$, that’s all)

Now you know why we want to write it in a form of polynomials.

Probability generating function (pgf) have other beautiful(?) math property

  • $pgf_Y(1) = 1$ sum of probability is 1
  • $pgf’_Y(1) = E(Y)$ first derivative of pgf when $t=1$ is the expect values.
  • It also have some relationship with $Var(Y)$
  • You can use this to calculate the mean and variance of exponential, poission, binomial…….distribution.
  • the most powerful property is to calculate two independent random variable $X$ and $Y$, and get the probability distribution for $X+Y$: $pgf_{X+Y}(t) = pgf_{X}(t)*pgf_{Y}(t)$. That’s what we do with carbon and oxygen atoms, right?

The continuous version of pgf: Moment generating function, can deal with continous random variable.

Reference

  1. probability generating function