Additive Trait Parameterization¶

Modules we will need for this example¶

import simuOpt
simuOpt.setOptions(alleleType='short', quiet=True)
import simuPOP as sim
import numpy as np, pandas as pd
import random
from saegus import analyze, operators, parameters
np.set_printoptions(suppress=True, precision=5)

Our goal is to show how to parameterize an additive trait in :mod`saegus` code. We will make use of simuPOP infoFields to store the information about the additive trait for each individual. Also we will add a unique identifier for each individual which will help us verify that the trait is being calculated correctly. In this example we will randomly choose quantitative trait loci from among loci which are segregating in example_pop.

Preparing the Population¶

We will re-load the population and add information fields. example_pop needs the information fields ind_id, g and p.

Load the Population¶

We will use the population we created in the last step instead of creating a new population.

Loading our example population from a file¶

example_pop = sim.loadPopulation('example_pop.pop')

Add Information Fields¶

At present our population has no special information assigned to its members. Each individual is only a genotype. By default simuPOP uses ind_id, father_id, mother_id, father_idx and mother_idx for its very useful _Tagger functions. We can save some hassle by using these for identifying individuals.

Add the infoFields ind_id, g and p¶

example_pop.infoFields()
# ()
example_pop.addInfoFields(['ind_id', 'g', 'p'])
example_pop.infoFields()
# ('ind_id', 'g', 'p')

By default information fields are set to 0.0. We can initialize the ind_id field using a simuPOP function.

Initialize individual identifiers¶

print(np.array(example_pop.indInfo('ind_id')))
# [ 0.  0.  0.  ...  0.]
sim.tagID(example_pop)
print(np.array(example_pop.indInfo('ind_id')))
# [ 1.  2.  3.  ...  105.]

Note

:: In this step we converted the output into a np.array for aesthetics

Calculate Allele Frequencies¶

Using simuPOP to compute allele frequencies¶

sim.stat(example_pop, alleleFreq=sim.ALL_AVAIL)

Determine Segregating Loci¶

For simplicity we will use loci which have more than one allele i.e. segregating.

Using simuPOP to find segregating loci¶

sim.stat(example_pop, numOfSegSites=sim.ALL_AVAIL,
                      vars=['numOfSegSites', 'segSites', 'fixedSites'])
example_pop.dvars().numOfSegSites
# 42837
print(example_pop.dvars().segSites[::1000]) # every 1000th segregating locus
# [0, 1040, 2072, ..., 43578]

There are 42,837 segregating loci in this population. saegus has a function to put the alleles into an array and assign the alleles at qtl an effect as a draw from a specified distribution.

Additive Trait¶

We have all the information we need from the previous steps. We will randomly choose 5 QTL from the segregating loci. Both alleles at each QTL are assigned an effect as a random draw with an exponential distribution.

Choosing QTL and Assign Effects¶

For this example we will pick 5 loci to designate as quantitative trait loci.

Choosing QTL and assigning allele effects¶

segregating_loci = example_pop.dvars().segSites
qtl = sorted(random.sample(segregating_loci, 5))
print(qtl)
# [6, 2972, 12694, 30642, 34123]

Every allele is initially assigned an effect of 0. Now alleles only at each QTL will be assigned a non-zero effect drawn from the Exponential distribution.

Assign allele effects using an exponential distribution¶

example_run = analyze.Study('example_pop')
allele_states = example_run.gather_allele_data(example_pop)
alleles = np.array([allele_states[:, 1], allele_states[:, 2]]).T
trait = parameters.Trait()
ae_table = trait.construct_allele_effects_table(alleles, qtl, random.expovariate, 1)
print(ae_table[qtl]) # qtl only
# [[    6.          1.          0.47333     3.          1.1387 ]
#  [ 2972.          1.          0.50155     2.          0.81906]
#  [12694.          1.          0.41925     3.          1.32648]
#  [30642.          1.          0.70116     3.          0.16591]
#  [34123.          1.          3.27972     3.          0.33993]]
print(ae_table) # all loci
# [[     0.      1.      0.      2.      0.]
#  [     1.      2.      0.      3.      0.]
#  [     2.      2.      0.      3.      0.]
#  ...,
#  [ 44442.      1.      0.      2.      0.]
#  [ 44443.      1.      0.      3.      0.]
#  [ 44444.      1.      0.      3.      0.]]

Alternatively, we could use another distribution, such as the Normal. This overwrites the previously assigned effects.

Assign allele effects using a normal distribution¶

ae_table = trait.construct_allele_effects_table(alleles, qtl, random.normalvariate, 0, 1)
print(ae_table[qtl]) # qtl only
# [[    6.          1.          0.09821     3.          0.19477]
#  [ 2972.          1.         -1.48559     2.          1.47764]
#  [12694.          1.         -1.16001     3.          0.09613]
#  [30642.          1.          0.44827     3.          0.11772]
#  [34123.          1.          2.15811     3.          0.99274]]

For speed of computation we construct an array of allele effects where the row of the array corresponds to the locus and the column corresponds to the integer representing the allele state.

Putting the allele effects in an array for speed of computation¶

ae_array = trait.construct_ae_array(ae_table, qtl)
print(ae_array[qtl])
# [[ 0.       0.09821  0.       0.19477  0.       0.     ]
#  [ 0.      -1.48559  1.47764  0.       0.       0.     ]
#  [ 0.      -1.16001  0.       0.09613  0.       0.     ]
#  [ 0.       0.44827  0.       0.11772  0.       0.     ]
#  [ 0.       2.15811  0.       0.99274  0.       0.     ]]

Definition of `g`¶

g is the sum of the allele effects of an individual’s genotype. There is no noise or error in g because we have a priori determined the allele effects.

Calculating g values¶

operators.calculate_g(example_pop, ae_array)
print(np.array(example_pop.indInfo('g')))
# [ 3.7728   5.66723  0.90614  ...  6.83259]

Calculation of Error Term¶

To simulate the experimental noise a term \(\epsilon\) is added to each individual’s a value. \(\epsilon\) is a random variable with a normal distribution given by mean \(0\) and variance given by:

\[\sigma^2_\epsilon = \sigma^2_a(\frac{1}{h^2}-1);\]

where \(\sigma^2_a\) is the variance of a and \(h^2\) is the narrow sense heritability.

\[\varepsilon \sim \mathcal{N} (0,\sigma^2_\epsilon)\]

Hence, an individual’s value of p is calculated by

\[p = a + \epsilon\]

Calculating `p`¶

It is straightforward to calculate p for the population but we already have a function to make it even easier for ourselves.

Computing p for each individual¶

heritability = 0.7
operators.calculate_error_variance(example_pop, heritability)
operators.calculate_p(example_pop)
print(np.array(example_pop.indInfo('p')))
# [ 4.70345  8.28645  0.7787  ...  6.9911 ]

Validating the `h2` Function¶

Becuase \(\epsilon\) is a random variable, we will compute median h2 from 30 replications (median b/c h2 is bounded)

Validating the calculation of h2¶

check_h2 = []
for x in range(0, 30):
    ae_table = trait.construct_allele_effects_table(alleles, qtl, random.normalvariate, 0, 1)
    ae_array = trait.construct_ae_array(ae_table, qtl)
    operators.calculate_g(example_pop, ae_array)
    operators.calculate_error_variance(example_pop, heritability)
    operators.calculate_p(example_pop)
    check_h2.append(np.var(example_pop.indInfo('g')) / np.var(example_pop.indInfo('p')))

check_h2
# [0.8594594061513856, 0.547350607012552, 0.8588487371957536, ..., 0.4640558795605753]
np.median(check_h2_v2)
# 0.6260138213421671

Validating the `calculate_g` Function¶

Let’s make sure that our function is correctly matching allele to its effect and summing the effects correctly. We will look at the alleles individual 1 of example_pop at the QTL. Then we will sum the effects and compare the result with our function calculate_g().

Validating the calculation of g¶

example_ind = example_pop.individual(0)
alpha_qtl_alleles = np.array(example_ind.genotype(ploidy=0))[qtl]
omega_qtl_alleles = np.array(example_ind.genotype(ploidy=1))[qtl]
example_g = [[], []]
for locus, alpha, omega in zip(qtl, alpha_qtl_alleles, omega_qtl_alleles):
    print(locus, alpha, ae_array[locus, alpha], omega, ae_array[locus, omega])
    example_g[0].append(ae_array[locus, alpha])
    example_g[1].append(ae_array[locus, omega])

sum(example_g[0]) + sum(example_g[1])
# 0.8047750628903483
example_pop.indByID(1).g
# 0.8047750628903477

Additive Trait Parameterization¶

Preparing the Population¶

Load the Population¶

Add Information Fields¶

Calculate Allele Frequencies¶

Determine Segregating Loci¶

Additive Trait¶

Choosing QTL and Assign Effects¶

Definition of g¶

Calculation of Error Term¶

Calculating p¶

Validating the h2 Function¶

Validating the calculate_g Function¶

Definition of `g`¶

Calculating `p`¶

Validating the `h2` Function¶

Validating the `calculate_g` Function¶