Additive Trait Parameterization

Modules we will need for this example
import simuOpt
simuOpt.setOptions(alleleType='short', quiet=True)
import simuPOP as sim
import numpy as np, pandas as pd
import random
from saegus import analyze, operators, parameters
np.set_printoptions(suppress=True, precision=5)

Our goal is to show how to parameterize an additive trait in :mod`saegus` code. We will make use of simuPOP infoFields to store the information about the additive trait for each individual. Also we will add a unique identifier for each individual which will help us verify that the trait is being calculated correctly. In this example we will randomly choose quantitative trait loci from among loci which are segregating in example_pop.

Preparing the Population

We will re-load the population and add information fields. example_pop needs the information fields ind_id, g and p.

Load the Population

We will use the population we created in the last step instead of creating a new population.

Loading our example population from a file
example_pop = sim.loadPopulation('example_pop.pop')

Add Information Fields

At present our population has no special information assigned to its members. Each individual is only a genotype. By default simuPOP uses ind_id, father_id, mother_id, father_idx and mother_idx for its very useful _Tagger functions. We can save some hassle by using these for identifying individuals.

Add the infoFields ind_id, g and p
example_pop.infoFields()
# ()
example_pop.addInfoFields(['ind_id', 'g', 'p'])
example_pop.infoFields()
# ('ind_id', 'g', 'p')

By default information fields are set to 0.0. We can initialize the ind_id field using a simuPOP function.

Initialize individual identifiers
print(np.array(example_pop.indInfo('ind_id')))
# [ 0.  0.  0.  ...  0.]
sim.tagID(example_pop)
print(np.array(example_pop.indInfo('ind_id')))
# [ 1.  2.  3.  ...  105.]

Note

:: In this step we converted the output into a np.array for aesthetics

Calculate Allele Frequencies

Using simuPOP to compute allele frequencies
sim.stat(example_pop, alleleFreq=sim.ALL_AVAIL)

Determine Segregating Loci

For simplicity we will use loci which have more than one allele i.e. segregating.

Using simuPOP to find segregating loci
sim.stat(example_pop, numOfSegSites=sim.ALL_AVAIL,
                      vars=['numOfSegSites', 'segSites', 'fixedSites'])
example_pop.dvars().numOfSegSites
# 42837
print(example_pop.dvars().segSites[::1000]) # every 1000th segregating locus
# [0, 1040, 2072, ..., 43578]

There are 42,837 segregating loci in this population. saegus has a function to put the alleles into an array and assign the alleles at qtl an effect as a draw from a specified distribution.

Additive Trait

We have all the information we need from the previous steps. We will randomly choose 5 QTL from the segregating loci. Both alleles at each QTL are assigned an effect as a random draw with an exponential distribution.

Choosing QTL and Assign Effects

For this example we will pick 5 loci to designate as quantitative trait loci.

Choosing QTL and assigning allele effects
segregating_loci = example_pop.dvars().segSites
qtl = sorted(random.sample(segregating_loci, 5))
print(qtl)
# [6, 2972, 12694, 30642, 34123]

Every allele is initially assigned an effect of 0. Now alleles only at each QTL will be assigned a non-zero effect drawn from the Exponential distribution.

Assign allele effects using an exponential distribution
example_run = analyze.Study('example_pop')
allele_states = example_run.gather_allele_data(example_pop)
alleles = np.array([allele_states[:, 1], allele_states[:, 2]]).T
trait = parameters.Trait()
ae_table = trait.construct_allele_effects_table(alleles, qtl, random.expovariate, 1)
print(ae_table[qtl]) # qtl only
# [[    6.          1.          0.47333     3.          1.1387 ]
#  [ 2972.          1.          0.50155     2.          0.81906]
#  [12694.          1.          0.41925     3.          1.32648]
#  [30642.          1.          0.70116     3.          0.16591]
#  [34123.          1.          3.27972     3.          0.33993]]
print(ae_table) # all loci
# [[     0.      1.      0.      2.      0.]
#  [     1.      2.      0.      3.      0.]
#  [     2.      2.      0.      3.      0.]
#  ...,
#  [ 44442.      1.      0.      2.      0.]
#  [ 44443.      1.      0.      3.      0.]
#  [ 44444.      1.      0.      3.      0.]]

Alternatively, we could use another distribution, such as the Normal. This overwrites the previously assigned effects.

Assign allele effects using a normal distribution
ae_table = trait.construct_allele_effects_table(alleles, qtl, random.normalvariate, 0, 1)
print(ae_table[qtl]) # qtl only
# [[    6.          1.          0.09821     3.          0.19477]
#  [ 2972.          1.         -1.48559     2.          1.47764]
#  [12694.          1.         -1.16001     3.          0.09613]
#  [30642.          1.          0.44827     3.          0.11772]
#  [34123.          1.          2.15811     3.          0.99274]]

For speed of computation we construct an array of allele effects where the row of the array corresponds to the locus and the column corresponds to the integer representing the allele state.

Putting the allele effects in an array for speed of computation
ae_array = trait.construct_ae_array(ae_table, qtl)
print(ae_array[qtl])
# [[ 0.       0.09821  0.       0.19477  0.       0.     ]
#  [ 0.      -1.48559  1.47764  0.       0.       0.     ]
#  [ 0.      -1.16001  0.       0.09613  0.       0.     ]
#  [ 0.       0.44827  0.       0.11772  0.       0.     ]
#  [ 0.       2.15811  0.       0.99274  0.       0.     ]]

Definition of g

g is the sum of the allele effects of an individual’s genotype. There is no noise or error in g because we have a priori determined the allele effects.

Calculating g values
operators.calculate_g(example_pop, ae_array)
print(np.array(example_pop.indInfo('g')))
# [ 3.7728   5.66723  0.90614  ...  6.83259]

Calculation of Error Term

To simulate the experimental noise a term \(\epsilon\) is added to each individual’s a value. \(\epsilon\) is a random variable with a normal distribution given by mean \(0\) and variance given by:

\[\sigma^2_\epsilon = \sigma^2_a(\frac{1}{h^2}-1);\]

where \(\sigma^2_a\) is the variance of a and \(h^2\) is the narrow sense heritability.

\[\varepsilon \sim \mathcal{N} (0,\sigma^2_\epsilon)\]

Hence, an individual’s value of p is calculated by

\[p = a + \epsilon\]

Calculating p

It is straightforward to calculate p for the population but we already have a function to make it even easier for ourselves.

Computing p for each individual
heritability = 0.7
operators.calculate_error_variance(example_pop, heritability)
operators.calculate_p(example_pop)
print(np.array(example_pop.indInfo('p')))
# [ 4.70345  8.28645  0.7787  ...  6.9911 ]

Validating the h2 Function

Becuase \(\epsilon\) is a random variable, we will compute median h2 from 30 replications (median b/c h2 is bounded)

Validating the calculation of h2
check_h2 = []
for x in range(0, 30):
    ae_table = trait.construct_allele_effects_table(alleles, qtl, random.normalvariate, 0, 1)
    ae_array = trait.construct_ae_array(ae_table, qtl)
    operators.calculate_g(example_pop, ae_array)
    operators.calculate_error_variance(example_pop, heritability)
    operators.calculate_p(example_pop)
    check_h2.append(np.var(example_pop.indInfo('g')) / np.var(example_pop.indInfo('p')))

check_h2
# [0.8594594061513856, 0.547350607012552, 0.8588487371957536, ..., 0.4640558795605753]
np.median(check_h2_v2)
# 0.6260138213421671

Validating the calculate_g Function

Let’s make sure that our function is correctly matching allele to its effect and summing the effects correctly. We will look at the alleles individual 1 of example_pop at the QTL. Then we will sum the effects and compare the result with our function calculate_g().

Validating the calculation of g
example_ind = example_pop.individual(0)
alpha_qtl_alleles = np.array(example_ind.genotype(ploidy=0))[qtl]
omega_qtl_alleles = np.array(example_ind.genotype(ploidy=1))[qtl]
example_g = [[], []]
for locus, alpha, omega in zip(qtl, alpha_qtl_alleles, omega_qtl_alleles):
    print(locus, alpha, ae_array[locus, alpha], omega, ae_array[locus, omega])
    example_g[0].append(ae_array[locus, alpha])
    example_g[1].append(ae_array[locus, omega])

sum(example_g[0]) + sum(example_g[1])
# 0.8047750628903483
example_pop.indByID(1).g
# 0.8047750628903477