Additive Trait Parameterization¶
import simuOpt
simuOpt.setOptions(alleleType='short', quiet=True)
import simuPOP as sim
import numpy as np, pandas as pd
import random
from saegus import analyze, operators, parameters
np.set_printoptions(suppress=True, precision=5)
Our goal is to show how to parameterize an additive trait in :mod`saegus` code.
We will make use of simuPOP infoFields to store the information about
the additive trait for each individual. Also we will add a unique identifier
for each individual which will help us verify that the trait is being calculated
correctly. In this example we will randomly choose quantitative trait loci from
among loci which are segregating in example_pop.
Preparing the Population¶
We will re-load the population and add information fields.
example_pop needs the information fields ind_id, g and p.
Load the Population¶
We will use the population we created in the last step instead of creating a new population.
example_pop = sim.loadPopulation('example_pop.pop')
Add Information Fields¶
At present our population has no special information assigned to its members.
Each individual is only a genotype. By default simuPOP uses ind_id,
father_id, mother_id, father_idx and mother_idx for its very
useful _Tagger functions. We can save some hassle by using these for
identifying individuals.
infoFields ind_id, g and p¶example_pop.infoFields()
# ()
example_pop.addInfoFields(['ind_id', 'g', 'p'])
example_pop.infoFields()
# ('ind_id', 'g', 'p')
By default information fields are set to 0.0. We can initialize the
ind_id field using a simuPOP function.
print(np.array(example_pop.indInfo('ind_id')))
# [ 0. 0. 0. ... 0.]
sim.tagID(example_pop)
print(np.array(example_pop.indInfo('ind_id')))
# [ 1. 2. 3. ... 105.]
Note
:: In this step we converted the output into a np.array for aesthetics
Calculate Allele Frequencies¶
simuPOP to compute allele frequencies¶sim.stat(example_pop, alleleFreq=sim.ALL_AVAIL)
Determine Segregating Loci¶
For simplicity we will use loci which have more than one allele i.e. segregating.
simuPOP to find segregating loci¶sim.stat(example_pop, numOfSegSites=sim.ALL_AVAIL,
vars=['numOfSegSites', 'segSites', 'fixedSites'])
example_pop.dvars().numOfSegSites
# 42837
print(example_pop.dvars().segSites[::1000]) # every 1000th segregating locus
# [0, 1040, 2072, ..., 43578]
There are 42,837 segregating loci in this population. saegus has a function
to put the alleles into an array and assign the alleles at qtl an effect as
a draw from a specified distribution.
Additive Trait¶
We have all the information we need from the previous steps. We will randomly
choose 5 QTL from the segregating loci. Both alleles at each QTL are
assigned an effect as a random draw with an exponential distribution.
Choosing QTL and Assign Effects¶
For this example we will pick 5 loci to designate as quantitative trait loci.
segregating_loci = example_pop.dvars().segSites
qtl = sorted(random.sample(segregating_loci, 5))
print(qtl)
# [6, 2972, 12694, 30642, 34123]
Every allele is initially assigned an effect of 0. Now alleles only at each QTL
will be assigned a non-zero effect drawn from the Exponential distribution.
example_run = analyze.Study('example_pop')
allele_states = example_run.gather_allele_data(example_pop)
alleles = np.array([allele_states[:, 1], allele_states[:, 2]]).T
trait = parameters.Trait()
ae_table = trait.construct_allele_effects_table(alleles, qtl, random.expovariate, 1)
print(ae_table[qtl]) # qtl only
# [[ 6. 1. 0.47333 3. 1.1387 ]
# [ 2972. 1. 0.50155 2. 0.81906]
# [12694. 1. 0.41925 3. 1.32648]
# [30642. 1. 0.70116 3. 0.16591]
# [34123. 1. 3.27972 3. 0.33993]]
print(ae_table) # all loci
# [[ 0. 1. 0. 2. 0.]
# [ 1. 2. 0. 3. 0.]
# [ 2. 2. 0. 3. 0.]
# ...,
# [ 44442. 1. 0. 2. 0.]
# [ 44443. 1. 0. 3. 0.]
# [ 44444. 1. 0. 3. 0.]]
Alternatively, we could use another distribution, such as the Normal. This overwrites the previously assigned effects.
ae_table = trait.construct_allele_effects_table(alleles, qtl, random.normalvariate, 0, 1)
print(ae_table[qtl]) # qtl only
# [[ 6. 1. 0.09821 3. 0.19477]
# [ 2972. 1. -1.48559 2. 1.47764]
# [12694. 1. -1.16001 3. 0.09613]
# [30642. 1. 0.44827 3. 0.11772]
# [34123. 1. 2.15811 3. 0.99274]]
For speed of computation we construct an array of allele effects where the row of the array corresponds to the locus and the column corresponds to the integer representing the allele state.
ae_array = trait.construct_ae_array(ae_table, qtl)
print(ae_array[qtl])
# [[ 0. 0.09821 0. 0.19477 0. 0. ]
# [ 0. -1.48559 1.47764 0. 0. 0. ]
# [ 0. -1.16001 0. 0.09613 0. 0. ]
# [ 0. 0.44827 0. 0.11772 0. 0. ]
# [ 0. 2.15811 0. 0.99274 0. 0. ]]
Definition of g¶
g is the sum of the allele effects of an individual’s genotype. There is
no noise or error in g because we have a priori determined the allele
effects.
operators.calculate_g(example_pop, ae_array)
print(np.array(example_pop.indInfo('g')))
# [ 3.7728 5.66723 0.90614 ... 6.83259]
Calculation of Error Term¶
To simulate the experimental noise a term \(\epsilon\) is added to each
individual’s a value.
\(\epsilon\) is a random variable with a normal distribution given by
mean \(0\) and variance given by:
where \(\sigma^2_a\) is the variance of a and \(h^2\) is the
narrow sense heritability.
Hence, an individual’s value of p is calculated by
Calculating p¶
It is straightforward to calculate p for the population but we already
have a function to make it even easier for ourselves.
p for each individual¶heritability = 0.7
operators.calculate_error_variance(example_pop, heritability)
operators.calculate_p(example_pop)
print(np.array(example_pop.indInfo('p')))
# [ 4.70345 8.28645 0.7787 ... 6.9911 ]
Validating the h2 Function¶
Becuase \(\epsilon\) is a random variable, we will compute median h2 from 30 replications (median b/c h2 is bounded)
h2¶check_h2 = []
for x in range(0, 30):
ae_table = trait.construct_allele_effects_table(alleles, qtl, random.normalvariate, 0, 1)
ae_array = trait.construct_ae_array(ae_table, qtl)
operators.calculate_g(example_pop, ae_array)
operators.calculate_error_variance(example_pop, heritability)
operators.calculate_p(example_pop)
check_h2.append(np.var(example_pop.indInfo('g')) / np.var(example_pop.indInfo('p')))
check_h2
# [0.8594594061513856, 0.547350607012552, 0.8588487371957536, ..., 0.4640558795605753]
np.median(check_h2_v2)
# 0.6260138213421671
Validating the calculate_g Function¶
Let’s make sure that our function is correctly matching allele to its effect and
summing the effects correctly. We will look at the alleles individual 1 of
example_pop at the QTL. Then we will sum the effects and compare the result
with our function calculate_g().
g¶example_ind = example_pop.individual(0)
alpha_qtl_alleles = np.array(example_ind.genotype(ploidy=0))[qtl]
omega_qtl_alleles = np.array(example_ind.genotype(ploidy=1))[qtl]
example_g = [[], []]
for locus, alpha, omega in zip(qtl, alpha_qtl_alleles, omega_qtl_alleles):
print(locus, alpha, ae_array[locus, alpha], omega, ae_array[locus, omega])
example_g[0].append(ae_array[locus, alpha])
example_g[1].append(ae_array[locus, omega])
sum(example_g[0]) + sum(example_g[1])
# 0.8047750628903483
example_pop.indByID(1).g
# 0.8047750628903477