Integrated Example: Multiple Generations and Data Storage¶
This example will combine the what we have learned in prior examples. We will create a multi-parental population, perform several generations of random mating, calculate an additive trait at each generation, store the allele frequencies, genotype frequencies and additive trait.
import simuOpt
simuOpt.setOptions(alleleType='short', numThreads=4, quiet=True)
import simuPOP as sim
import pandas as pd
import numpy as np
import random
from saegus import analyze, breed, operators, parameters, parse
import h5py
np.set_printoptions(precision=3, suppress=True)
Creating the Multi-Parental Population¶
Load Population and Genetic Map¶
As usual we start by re-loading the example population we have been working with in all other tutorials.
example_pop = sim.loadPopulation('example_pop.pop')
example_pop.addInfoFields(['ind_id', 'mother_id', 'father_id', 'g', 'p'])
sim.tagID(example_pop)
tf = parse.TusonFounders()
recom_map = tf.parse_recombination_rates('genetic_map.txt')
Achieving Recombinatorial Convergence¶
Creating the F1¶
As in the Creating Multi-Parental Populations tutorial we will create
the multi-parental population from individuals ‘’1’’ through ‘’8’’ of
example_pop. This requires three generations of mating to achieve
recombinatorial convergence. We will create a population of size 1000.
founders = [[1, 2], [3, 4], [5, 6], [7, 8]]
offspring_per_pair = 250
magic = breed.MAGIC(example_pop, founders, recom_map)
example_pop.popSize()
# 105
magic.generate_f_one(founders, offspring_per_pair)
First Converging Cross¶
first_random_cross = breed.RandomCross(example_pop, 4, 250)
first_mothers, first_fathers = first_random_cross.converging_random_cross()
first_parent_chooser = breed.SecondOrderPairIDChooser(first_mothers, first_fathers)
example_pop.evolve(
matingScheme=sim.HomoMating(
sim.PyParentsChooser(first_parent_chooser.snd_ord_id_pairs),
sim.OffspringGenerator(ops=[
sim.IdTagger(),
sim.PedigreeTagger(),
sim.Recombinator(rates=recom_map)],
numOffspring=1),
subPopSize=1000),
gen=1,
)
# 1
Second Converging Cross¶
final_random_cross = breed.RandomCross(example_pop, 2, 500)
final_mothers, final_fathers = final_random_cross.converging_random_cross()
final_parent_chooser = breed.SecondOrderPairIDChooser(final_mothers, final_fathers)
example_pop.evolve(
matingScheme=sim.HomoMating(
sim.PyParentsChooser(final_parent_chooser.snd_ord_id_pairs),
sim.OffspringGenerator(ops=[
sim.IdTagger(),
sim.PedigreeTagger(),
sim.Recombinator(rates=recom_map)],
numOffspring=1),
subPopSize=1000),
gen=1,
)
# 1
Additive Trait¶
We will choose 20 loci to declare as quantitative trait loci with
exponentially distributed allele effects with mean equal to 1.
We will use the same process in Additive Trait Parameterization.
sim.stat(example_pop, alleleFreq=sim.ALL_AVAIL)
segregating_loci = sim.stat(example_pop, numOfSegSites=sim.ALL_AVAIL, vars=['segSites'])
qtl = sorted(random.sample(example_pop.dvars().segSites, 20))
example_run = analyze.Study('example_pop')
allele_states = example_run.gather_allele_data(example_pop)
alleles = np.array([allele_states[:, 1], allele_states[:, 2]]).T
trait = parameters.Trait()
ae_table = trait.construct_allele_effects_table(alleles, qtl, random.expovariate, 1)
ae_array = trait.construct_ae_array(ae_table, qtl)
print(ae_array[qtl])
Opening the HDF5 File and Declaring Groups¶
All of the data derived from the simulation will be stored in a single HDF5
file. Each type of data will have a separate h5py.Group. HDF5
groups make it very easy to split data into categories.
integrated_example_data = h5py.File('integrated_example_data.hdf5')
allele_group = integrated_example_data.create_group('allele')
genotype_group = integrated_example_data.create_group('genotype')
trait_group = integrated_example_data.create_group('trait')
Ten Generations of Random Mating¶
This example will simulate ten generations of random mating with a population
size of 1000.
Operator Forms for Storing Data from Each Generation¶
Just as simuPOP has function forms of its operators. saegus
has operator forms of its functions. There are operators that collect each
type of data and store it in an HDF5 file.
Allele Frequencies, Genotype Frequencies, g and p¶
Each kind of data is stored by generation as specified in the data model.
The operators in sim.evolve each take an h5py.Group and acquire
the generation from the Population. These two items are enough
to specify a unique address for the data inside the HDF5 file. The data
are stored in the h5py.File generation by generation.
operators.calculate_g(example_pop, ae_array)
print(np.array(example_pop.indInfo('g')))
operators.calculate_error_variance(example_pop, 0.7)
example_pop.dvars().epsilon
operators.calculate_p(example_pop)
np.mean(example_pop.indInfo('g')), np.var(example_pop.indInfo('g'))
np.mean(example_pop.indInfo('p')), np.var(example_pop.indInfo('p'))
g = np.array(example_pop.indInfo('g'))
example_pop.dvars().gen = 1
example_pop.evolve(
preOps=[
sim.Stat(alleleFreq=sim.ALL_AVAIL), #calculate allele frequencies
sim.Stat(genoFreq=sim.ALL_AVAIL), # calculate genotype frequencies
operators.HDF5AlleleFrequencies(allele_group, allele_states), #store allele frequencies
operators.HDF5GenotypeFrequencies(genotype_group), # store genotype frequencies
operators.GenoAdditiveArray(qtl, ae_array), # calculate g
operators.PhenoAdditive(), # calulcate p
operators.HDFTrait('g', trait_group), # store g
operators.HDFTrait('p', trait_group), # store p
],
matingScheme=sim.RandomMating(ops=[
sim.IdTagger(),
sim.PedigreeTagger(),
sim.Recombinator(rates=recom_rates)],
subPopSize=1000),
finalOps=operators.HDF5Close(),
gen=10,
)
# 10
The data for ten generations of random mating with heritability = 0.7 is stored
in integrated_example_data.hdf5. There is a brief tutorial on accessing the
data in Collect and Store Data for both h5py in Python
and h5 in R.