Integrated Example: Multiple Generations and Data Storage

This example will combine the what we have learned in prior examples. We will create a multi-parental population, perform several generations of random mating, calculate an additive trait at each generation, store the allele frequencies, genotype frequencies and additive trait.

Module imports
import simuOpt
simuOpt.setOptions(alleleType='short', numThreads=4, quiet=True)
import simuPOP as sim
import pandas as pd
import numpy as np
import random
from saegus import analyze, breed, operators, parameters, parse
import h5py
np.set_printoptions(precision=3, suppress=True)

Creating the Multi-Parental Population

Load Population and Genetic Map

As usual we start by re-loading the example population we have been working with in all other tutorials.

Loading the population and genetic map
example_pop = sim.loadPopulation('example_pop.pop')
example_pop.addInfoFields(['ind_id', 'mother_id', 'father_id', 'g', 'p'])
sim.tagID(example_pop)
tf = parse.TusonFounders()
recom_map = tf.parse_recombination_rates('genetic_map.txt')

Achieving Recombinatorial Convergence

Creating the F1

As in the Creating Multi-Parental Populations tutorial we will create the multi-parental population from individuals ‘’1’’ through ‘’8’’ of example_pop. This requires three generations of mating to achieve recombinatorial convergence. We will create a population of size 1000.

Repeating the MAGIC-like protocol
founders = [[1, 2], [3, 4], [5, 6], [7, 8]]
offspring_per_pair = 250
magic = breed.MAGIC(example_pop, founders, recom_map)
example_pop.popSize()
# 105
magic.generate_f_one(founders, offspring_per_pair)

First Converging Cross

Second step of recombinatorial convergence
first_random_cross = breed.RandomCross(example_pop, 4, 250)
first_mothers, first_fathers = first_random_cross.converging_random_cross()
first_parent_chooser = breed.SecondOrderPairIDChooser(first_mothers, first_fathers)
example_pop.evolve(
    matingScheme=sim.HomoMating(
        sim.PyParentsChooser(first_parent_chooser.snd_ord_id_pairs),
        sim.OffspringGenerator(ops=[
            sim.IdTagger(),
            sim.PedigreeTagger(),
            sim.Recombinator(rates=recom_map)],
            numOffspring=1),
        subPopSize=1000),
    gen=1,
    )
# 1

Second Converging Cross

Second step of recombinatorial convergence
final_random_cross = breed.RandomCross(example_pop, 2, 500)
final_mothers, final_fathers = final_random_cross.converging_random_cross()
final_parent_chooser = breed.SecondOrderPairIDChooser(final_mothers, final_fathers)
example_pop.evolve(
    matingScheme=sim.HomoMating(
        sim.PyParentsChooser(final_parent_chooser.snd_ord_id_pairs),
        sim.OffspringGenerator(ops=[
            sim.IdTagger(),
            sim.PedigreeTagger(),
            sim.Recombinator(rates=recom_map)],
            numOffspring=1),
        subPopSize=1000),
    gen=1,
    )
# 1

Additive Trait

We will choose 20 loci to declare as quantitative trait loci with exponentially distributed allele effects with mean equal to 1.

\[G \sim Exp(1)\]

We will use the same process in Additive Trait Parameterization.

Choosing QTL and assigning effects
sim.stat(example_pop, alleleFreq=sim.ALL_AVAIL)
segregating_loci = sim.stat(example_pop, numOfSegSites=sim.ALL_AVAIL, vars=['segSites'])
qtl = sorted(random.sample(example_pop.dvars().segSites, 20))
example_run = analyze.Study('example_pop')
allele_states = example_run.gather_allele_data(example_pop)
alleles = np.array([allele_states[:, 1], allele_states[:, 2]]).T
trait = parameters.Trait()
ae_table = trait.construct_allele_effects_table(alleles, qtl, random.expovariate, 1)
ae_array = trait.construct_ae_array(ae_table, qtl)
print(ae_array[qtl])

Opening the HDF5 File and Declaring Groups

All of the data derived from the simulation will be stored in a single HDF5 file. Each type of data will have a separate h5py.Group. HDF5 groups make it very easy to split data into categories.

Set up the HDF5 File
integrated_example_data = h5py.File('integrated_example_data.hdf5')
allele_group = integrated_example_data.create_group('allele')
genotype_group = integrated_example_data.create_group('genotype')
trait_group = integrated_example_data.create_group('trait')

Ten Generations of Random Mating

This example will simulate ten generations of random mating with a population size of 1000.

Operator Forms for Storing Data from Each Generation

Just as simuPOP has function forms of its operators. saegus has operator forms of its functions. There are operators that collect each type of data and store it in an HDF5 file.

Allele Frequencies, Genotype Frequencies, g and p

Each kind of data is stored by generation as specified in the data model. The operators in sim.evolve each take an h5py.Group and acquire the generation from the Population. These two items are enough to specify a unique address for the data inside the HDF5 file. The data are stored in the h5py.File generation by generation.

Creating the allele data and frequency arrays
operators.calculate_g(example_pop, ae_array)
print(np.array(example_pop.indInfo('g')))
operators.calculate_error_variance(example_pop, 0.7)
example_pop.dvars().epsilon
operators.calculate_p(example_pop)
np.mean(example_pop.indInfo('g')), np.var(example_pop.indInfo('g'))
np.mean(example_pop.indInfo('p')), np.var(example_pop.indInfo('p'))
g = np.array(example_pop.indInfo('g'))
example_pop.dvars().gen = 1
Storing ten generations of data
example_pop.evolve(
preOps=[
    sim.Stat(alleleFreq=sim.ALL_AVAIL), #calculate allele frequencies
    sim.Stat(genoFreq=sim.ALL_AVAIL), # calculate genotype frequencies
    operators.HDF5AlleleFrequencies(allele_group, allele_states), #store allele frequencies
    operators.HDF5GenotypeFrequencies(genotype_group), # store genotype frequencies
    operators.GenoAdditiveArray(qtl, ae_array), # calculate g
    operators.PhenoAdditive(), # calulcate p
    operators.HDFTrait('g', trait_group), # store g
    operators.HDFTrait('p', trait_group), # store p
       ],
    matingScheme=sim.RandomMating(ops=[
        sim.IdTagger(),
        sim.PedigreeTagger(),
        sim.Recombinator(rates=recom_rates)],
        subPopSize=1000),
    finalOps=operators.HDF5Close(),
    gen=10,
    )
# 10

The data for ten generations of random mating with heritability = 0.7 is stored in integrated_example_data.hdf5. There is a brief tutorial on accessing the data in Collect and Store Data for both h5py in Python and h5 in R.