Expanding A Population According to Structure Matrix¶
This example uses simuPOP’s capability to perform non-random
mating schemes We have inferred the population structure of example_pop.
The goal of this example is to use the population structure to create a
simulated population with similar structure to the original population. The
genome of each individual in example_pop derives from a single
sub-population or a combination of sub-populations.
population_structure_matrix.txt defines the proportion of the genome
inherited by an individual from each sub-population. For our example: each
individual is assigned a primary sub-population. The primary sub-population
is the sub-population from which the individual inherited the largest
proportion of their genome. The proportions of inheritance are interpreted as
probabilities for determining which sub-population the mate will derive from.
import simuOpt
simuOpt.setOptions(alleleType='short', quiet=True)
import simuPOP as sim
import numpy as np, pandas as pd
import collections as col
from scipy import stats
from saegus import parameters, breed, parse
np.set_printoptions(suppress=True, precision=3)
We will continue to use the same population as the rest of our examples.
example_pop.pop¶example_pop = sim.loadPopulation('example_pop.pop')
We will add information fields to the population so that we can track the pedigree. If we wanted to analyze the pedigree we could look at the mother and father of each individual.
example_pop.addInfoFields(['ind_id', 'mother_id', 'father_id', 'primary'])
sim.tagID(example_pop)
example_pop.indInfo('ind_id')
# (1.0, 2.0, 3.0, 4.0, 5.0, ... 105.0)
We will import a file that tells us the likely mating structure of each of the 105 individuals of our population.
structure_matrix = pd.read_csv('example_population_structure_matrix.txt', index_col=0)
popst = parameters.PopulationStructure(example_pop)
proportions = np.array(np.array(structure_matrix)[:, 1:7])
print(proportions)
# [[0.0, 0.9996, 0.0, 0.0004, 0.0, 0.0],
# [0.011000000000000001, 0.0015, 0.0004, 0.1047, 0.0, 0.8824],
# [0.0, 0.3832, 0.0, 0.6168, 0.0, 0.0],
# ...
# [0.0, 0.4602, 0.0, 0.5398, 0.0, 0.0]]
There is a very small amount of rounding error in the proportions for some
individuals. If the proportions do not sum to 1 then we cannot use
them to make a probability mass function. For example:
proportions[33]
# array([0.8856999999999998, 0.0016, 0.0009, 0.1065, 0.0042, 0.0011], dtype=object)
sum(proportions[33])
# 1.0000000000000002
So we will use a function to adjust the small difference from 1 by adding or
subtracting from the primary sub-population proportion.
corrected_proportions = popst.correct_rounding_error(proportions)
sum(corrected_proportions[33])
# 0.9999999999999999
Apparently the result of 0.9999999999999999 is close enough for the
scipy.stats module we are about to use. For peace of mind, we can use the
name attribute of the stats.rv_discrete function to match the ind_id
with the corresponding probabilities.
mating_pmfs = {}
for i, ind in enumerate(example_pop.individuals()):
mating_pmfs[ind.ind_id] = stats.rv_discrete(values=([0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
corrected_proportions[i]), name=str(ind.ind_id))
example_pop.dvars().mating_probabilities = mating_pmfs
Validating the Mating Probabilities¶
Before we proceed we should check the empirical distributions of the probability mass functions. We will use an example individual who is quite diverse in its lineage.
corrected_proportions[5]
# array([0.2195, 0.0198, 0.021, 0.2371, 0.1295, 0.3731], dtype=object)
mating_pmfs[6].pk # corresponding mating pmf
# array([0.2195, 0.0198, 0.021, 0.2371, 0.1295, 0.3731], dtype=object)
mating_pmfs[6].name
# 6.0
This individual is composed from all six sub-populations. We will draw 1000 times from the corresponding probability mass function and compare the results.
draw_results = mating_pmfs[6].rvs(size=1000)
draw_results
# array([4, 3, 5, 3, 3, ... 4])
draw_counts = col.Counter(draw_results)
draw_frequencies = []
for sp in range(6):
draw_frequencies.append(draw_counts[sp]/1000)
Finally let’s compare the 1000 draws with the probabilities.
draw_frequencies
# [0.219, 0.017, 0.021, 0.223, 0.148, 0.372]
corrected_proportions[5]
# array([0.2195, 0.0198, 0.021, 0.2371, 0.1295, 0.3731], dtype=object)
The draw frequencies are pretty close to the probability mass function. If we increased the number of draws to 10,000 the differences would become even smaller.
Assigning Primary Subpopulations¶
We will continue by assigning each individual a primary sub-population. The primary sub-population is the sub-population from which the majority of their genome is derived.
primary_subpops = {ind.ind_id: float(np.argmax(corrected_proportions[i]))
for i, ind in enumerate(example_pop.individuals())}
for ind in example_pop.individuals():
ind.primary = primary_subpops[ind.ind_id]
example_pop.indInfo('primary')
# (1.0, 5.0, 3.0, ..., 3.0)
Then we will use the virtual sub-population feature of simuPOP to group the
individuals without restricting mating between groups.
example_pop into virtual sub-populations¶primary_subpopulation_splitter = sim.InfoSplitter(field='primary', values=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
example_pop.setVirtualSplitter(primary_subpopulation_splitter)
Parent Chooser and Recombination Map¶
The class containing the parent chooser function must be instantiated with the
expanded population size. The recombination map will be parsed with an older
function. We will explain in a later section more details about recombination
in simuPOP.
popst_parent_chooser = breed.ForcedPopulationStructureParentChooser(1000, mating_pmfs)
tf = parse.TusonFounders()
recom_rates = tf.parse_recombination_rates('genetic_map.txt')
recom_rates
# [0.0020926625899999962, 2.2615580000007186e-05, 0.00042822784999999361, ..., 0.0]
Expanding the Population¶
Finally we can expand the population via mating according to the population structure probability mass functions. Each mating event follows this process:
- Randomly draw the first parent
- Given the mating probability mass function of the first parent: draw the second parent from the probability mass function of the first parent
- Cross the two parents
This procedure is repeated 1,000 times because each mating event produces a single offspring.
1000 individuals¶example_pop.evolve(
matingScheme=sim.HomoMating(
sim.PyParentsChooser(popst_parent_chooser.forced_structure_parent_chooser),
sim.OffspringGenerator(
ops=[sim.IdTagger(), sim.PedigreeTagger(), sim.Recombinator(recom_rates)],
numOffspring=1),
subPopSize=1000
),
gen=1
)
If we wanted to analyze the specific crosses we can create a pedigree using
the ind_id, mother_id and father_id fields.
pedigree = np.array((example_pop.indInfo('ind_id'),
example_pop.indInfo('mother_id'),
example_pop.indInfo('father_id'))).T
print(pedigree)
# [[ 106., 45., 86.],
# [ 107., 26., 70.],
# [ 108., 60., 31.],
# ...,
# [ 1105., 39., 40.]]