recomb_last_gen {simfam}    R Documentation

Draw recombination breaks for autosomes for last generation of a pedigree

Description

A wrapper around the more general recomb_fam(), specialized to save memory when only the last generation is desired (recomb_fam() returns recombination blocks for the entire pedigree). This function assumes that generations are non-overlapping (a condition met by the output of sim_pedigree()), in which case each generation g can be drawn from the data of generation g-1 alone, so only two consecutive generations need to be in memory at any given time. The partitioning of individuals into generations is given by the ids parameter (which also matches the output of sim_pedigree()).
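The memory-saving pattern described above can be sketched in base R. This is only a toy illustration and not the simfam implementation: the pedigree `fam_toy`, the list `ids_toy`, and the stand-in "data" (the set of founder ancestors of each individual, instead of recombination blocks) are all assumptions made for the example.

```r
# Toy pedigree with three non-overlapping generations (hypothetical data).
fam_toy <- data.frame(
  id  = c('a', 'b', 'c', 'd', 'e', 'f', 'g'),
  pat = c(NA, NA, NA, NA, 'a', 'c', 'e'),
  mat = c(NA, NA, NA, NA, 'b', 'd', 'f'),
  stringsAsFactors = FALSE
)
ids_toy <- list(c('a', 'b', 'c', 'd'), c('e', 'f'), 'g')

# Generation 1: each founder's toy "data" is just its own ID.
current <- setNames(as.list(ids_toy[[1]]), ids_toy[[1]])

for (g in seq_along(ids_toy)[-1]) {
  # rows of the pedigree belonging to generation g
  fam_g <- fam_toy[fam_toy$id %in% ids_toy[[g]], ]
  # stand-in for the recombination draw: combine the data of both parents,
  # which are looked up in generation g-1 only
  nxt <- lapply(seq_len(nrow(fam_g)), function(i) {
    union(current[[fam_g$pat[i]]], current[[fam_g$mat[i]]])
  })
  names(nxt) <- fam_g$id
  # generation g-1 is no longer referenced after this assignment
  current <- nxt
}
```

At the end of the loop, `current` holds the last generation only; earlier generations were discarded as soon as their children were drawn.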

Usage

recomb_last_gen(founders, fam, ids, missing_vals = c("", 0))

Arguments

founders

The named list of founders with their chromosomes. For unstructured founders, initialize with recomb_init_founders(). Each element of this list is a diploid individual, which is a list with two haploid individuals named pat and mat. Each haploid individual is a list of chromosomes (always identified by number, but they may also be named arbitrarily). Each chromosome is a data.frame/tibble with implicit ranges (posg holds block end coordinates in cM; the start of a block is the end of the previous block, or zero for the first block) and ancestor labels anc as strings. For true founders each chromosome may be trivial (a single block whose ancestor label identifies the founder itself and distinguishes its maternal from its paternal copy), but the input may itself already be recombined (useful for iterating). This list must have names that identify each founder (matching codes in fam$id). Individuals may be in a different order than in fam$id. Extra individuals that are in founders but absent from fam$id are silently ignored.
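As an illustration of the nested format described above, the following base R sketch builds one diploid individual by hand and recovers the implicit block starts. The individual ID, block positions, and ancestor labels are made up for the example, and `block_starts` is a hypothetical helper that is not part of simfam.

```r
# One recombined chromosome: two blocks ending at 20 and 50 cM.
# The ancestor labels are made up for illustration.
chr_pat <- data.frame(
  posg = c(20, 50),
  anc = c('gf_pat', 'gf_mat'),
  stringsAsFactors = FALSE
)
# A trivial (unrecombined) chromosome: a single block spanning all 50 cM.
chr_mat <- data.frame(posg = 50, anc = 'gm_pat', stringsAsFactors = FALSE)

# A diploid individual: `pat` and `mat` are each a list of chromosomes.
ind <- list(pat = list(chr_pat), mat = list(chr_mat))
# The outer list is named by individual ID.
founders_toy <- list(ind1 = ind)

# Hypothetical helper: the start of each block is the end of the previous
# block, and zero for the first block.
block_starts <- function(chr) c(0, head(chr$posg, -1))

starts_pat <- block_starts(founders_toy$ind1$pat[[1]])
starts_mat <- block_starts(founders_toy$ind1$mat[[1]])
```

Here the paternal copy has blocks covering 0 to 20 cM and 20 to 50 cM, while the maternal copy is a single block from 0 to 50 cM.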

fam

The pedigree data.frame, in plink FAM format. Only columns id, pat, and mat are required. id must be unique and non-missing. Founders must be present, and their pat and mat values must be missing (see below). Non-founders must have both parents non-missing. Parents must appear earlier than their children in the table.
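The requirements above can be checked before calling the function. The following is a minimal base R sketch under the assumptions stated in this section; `check_fam_toy` is a hypothetical helper, not a simfam function, and simfam may perform its own validation differently.

```r
# Hypothetical checker for the requirements listed above; not part of simfam.
check_fam_toy <- function(fam, missing_vals = c('', 0)) {
  is_missing <- function(x) is.na(x) | x %in% missing_vals
  # id must be unique and non-missing
  if (any(is_missing(fam$id)) || anyDuplicated(fam$id) > 0)
    return(FALSE)
  pat_miss <- is_missing(fam$pat)
  mat_miss <- is_missing(fam$mat)
  # both parents missing (founder) or both present (non-founder)
  if (any(pat_miss != mat_miss))
    return(FALSE)
  # each present parent must be listed in an earlier row than its child
  rows <- seq_len(nrow(fam))
  for (p in list(fam$pat, fam$mat)) {
    i_parent <- match(p, fam$id)
    ok <- pat_miss | (!is.na(i_parent) & i_parent < rows)
    if (!all(ok))
      return(FALSE)
  }
  TRUE
}

fam_ok <- data.frame(
  id  = c('father', 'mother', 'child'),
  pat = c(NA, NA, 'father'),
  mat = c(NA, NA, 'mother'),
  stringsAsFactors = FALSE
)
# same rows, but the child is listed before its parents
fam_bad <- fam_ok[c(3, 1, 2), ]

res_ok <- check_fam_toy(fam_ok)
res_bad <- check_fam_toy(fam_bad)
```

The first table passes, while the reordered table fails because the child precedes its parents.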

ids

A list with one vector of IDs per generation. All these IDs are expected to be present in fam$id. If the IDs in fam and ids do not fully agree, the code processes only the IDs in their intersection, which is helpful when fam is pruned but ids is the original (larger) set.
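The intersection behavior can be pictured with a small base R sketch. The vectors below are made-up toy data, and the code only mimics the documented behavior; it is not necessarily how simfam implements it.

```r
# IDs of a pruned pedigree (toy data): 'sib' was removed from fam.
fam_ids <- c('father', 'mother', 'child')
# Original, larger list of generations.
ids_toy <- list(c('father', 'mother'), c('child', 'sib'))

# Keep, per generation, only IDs present in both, in the order of fam_ids.
ids_kept <- lapply(ids_toy, function(gen) fam_ids[fam_ids %in% gen])

# IDs of the last generation that would be processed.
last_gen <- ids_kept[[length(ids_kept)]]
```

In this toy case only 'child' remains in the last generation, since 'sib' is absent from the pruned pedigree.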

missing_vals

The list of ID values treated as missing. NA is always treated as missing. By default, the empty string ("") and zero (0) are also treated as missing (remove values from here if this is a problem).
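A rough base R sketch of the missingness rule described above follows. `is_missing_toy` is a hypothetical helper written for this illustration, not a simfam function. Note that in R, c("", 0) is coerced to the character vector c("", "0"), so the string "0" matches the default as well.

```r
# Hypothetical helper: NA is always missing, plus anything in missing_vals.
is_missing_toy <- function(x, missing_vals = c('', 0)) {
  is.na(x) | x %in% missing_vals
}

pat <- c(NA, '', '0', 'father')

# Default: NA, empty string, and zero are all treated as missing.
flags_default <- is_missing_toy(pat)

# If '' or '0' are valid IDs in a dataset, drop them from missing_vals;
# with an empty vector only NA is treated as missing.
flags_na_only <- is_missing_toy(pat, missing_vals = character(0))
```

With the defaults, the first three entries are flagged as missing; with an empty `missing_vals`, only the NA entry is.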

Value

The list of individuals of the last generation with their recombined chromosomes (the intersection of ids[[ length(ids) ]] and fam$id), in the same format as founders above. The names of this list are the last-generation individuals in the order that they appear in fam$id.

See Also

Plink FAM format reference: https://www.cog-genomics.org/plink/1.9/formats#fam

Examples

# A small pedigree, two parents and two children.
# A minimal fam table with the three required columns.
# Note "mother" and "father" have missing parent IDs, while children do not
library(simfam)
library(tibble)
fam <- tibble(
  id = c('father', 'mother', 'child', 'sib'),
  pat = c(NA, NA, 'father', 'father'),
  mat = c(NA, NA, 'mother', 'mother')
)
# need an `ids` list separating the generations
ids <- list( c('father', 'mother'), c('child', 'sib') )

# initialize parents with this other function
# simulate three chromosomes with these lengths in cM
lengs <- c(50, 100, 150)
founders <- recomb_init_founders( ids[[1]], lengs )

# draw recombination breaks for the children
inds <- recomb_last_gen( founders, fam, ids )


[Package simfam version 1.1.6 Index]