PedCompare {sequoia}    R Documentation

Compare Two Pedigrees

Description

Compare an inferred pedigree (Ped2) to a previous or simulated pedigree (Ped1), including comparison of sibship clusters and sibship grandparents.

Usage

PedCompare(
  Ped1 = NULL,
  Ped2 = NULL,
  DumPrefix = c("F0", "M0"),
  SNPd = NULL,
  Symmetrical = TRUE,
  minSibSize = "1sib1GP",
  Plot = TRUE
)

Arguments

Ped1

first (e.g. original) pedigree, dataframe with columns id-dam-sire; only the first 3 columns will be used.

Ped2

second pedigree, e.g. newly inferred SeqOUT$Pedigree or SeqOUT$PedigreePar, with columns id-dam-sire.

DumPrefix

character vector with the prefixes identifying dummy individuals in Ped2. Use 'F0' ('M0') to avoid matching to regular individuals with IDs starting with 'F' ('M'), provided Ped2 has fewer than 999 dummy females (males).

SNPd

character vector with IDs of genotyped individuals. If NULL, defaults to the IDs occurring in both Ped1 and Ped2 and not starting with any of the prefixes in DumPrefix.
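
The default for SNPd can be approximated in base R. The following is a minimal sketch of the documented rule on toy data, not the package's internal code; the two small data frames, their IDs and the helper name default_snpd are made up for illustration.

```r
# Toy pedigrees with hypothetical IDs; only the id column matters here
Ped1 <- data.frame(id   = c("a01", "a02", "a03", "Fiona", "F0001"),
                   dam  = c(NA, NA, "a01", NA, NA),
                   sire = c(NA, NA, "a02", NA, NA),
                   stringsAsFactors = FALSE)
Ped2 <- data.frame(id   = c("a01", "a03", "Fiona", "F0001", "M0001"),
                   dam  = c(NA, "F0001", NA, NA, NA),
                   sire = c(NA, "M0001", NA, NA, NA),
                   stringsAsFactors = FALSE)

# Hypothetical helper: IDs present in both pedigrees, minus dummy-prefixed IDs
default_snpd <- function(ped1, ped2, DumPrefix = c("F0", "M0")) {
  shared <- intersect(ped1$id, ped2$id)
  is_dummy <- rep(FALSE, length(shared))
  for (p in DumPrefix) {                  # loop is skipped when DumPrefix is NULL
    is_dummy <- is_dummy | startsWith(shared, p)
  }
  shared[!is_dummy]
}

snpd <- default_snpd(Ped1, Ped2)
print(snpd)                               # "Fiona" is kept, "F0001" is dropped
print(default_snpd(Ped1, Ped2, DumPrefix = NULL))
```

With the prefix 'F0' the regular ID "Fiona" is not mistaken for a dummy, whereas a prefix of just 'F' would exclude it.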

Symmetrical

when determining the category of individuals (Genotyped/Dummy/X), use the 'highest' category across the two pedigrees (TRUE, default) or only consider Ped1 (Symmetrical = FALSE).

minSibSize

minimum requirements to be considered 'dummifiable', passed to getAssignCat:

  • '1sib': sibship of size 1, with or without grandparents. A single individual without grandparents is not really a sibship, but including it can be useful in some situations.

  • '1sib1GP': sibship of size 1 with at least 1 grandparent (default)

  • '2sib': at least 2 siblings, with or without grandparents (default prior to version 2.4)

Plot

show square Venn diagrams of counts?

Details

The comparison is divided into different classes of ‘assignable’ parents (getAssignCat). This includes cases where the focal individual and parent according to Ped1 are both Genotyped (G-G), as well as cases where the non-genotyped parent according to Ped1 can be lined up with a sibship Dummy parent in Ped2 (G-D), or where the non-genotyped focal individual in Ped1 can be matched to a dummy individual in Ped2 (D-G and D-D). If SNPd is NULL (the default), and DumPrefix is set to NULL, the intersect between the IDs in Pedigrees 1 and 2 is taken as the vector of genotyped individuals.

Value

A list with

Counts

A 7 x 5 x 2 named numeric array with the number of matches and mismatches, see below

Counts.detail

a large numeric array with the number of matches and mismatches, with more detail for all possible combinations of categories

MergedPed

A dataframe with side-by-side comparison of the two pedigrees

ConsensusPed

A consensus pedigree, with Pedigree 2 taking priority over Pedigree 1

DummyMatch

Dataframe with all dummy IDs in Pedigree 2 (id.2), and the best-matching individual in Pedigree 1 (id.1). Also includes the class of the dam & sire, as well as counts of offspring per outcome class (off.Match, off.Mismatch, etc.)

Mismatch

A subset of MergedPed with mismatches between Ped1 and Ped2, as defined below

Ped1only

as Mismatch, but with parents in Ped1 that were not assigned in Ped2

Ped2only

as Mismatch, but with parents in Ped2 that were missing in Ped1

'MergedPed', 'Mismatch', 'Ped1only' and 'Ped2only' provide the following columns:

id

All ids in both Pedigree 1 and 2. For dummy individuals, this is the id in pedigree 2

dam.1, sire.1

parents in Pedigree 1

dam.2, sire.2

parents in Pedigree 2

id.r, dam.r, sire.r

The real id of dummy individuals or parents in Pedigree 2, i.e. the best-matching non-genotyped individual in Pedigree 1, or "nomatch". If a sibship in Pedigree 1 is divided over 2 sibships in Pedigree 2, the smaller one will be denoted as "nomatch"

id.dam.cat, id.sire.cat

the category of the individual (first letter) and highest category of the dam (sire) in Pedigree 1 or 2: G=Genotyped, D=(potential) dummy, X=none. Individual, one-letter categories are generated by getAssignCat. Using the 'best' category from both pedigrees makes comparison between two inferred pedigrees symmetrical and more intuitive.

dam.class, sire.class

classification of dam and sire: Match, Mismatch, P1only, P2only, or '_' when no parent is assigned in either pedigree
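
For parents that are compared directly by ID, the five levels of dam.class and sire.class can be illustrated as follows. This is a simplified sketch on made-up vectors, and the helper name classify_parent is hypothetical; it ignores the extra step in which dummy parents are first lined up with non-genotyped individuals via dam.r and sire.r.

```r
# Hypothetical helper: classify one parent column of a merged pedigree
classify_parent <- function(p1, p2) {
  out <- rep("_", length(p1))                # no parent in either pedigree
  out[!is.na(p1) &  is.na(p2)] <- "P1only"   # parent in Pedigree 1 only
  out[ is.na(p1) & !is.na(p2)] <- "P2only"   # parent in Pedigree 2 only
  both <- !is.na(p1) & !is.na(p2)
  out[both] <- ifelse(p1[both] == p2[both], "Match", "Mismatch")
  out
}

dam.1 <- c("a01", "a01", "a02", NA, NA)
dam.2 <- c("a01", "a05", NA, "a07", NA)
dam.class <- classify_parent(dam.1, dam.2)
print(data.frame(dam.1, dam.2, dam.class))
```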

The first dimension of Counts denotes the following categories:

GG

Genotyped individual, assigned a genotyped parent in either pedigree

GD

Genotyped individual, assigned a dummy parent (or with at least 1 genotyped sibling or a genotyped grandparent in Pedigree 1)

GT

Genotyped individual, total

DG

Dummy individual, assigned a genotyped parent (i.e., grandparent of the sibship in Pedigree 2)

DD

Dummy individual, assigned a dummy parent (i.e., avuncular relationship between sibships in Pedigree 2)

DT

Dummy total

TT

Total total, includes all genotyped individuals, plus non-genotyped individuals in Pedigree 1, plus non-replaced dummy individuals (see below) in Pedigree 2

The second dimension of Counts gives the outcomes:

Total

The total number of individuals with a parent assigned in either or both pedigrees

Match

The same parent is assigned in both pedigrees (non-missing). For a dummy parent, it is considered a match if more than half of the inferred sibship that contains the most offspring of a non-genotyped parent consists of that individual's offspring.

Mismatch

Different parents assigned in the two pedigrees. When a sibship according to Pedigree 1 is split over two sibships in Pedigree 2, the smaller fraction is included in the count here.

P1only

Parent in Pedigree 1 but not 2; includes non-assignable parents (e.g. not genotyped and no genotyped offspring).

P2only

Parent in Pedigree 2 but not 1.
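
The Match rule for dummy parents can be made concrete with a small base-R sketch. It implements one reading of the rule (find the Pedigree 2 sibship holding most of a non-genotyped parent's offspring, then check whether those offspring make up more than half of that sibship) and is not taken from the package source; the data and the helper name match_dummy are made up.

```r
# Toy data: genotyped offspring, their non-genotyped dam in Pedigree 1,
# and the dummy dam they were assigned in Pedigree 2 (all IDs hypothetical)
off <- data.frame(
  id    = paste0("o", 1:7),
  dam.1 = c("mumA", "mumA", "mumA", "mumA", "mumB", "mumB", "mumB"),
  dam.2 = c("F0001", "F0001", "F0001", "F0002", "F0002", "F0002", "F0001"),
  stringsAsFactors = FALSE
)

# Hypothetical helper implementing the reading of the rule described above
match_dummy <- function(parent, off) {
  own <- which(off$dam.1 == parent)
  tab <- table(off$dam.2[own])               # own offspring per Ped2 sibship
  best <- names(tab)[which.max(tab)]         # sibship with most own offspring
  n_shared  <- as.integer(tab[best])
  n_sibship <- sum(off$dam.2 == best, na.rm = TRUE)  # total size of that sibship
  list(best = best, is_match = n_shared / n_sibship > 0.5)
}

resA <- match_dummy("mumA", off)   # 3 of the 4 members of F0001 are mumA's
resB <- match_dummy("mumB", off)   # 2 of the 3 members of F0002 are mumB's
print(resA)
print(resB)
```

In this toy example offspring o4 and o7 end up in the other dam's sibship, which corresponds to the 'smaller fraction' counted under Mismatch.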

The third dimension of Counts separates maternal from paternal assignments, where e.g. paternal 'DT' is the assignment of fathers to both maternal and paternal sibships (i.e., to dummies of both sexes).
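
To get a feel for the 7 x 5 x 2 layout, a placeholder array with the labels listed above can be built and indexed in base R. The numbers below are invented purely to show the indexing and are not output of PedCompare; the labels "dam" and "sire" for the third dimension follow the Examples section.

```r
# Empty array with the documented layout; values are placeholders (zeros)
Counts <- array(0,
                dim = c(7, 5, 2),
                dimnames = list(c("GG", "GD", "GT", "DG", "DD", "DT", "TT"),
                                c("Total", "Match", "Mismatch", "P1only", "P2only"),
                                c("dam", "sire")))

# Made-up numbers for maternal assignments of genotyped individuals
Counts["GG", , "dam"] <- c(60, 55, 2, 2, 1)
Counts["GD", , "dam"] <- c(40, 30, 3, 6, 1)

print(Counts["GG", , ])           # 5 x 2 matrix: outcomes by parent sex
print(Counts[, "Match", "dam"])   # named vector with one value per category

# Proportion of matches among individuals with a dam in either pedigree
prop_match <- Counts[c("GG", "GD"), "Match", "dam"] /
              Counts[c("GG", "GD"), "Total", "dam"]
print(round(prop_match, 2))
```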

In 'ConsensusPed', the priority used is parent.r (if not "nomatch") > parent.2 > parent.1. The columns 'id.cat', 'dam.cat' and 'sire.cat' have two additional levels compared to 'MergedPed':

G

Genotyped

D

Dummy individual (in Pedigree 2)

R

Dummy individual in pedigree 2 replaced by best matching non-genotyped individual in pedigree 1

U

Ungenotyped, Unconfirmed (parent in Pedigree 1, with no dummy match in Pedigree 2)

X

No parent in either pedigree
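
The priority rule parent.r > parent.2 > parent.1 can be sketched in a few lines of base R. The helper name consensus_parent and the toy vectors are hypothetical, and the sketch assumes that missing parents are coded as NA; it is meant to illustrate the ordering, not to reproduce how ConsensusPed is actually built.

```r
# Hypothetical helper: pick the consensus parent per individual
consensus_parent <- function(p1, p2, pr) {
  out <- p1                                  # lowest priority: Pedigree 1
  has2 <- !is.na(p2)
  out[has2] <- p2[has2]                      # Pedigree 2 overrides Pedigree 1
  use_r <- !is.na(pr) & pr != "nomatch"
  out[use_r] <- pr[use_r]                    # matched real ID overrides both
  out
}

dam.1 <- c("a01", "mumA",  "mumB",    "a04", NA)
dam.2 <- c("a01", "F0001", "F0002",   NA,    NA)
dam.r <- c(NA,    "mumA",  "nomatch", NA,    NA)

consensus <- consensus_parent(dam.1, dam.2, dam.r)
print(consensus)
```

The second individual gets the real ID "mumA" in place of dummy F0001, while the third keeps dummy F0002 because its sibship has no match in Pedigree 1.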

Assignable

Note that 'assignable' may be overly optimistic. Some parents from Ped1 indicated as assignable may never be assigned by sequoia, for example parent-offspring pairs where it cannot be determined which is the older of the two, or grandparents that are indistinguishable from full avuncular (i.e. genetics inconclusive because the candidate has no parent assigned, and ageprior inconclusive).

Dummifiable

Considered as potential dummy individuals are all non-genotyped individuals in Pedigree 1 who have, according to either pedigree, at least 2 genotyped offspring, or at least one genotyped offspring and a genotyped parent.
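
The criterion can be illustrated on a toy pedigree. This sketch is deliberately simplified: it looks at dams only, at a single pedigree rather than 'either pedigree', and uses a hypothetical helper dummifiable_dams; for real data, the one-letter categories are generated by getAssignCat.

```r
# Toy pedigree (hypothetical IDs), maternal side only
Ped1 <- data.frame(
  id  = c("o1", "o2", "o3", "o4", "mumA", "mumB", "mumC", "gran"),
  dam = c("mumA", "mumA", "mumB", "mumC", NA, "gran", NA, NA),
  stringsAsFactors = FALSE
)
genotyped <- c("o1", "o2", "o3", "o4", "gran")

# Hypothetical helper: non-genotyped dams with >= 2 genotyped offspring,
# or >= 1 genotyped offspring and a genotyped dam of their own
dummifiable_dams <- function(ped, genotyped) {
  cand <- setdiff(unique(ped$dam[!is.na(ped$dam)]), genotyped)
  keep <- vapply(cand, function(d) {
    n_off   <- sum(ped$dam == d & ped$id %in% genotyped, na.rm = TRUE)
    own_dam <- ped$dam[ped$id == d]
    has_gp  <- length(own_dam) > 0 && any(own_dam %in% genotyped)
    n_off >= 2 || (n_off >= 1 && has_gp)
  }, logical(1))
  cand[keep]
}

dummifiable <- dummifiable_dams(Ped1, genotyped)
print(dummifiable)
```

Here mumA qualifies through two genotyped offspring and mumB through one genotyped offspring plus a genotyped dam, whereas mumC has only a single genotyped offspring and no genotyped parent.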

Mismatches

Perhaps unexpectedly, cases where all siblings are correct but a dummy parent is assigned rather than the genotyped Ped1 parent are classified as a mismatch (for each of the siblings). These are typically due to an assumed genotyping error rate that is too low, a wrong parental birth year, or some other issue that requires user inspection. ComparePairs may help to identify these cases.

Genotyped 'mystery samples'

If Pedigree 2 includes samples for which the ID is unknown, the behaviour of PedCompare depends on whether the temporary IDs for these samples are included in SNPd. If they are included, matching (actual) IDs in Pedigree 1 will be flagged as mismatches (because the IDs differ). If they are not included in SNPd, or SNPd is not explicitly provided, matches are accepted, as the situation is indistinguishable from comparing dummy parents across pedigrees.

This is of course all conditional on relatives of the mystery sample being assigned in Pedigree 2.

Author(s)

Jisca Huisman, jisca.huisman@gmail.com

See Also

ComparePairs for comparison of all pairwise relationships in 2 pedigrees; EstConf for repeated simulate-reconstruct-compare; getAssignCat for all parents in the reference pedigree that could have been assigned; CalcOHLLR to check how well an 'old' pedigree fits with the SNP data.

Examples

compare <- PedCompare(Ped_griffin, SeqOUT_griffin$Pedigree)
compare$Counts["TT",,]  # totals only; 45 dams & 47 sires non-assigned
compare$Counts[,,"dam"]  # dams only

# inspect non-assigned in Ped2, id genotyped, dam might-be-dummy
PedM <- compare$MergedPed  # for brevity
PedM[PedM$id.dam.cat=='GD' & PedM$dam.class=='P1only',]
# zoom in on specific dam
PedM[which(PedM$dam.1=="i011_2001_F"), ]
# no sire for 'i034_2002_F' -> impossible to tell if half-sibs or avuncular

# overview of all non-genotyped -- dummy matches
head(compare$DummyMatch)

# success of paternity assignment, if genotyped mother correctly assigned
dimnames(compare$Counts.detail)
compare$Counts.detail["G","G",,"Match",]

# default before version 2.4: minSibSize = '2sib'
compare_2s <- PedCompare(Ped_griffin, SeqOUT_griffin$Pedigree,
                         minSibSize = '2sib')
compare_2s$Counts[,,"dam"]  # note decrease in Total dummies
with(compare_2s$MergedPed, table(id.dam.cat, dam.class))
# some with id.cat = 'X' or dam.cat='X' are nonetheless dam.class='Match'

[Package sequoia version 2.11.2 Index]