dbSimulate {DNAtools} | R Documentation |
Simulate a DNA database
Description
Simulates a DNA database given a set of allele probabilities and theta value. It is possible to have close relatives in the database simulated in pairs, such that within each pair the profiles are higher correlated due to close familial relationship, but between pairs of profiles the correlation is only modelled by theta.
Usage
dbSimulate(probs, theta = 0, n = 1000, relatives = NULL)
Arguments
probs |
List of allele probabilities, where each element in the list is a vector of allele probabilities. |
theta |
The coancestry coefficient |
n |
The number of profiles in the database |
relatives |
A vector of length 4. Determining the number of PAIRS of profiles in the database: (FULL-SIBLINGS, FIRST-COUSINS, PARENT-CHILD, AVUNCULAR). They should obey that 2*sum(relatives)<=n. |
Details
Simulates a DNA database with a given number of DNA profiles (and possibly relatives) with a correlation between profiles governed by theta.
Value
A data frame where each row represents a DNA profile. The first column is a profile identifier (id) and the next 2*L columns contains the simulated genotype for each of the L loci. L is determined by the length of the list 'probs' with allele probabilities
Author(s)
James Curran and Torben Tvedebrink
Examples
## Not run:
## Simulate some allele frequencies:
freq <- replicate(10, { g = rgamma(n=10,scale=4,shape=3); g/sum(g)},
simplify=FALSE)
## Simulate a single database with 5000 DNA profiles:
simdb <- dbSimulate(freq,theta=0,n=5000)
## Simulate a number of databases, say N=50. For each database compute
## the summary statistic using dbCompare:
N <- 50
Msummary <- matrix(0,N,(length(freq)+1)*(length(freq)+2)/2)
for(i in 1:N)
Msummary[i,] <- dbCompare(dbSimulate(freq,theta=0,n=1000),
vector=TRUE,trace=FALSE)$m
## Give the columns representative names:
dimnames(Msummary)[[2]] <- DNAtools:::dbCats(length(freq),vector=TRUE)
## Plot the simulations using a boxplot
boxplot(log10(Msummary))
## There might come some warnings due to taking log10 to zero-values (no counts)
## Add the expected number to the plot:
points(1:ncol(Msummary),log10(dbExpect(freq,theta=0,n=1000,vector=TRUE)),
col=2,pch=16)
## End(Not run)