MMD_attr {MMD} | R Documentation |
Attribution of individuals to sources using the MMD method
Description
Attribution of individuals to sources using the MMD method
Usage
MMD_attr(
datafile,
popfile,
NL,
sourcenames,
ToAttribute,
SelfA = "no",
fSelfA = 0.5,
randomSelfA = "yes",
quantile = 0.01,
optq = "no",
pqmin = 0,
pqmax = 0.5,
np = 20,
Nbootstrap = 10000,
verbose = FALSE
)
Arguments
datafile |
character; Name of the file *.csv (with full path in the file system) containing the genotypes (features) of individuals. |
popfile |
character; Name of the file *.pop (with full path in the file system) containing the source (population) labels of individuals. |
NL |
integer; number of loci. If larger than the number of available loci in the data set, NL is reduced to the maximum available number of loci. |
sourcenames |
a character vector listing the names of the sources. |
ToAttribute |
character giving the name of the individuals of unknown origin (i.e. those that will be attributed to sources). |
SelfA |
character; if "no", individuals of unknown origin are attributed to sources; if "yes", self-attribution of selected individuals from sources is performed. (Default "no") |
fSelfA |
real number in the interval (0,1). When SelfA="yes", fSelfA specifies the fraction of individuals from the source specified by ToAttribute that will be assumed to be of unknown origin. (Default 0.5) |
randomSelfA |
character; only relevant if SelfA="yes". If "yes", the individuals to be treated as unknown are randomly selected from the source specified by ToAttribute; if "no", a list of names of individuals is read from filepoplist. (Default "yes") |
quantile |
real number with values in (0,1) giving the q-quantile for the MMD method. Only used if the quantile is not obtained through optimisation of the probability of correct self-attribution. (Default 0.01) |
optq |
character; if "no", the specified quantile value is used; if "yes", the q-quantile is optimised. Optimisation is only meaningful for self-attribution, so optq is set to "no" automatically if SelfA="no". (Default "no") |
pqmin |
real number with values in (0,1); minimum value of q-quantile when optq="yes". (Default 0) |
pqmax |
real number with values in (0,1); maximum value of q-quantile when optq="yes". (Default 0.5) |
np |
integer giving the number of values of q-quantile in the interval (pqmin,pqmax) when optq="yes". (Default 20) |
Nbootstrap |
integer giving the number of bootstrap samples used to estimate the uncertainty of the attribution probability $p_s$. (Default 10000) |
verbose |
logical (TRUE/FALSE); whether to display a progress bar. (Default FALSE) |
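When optq="yes", np candidate values of the q-quantile between pqmin and pqmax are considered. A minimal sketch of such a grid in base R, assuming equally spaced values (the spacing used internally by MMD_attr is not stated here, and q_grid is a hypothetical name used only for illustration):

```r
# Assumed illustration only: an equally spaced grid of candidate q values.
# The actual grid built inside MMD_attr may differ.
pqmin <- 0
pqmax <- 0.5
np <- 20
q_grid <- seq(pqmin, pqmax, length.out = np)
length(q_grid)  # number of candidate q values (np)
range(q_grid)   # lower and upper ends of the grid
```

A larger np gives a finer scan of q at the cost of more self-attribution runs.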
Value
If optq="yes", the output is a list with eight elements:
Number of individuals from unknown origin.
Number of sources.
Statistics of the attribution probability to sources, $p_s$.
Probability of attribution of each unknown individual to each source, $p_{u,s}$.
Runtime of the calculation.
Number of loci.
Parameter q used to calculate the q-quantile of the Hamming distance in the MMD method.
Data frame giving the probability of correct attribution vs. q-quantile.
If optq="no", the output list contains all the items in the list above except the last one.
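The parameter q above refers to the q-quantile of the Hamming distance. As a rough illustration of that quantity only, and not of the package's implementation, the sketch below computes Hamming distances between one unknown genotype and a few source genotypes and takes their q-quantile with base R's quantile(). All data and object names here are made up.

```r
# Toy genotypes (made up): rows are individuals from one source, columns are loci.
source_geno <- rbind(
  c(0, 1, 1, 0, 1),
  c(0, 1, 0, 0, 1),
  c(1, 0, 1, 1, 0),
  c(0, 1, 1, 0, 0)
)
# Hypothetical individual of unknown origin.
unknown <- c(0, 1, 1, 0, 1)

# Hamming distance: number of loci at which two genotypes differ.
hamming <- apply(source_geno, 1, function(g) sum(g != unknown))
hamming

# q-quantile of the distances from the unknown individual to this source.
q <- 0.01
dq <- quantile(hamming, probs = q, names = FALSE)
dq
```

A small q emphasises the closest individuals in a source, while a q near 0.5 approaches the median distance.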
Author(s)
Francisco J. Perez-Reche (University of Aberdeen)
Examples
## This example uses a small dataset stored in the MMD package
datafile <- system.file("extdata", "Campylobacter_10SNP_HlW.csv", package = "MMD")
popfile <- system.file("extdata", "Campylobacter_10SNP_HlW.pop", package = "MMD")
NL <- 100
sourcenames <- c("Cattle","Chicken","Pig","Sheep","WB")
##----- Source attribution
ToAttribute <- "Human"
SelfA <- "no"  # default: attribute the individuals of unknown origin to sources
attribution <- MMD_attr(datafile, popfile, NL, sourcenames, ToAttribute, SelfA = SelfA)
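##----- Self-attribution with optimisation of the q-quantile
## A hedged sketch built only from the arguments documented above; it is not
## taken from the package vignette. Assumptions: "Chicken" is a source label
## in this dataset, and the reduced np and Nbootstrap are only meant to
## shorten the run. The guard skips the call if MMD is not installed.
self_source <- "Chicken"
if (requireNamespace("MMD", quietly = TRUE)) {
  self_attribution <- MMD::MMD_attr(
    datafile = system.file("extdata", "Campylobacter_10SNP_HlW.csv", package = "MMD"),
    popfile = system.file("extdata", "Campylobacter_10SNP_HlW.pop", package = "MMD"),
    NL = 100,
    sourcenames = c("Cattle", "Chicken", "Pig", "Sheep", "WB"),
    ToAttribute = self_source,
    SelfA = "yes", fSelfA = 0.5, randomSelfA = "yes",
    optq = "yes", pqmin = 0, pqmax = 0.5, np = 10,
    Nbootstrap = 1000
  )
  str(self_attribution, max.level = 1)  # top-level structure of the returned list
}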
## See more detailed examples in the vignette.