MMD_attr {MMD}    R Documentation

Attribution of individuals to sources using the MMD method

Description

Attribution of individuals to sources using the MMD method

Usage

MMD_attr(
  datafile,
  popfile,
  NL,
  sourcenames,
  ToAttribute,
  SelfA = "no",
  fSelfA = 0.5,
  randomSelfA = "yes",
  quantile = 0.01,
  optq = "no",
  pqmin = 0,
  pqmax = 0.5,
  np = 20,
  Nbootstrap = 10000,
  verbose = FALSE
)

Arguments

datafile

character; Name of the file *.csv (with full path in the file system) containing the genotypes (features) of individuals.

popfile

character; name of the *.pop file (with full path in the file system) containing the source (population) of each individual.

NL

integer; number of loci. If larger than the number of available loci in the data set, NL is reduced to the maximum available number of loci.

sourcenames

a character vector listing the names of the sources.

ToAttribute

character giving the name of the individuals of unknown origin (i.e. those that will be attributed to sources).

SelfA

character; if "no", individuals of unknown origin are attributed to sources; if "yes", self-attribution of selected individuals from sources is performed. (Default "no")

fSelfA

real number in the interval (0,1). When SelfA="yes", fSelfA specifies the fraction of individuals from the source specified by ToAttribute that will be assumed to be of unknown origin. (Default 0.5)

randomSelfA

character; only relevant if SelfA="yes". If "yes", the individuals to be treated as unknown are randomly selected from the source specified by ToAttribute; if "no", a list of names of individuals is read from filepoplist. (Default "yes")

quantile

real number with values in (0,1) giving the q-quantile for the MMD method. Only used if the quantile is not obtained through optimisation of the probability of correct self-attribution. (Default 0.01)

optq

character; if "no", the specified quantile value is used; if "yes", the q-quantile is optimised. Optimisation is only meaningful for self-attribution, so optq is automatically set to "no" when SelfA="no". (Default "no")

pqmin

real number with values in (0,1); minimum value of q-quantile when optq="yes". (Default 0)

pqmax

real number with values in (0,1); maximum value of q-quantile when optq="yes". (Default 0.5)

np

integer giving the number of values of q-quantile in the interval (pqmin,pqmax) when optq="yes". (Default 20)

Nbootstrap

integer giving the number of bootstrap samples used to estimate the uncertainty of the attribution probability $p_s$. (Default 10000)

verbose

logical (TRUE/FALSE); whether to display a progress bar. (Default FALSE)
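
The roles of quantile, pqmin, pqmax and np can be illustrated with a minimal sketch. It assumes an evenly spaced grid of q values, which this page does not confirm, and it uses made-up toy genotypes. The helper hamming_to_source is hypothetical and not part of the MMD package; the sketch is not the package's implementation.

```r
# Toy data: one unknown individual and four source individuals, five loci each.
# All values are made up for illustration.
unknown <- c(0, 1, 1, 0, 1)
source_genotypes <- rbind(
  c(0, 1, 1, 0, 1),
  c(0, 1, 0, 0, 1),
  c(1, 0, 0, 1, 0),
  c(0, 0, 1, 0, 1)
)

# Hypothetical helper: Hamming distance (number of differing loci)
# between one genotype and every row of a source matrix.
hamming_to_source <- function(genotype, src) {
  rowSums(src != matrix(genotype, nrow(src), length(genotype), byrow = TRUE))
}

d <- hamming_to_source(unknown, source_genotypes)

# Assumed grid of candidate q values (evenly spaced; an assumption).
pqmin <- 0
pqmax <- 0.5
np <- 20
q_grid <- seq(pqmin, pqmax, length.out = np)

# q-quantile of the Hamming distances for each candidate q.
d_q <- stats::quantile(d, probs = q_grid, names = FALSE)
```

With optq="no", only the single value given by quantile would play the role of q_grid here.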

Value

If optq="yes", the output is a list with eight elements:

  1. Number of individuals of unknown origin.

  2. Number of sources.

  3. Statistics of the attribution probability to sources, $p_s$.

  4. Probability of attribution of each unknown individual to each source, $p_{u,s}$.

  5. Runtime of the calculation.

  6. Number of loci.

  7. Parameter q used to calculate the q-quantile of the Hamming distance in the MMD method.

  8. Data frame giving the probability of correct attribution vs. q-quantile.

If optq="no", the output list contains all the items in the list above except the last one.
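
Because this page describes the returned list by position and gives no element names, one cautious way to read results is by index. The sketch below uses a hand-built mock list with placeholder values, not real MMD_attr output. The element types and the column names q and p_correct are assumptions made for illustration.

```r
# Mock object mimicking the documented positional layout for optq = "yes".
# Every value is a placeholder; real element types and names may differ.
mock_result <- list(
  3L,                                                    # 1. individuals of unknown origin
  2L,                                                    # 2. number of sources
  data.frame(source = c("A", "B"), mean = c(0.6, 0.4)),  # 3. statistics of p_s
  matrix(c(0.7, 0.5, 0.6, 0.3, 0.5, 0.4), nrow = 3),     # 4. p_{u,s}, one row per unknown
  1.2,                                                   # 5. runtime
  10L,                                                   # 6. number of loci
  0.01,                                                  # 7. parameter q
  data.frame(q = c(0, 0.25, 0.5),                        # 8. correct attribution vs. q
             p_correct = c(0.5, 0.7, 0.6))
)

n_unknown <- mock_result[[1]]
p_us <- mock_result[[4]]

# The eighth element is only present when optq = "yes".
has_q_table <- length(mock_result) >= 8
if (has_q_table) {
  q_table <- mock_result[[8]]
  # q value with the highest probability of correct attribution
  best_q <- q_table$q[which.max(q_table$p_correct)]
}
```

Checking the list length before reading the eighth element keeps the same code usable for both optq settings.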

Author(s)

Francisco J. Perez-Reche (University of Aberdeen)

Examples

## This example uses a small dataset stored in the MMD package
datafile <- system.file("extdata", "Campylobacter_10SNP_HlW.csv", package = "MMD")
popfile <- system.file("extdata", "Campylobacter_10SNP_HlW.pop", package = "MMD")

NL <- 100
sourcenames <- c("Cattle","Chicken","Pig","Sheep","WB")

##----- Source attribution
ToAttribute <- "Human"
SelfA <- "no"
attribution <- MMD_attr(datafile, popfile, NL, sourcenames, ToAttribute, SelfA = SelfA)

## See more detailed examples in the vignette.
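
A self-attribution run with optimisation of q might look like the following sketch. It uses only arguments listed under Usage. The choice of "Chicken" as the source to self-attribute and the reduced Nbootstrap are illustrative assumptions, and the call is guarded so that it only runs when the MMD package is installed.

```r
# Arguments for a hypothetical self-attribution run; values are illustrative.
self_args <- list(
  NL = 100,
  sourcenames = c("Cattle", "Chicken", "Pig", "Sheep", "WB"),
  ToAttribute = "Chicken",  # assumed choice of source to self-attribute
  SelfA = "yes",
  fSelfA = 0.5,             # half of the chosen source treated as unknown
  optq = "yes",             # optimise q over the grid defined below
  pqmin = 0,
  pqmax = 0.5,
  np = 20,
  Nbootstrap = 1000         # smaller than the default to shorten the run
)

if (requireNamespace("MMD", quietly = TRUE)) {
  self_args$datafile <- system.file("extdata", "Campylobacter_10SNP_HlW.csv",
                                    package = "MMD")
  self_args$popfile <- system.file("extdata", "Campylobacter_10SNP_HlW.pop",
                                   package = "MMD")
  self_attribution <- do.call(MMD::MMD_attr, self_args)
}
```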


[Package MMD version 1.0.0 Index]