diem {diemr} R Documentation

Diagnostic Index Expectation Maximisation

Description

Estimates how to assign alleles in a genome to maximise the distinction between two unknown groups of individuals. Using expectation maximisation (EM) in a likelihood framework, diem provides marker polarities for importing data, a likelihood-based diagnostic index and its support for all markers, and hybrid indices for all individuals.

Usage

diem(
  files,
  ploidy = FALSE,
  markerPolarity = FALSE,
  ChosenInds,
  ChosenSites = "all",
  epsilon = 0.99999,
  verbose = FALSE,
  nCores = parallel::detectCores() - 1,
  maxIterations = 50,
  ...
)

Arguments

files

character vector with paths to files with genotypes.

ploidy

logical or list of length equal to length of files. Each element of the list contains a numeric vector with ploidy numbers for all individuals specified in the files.

markerPolarity

FALSE or list of logical vectors.

ChosenInds

numeric vector of indices of individuals to be included in the analysis.

ChosenSites

logical vector indicating which sites are to be included in the analysis, or "all" (the default) to include every site.

epsilon

numeric, specifying how much the hypothetical diagnostic markers should contribute to the likelihood calculations. Must be in [0,1), keeping the tolerance setting of the R session in mind.

verbose

logical or character with path to directory where run diagnostics will be saved.

nCores

numeric. Number of cores to be used for parallelisation. Must not exceed the number of files in the files argument, and must be 1 on Windows.

maxIterations

numeric. Maximum number of EM iterations.

...

additional arguments.
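
The constraints on nCores described above can be respected before calling diem. The following is a minimal sketch, not part of diemr: the files vector is a hypothetical placeholder, and the fallback to a single core when the core count cannot be determined is an assumption.

```r
# Hypothetical input: three genotype files (paths are placeholders)
files <- c("autosomes.txt", "chrX.txt", "chrY.txt")

# Cores available to the session, leaving one free; detectCores() may return NA
available <- parallel::detectCores() - 1
if (is.na(available) || available < 1) available <- 1

# nCores must not exceed the number of files and must be 1 on Windows
nCores <- if (.Platform$OS.type == "windows") {
  1
} else {
  min(length(files), available)
}
```

The resulting value can then be passed as diem(..., nCores = nCores).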

Details

Given two alleles of a marker, one allele can belong to one side of a barrier to gene flow and the other allele to the other side. Which allele belongs where is a non-trivial matter. A marker state in an individual can be encoded as 0 if the individual is homozygous for the first allele, and 2 if the individual is homozygous for the second allele. Marker polarity determines how the marker will be imported. Marker polarity equal to FALSE means that the marker will be imported as-is. A marker with polarity equal to TRUE will be imported with states 0 mapped as 2 and states 2 mapped as 0, in effect switching which allele belongs to which side of a barrier to gene flow.
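
The effect of polarity on imported states can be illustrated with a small sketch. The function flip_states below is hypothetical and is not how diem imports data internally; it only mirrors the 0/2 mapping described above. Treating state 1 as an intermediate state that stays unchanged, and NA as an unknown state, are assumptions made for the illustration.

```r
# Hypothetical helper: apply a marker polarity to a vector of marker states
flip_states <- function(states, polarity) {
  # polarity TRUE swaps 0 and 2; state 1 and NA are left unchanged
  if (polarity) 2 - states else states
}

# States of one marker across five individuals (made-up values)
marker <- c(0, 2, 1, 0, NA)

asIs    <- flip_states(marker, polarity = FALSE)  # imported as-is
flipped <- flip_states(marker, polarity = TRUE)   # 0 and 2 exchanged
```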

When markerPolarity = FALSE, diem uses random null polarities to initiate the EM algorithm. To fix the null polarities, markerPolarity must be a list of length equal to the length of the files argument, where each element in the list is a logical vector of length equal to the number of markers (rows) in the specific file.
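
A markerPolarity list of the required shape could be assembled as sketched below, assuming one marker per row in each file. The two temporary files and their contents are placeholders and are not meant to document the diem input format; only the row counts matter here.

```r
# Two hypothetical genotype files, one marker per row (contents are placeholders)
f1 <- tempfile(fileext = ".txt")
f2 <- tempfile(fileext = ".txt")
writeLines(c("marker1", "marker2", "marker3"), f1)
writeLines(c("marker1", "marker2"), f2)
files <- c(f1, f2)

# Number of markers (rows) in each file
nMarkers <- vapply(files, function(f) length(readLines(f)), integer(1))

# One logical vector per file; FALSE keeps every marker as-is
fixedPolarity <- unname(lapply(nMarkers, function(n) rep(FALSE, n)))

# The list must have one element per file
stopifnot(length(fixedPolarity) == length(files))
```

Individual elements can then be set to TRUE for markers whose null polarity should be flipped before passing the list as markerPolarity.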

Ploidy needs to be given for each compartment and for each individual. For example, for a dataset of three diploid mammal males consisting of an autosomal compartment, an X chromosome compartment and a Y chromosome compartment, the ploidy list would be ploidy = list(rep(2, 3), rep(1, 3), rep(1, 3)). If the dataset consisted of one male and two females, ploidy for the sex chromosomes should be vectors reflecting that females have two X chromosomes, but males only one, and females have no Y chromosomes: ploidy = list(rep(2, 3), c(1, 2, 2), c(1, 0, 0)).
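
For larger datasets, the ploidy list may be easier to derive from a vector of sexes than to type by hand. The sketch below reproduces the one-male, two-female example; the sex vector and its "M"/"F" coding are assumptions made for illustration and are not an input that diem reads.

```r
# Hypothetical sexes of the three individuals, in the order used in the files
sex <- c("M", "F", "F")

ploidy <- list(
  rep(2, length(sex)),        # autosomal compartment: everyone diploid
  ifelse(sex == "M", 1, 2),   # X compartment: one copy in males, two in females
  ifelse(sex == "M", 1, 0)    # Y compartment: one copy in males, none in females
)

# Each compartment needs one ploidy value per individual
stopifnot(all(lengths(ploidy) == length(sex)))
```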

When verbose = TRUE, diem will output multiple files with information on the iterations of the EM algorithm, including tracking marker polarities and the respective likelihood-based diagnostics. See vignette("Understanding-genome-polarisation-output-files", package = "diemr") for a detailed explanation of the individual output files.

Value

A list including suggested marker polarities, diagnostic indices and support for all markers, a matrix of counts of the four genomic states for all individuals, and polarity changes for the EM iterations.

Note

To ensure that the genotype files, ploidies and individual selection are in a format readable by diem, first use CheckDiemFormat. Fix all errors, and run diem only once all checks have passed.

The working directory, or a folder optionally specified in the verbose argument, must have write permissions. diem will store temporary files and output result files in that location.
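
Write access can be checked in advance with base R. The sketch below uses tempdir() as a stand-in location; in practice outDir would be getwd() or the folder passed to verbose. Note that file.access can be unreliable on some file systems, so this is a quick sanity check and not a guarantee.

```r
# Stand-in output location; replace with getwd() or the verbose directory
outDir <- tempdir()

# file.access returns 0 when the requested mode (2 = write) is permitted
canWrite <- all(file.access(outDir, mode = 2) == 0)

if (!canWrite) {
  stop("No write permission in ", outDir, "; choose another directory.")
}
```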

See Also

CheckDiemFormat

Examples

# set up input genotypes file names, ploidies and selection of individual samples
inputFile <- system.file("extdata", "data7x3.txt", package = "diemr")
ploidies <- list(c(2, 1, 2, 2, 2, 1, 2))
inds <- 1:6

# check input data
CheckDiemFormat(files = inputFile, ploidy = ploidies, ChosenInds = inds)
#  File check passed: TRUE
#  Ploidy check passed: TRUE

# run diem
## Not run: 
# diem will write temporary files during EM iterations
# prior to running diem, set the working directory to a location with write permission
fit <- diem(files = inputFile, ChosenInds = inds, ploidy = ploidies, nCores = 1)

# run diem with fixed null polarities
fit2 <- diem(
  files = inputFile, ChosenInds = inds, ploidy = ploidies, nCores = 1,
  markerPolarity = list(c(TRUE, FALSE, TRUE))
)

## End(Not run)

[Package diemr version 1.4 Index]