diem {diemr} | R Documentation |
Diagnostic Index Expectation Maximisation
Description
Estimates how to assign alleles in a genome to maximise the distinction between two
unknown groups of individuals. Using expectation maximisation (EM) in a likelihood
framework, diem provides marker polarities for importing data, their likelihood-based
diagnostic index and its support for all markers, and hybrid indices for all individuals.
Usage
diem(
  files,
  ploidy = FALSE,
  markerPolarity = FALSE,
  ChosenInds,
  ChosenSites = "all",
  epsilon = 0.99999,
  verbose = FALSE,
  nCores = parallel::detectCores() - 1,
  maxIterations = 50,
  ...
)
Arguments
files |
character vector with paths to files with genotypes. |
ploidy |
logical or list of length equal to length of |
markerPolarity |
|
ChosenInds |
numeric vector of indices of individuals to be included in the analysis. |
ChosenSites |
logical vector indicating which sites are to be included in the analysis. |
epsilon |
numeric, specifying how much the hypothetical diagnostic markers should
contribute to the likelihood calculations. Must be in |
verbose |
logical or character with path to directory where run diagnostics will be saved. |
nCores |
numeric. Number of cores to be used for parallelisation. Must be
at most equal to the number of files in the |
maxIterations |
numeric. |
... |
additional arguments. |
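A minimal sketch of choosing nCores so that it stays within the number of input files, as the nCores description requires. The file names are hypothetical placeholders that are only counted, not read, and falling back to one core when detectCores() returns NA is an assumption rather than documented diemr behaviour.

```r
# Hypothetical genotype files, one per compartment (only their count is used)
files <- c("autosomes.txt", "chrX.txt")

# detectCores() can return NA on some platforms; assume one core in that case
available <- parallel::detectCores()
if (is.na(available)) available <- 1

# Leave one core free, use at least one, and never exceed the number of files
nCores <- max(1, min(length(files), available - 1))
nCores
```

The resulting value can then be passed as the nCores argument.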
Details
Given two alleles of a marker, one allele can belong to one side of a barrier
to geneflow and the other to the other side. Which allele belongs where is a non-trivial
matter. A marker state in an individual can be encoded as 0 if the individual is
homozygous for the first allele, and 2 if the individual is homozygous for the second
allele. Marker polarity determines how the marker will be imported. Marker polarity
equal to FALSE means that the marker will be imported as-is. A marker with polarity
equal to TRUE will be imported with states 0 mapped as 2 and states 2 mapped as 0,
in effect switching which allele belongs to which side of a barrier to geneflow.
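The effect of polarity on import can be illustrated with a small sketch. flip_polarity is a hypothetical helper, not a diemr function. It assumes that states are stored as the numbers 0, 1 and 2, and that any state other than 0 and 2 is left unchanged.

```r
# Hypothetical helper: apply a marker polarity to a vector of marker states
flip_polarity <- function(states, polarity) {
  if (!polarity) {
    return(states)  # polarity FALSE: import as-is
  }
  flipped <- states
  flipped[which(states == 0)] <- 2  # homozygous first allele -> 2
  flipped[which(states == 2)] <- 0  # homozygous second allele -> 0
  flipped
}

marker <- c(0, 2, 2, 1, 0)
flip_polarity(marker, polarity = FALSE)
flip_polarity(marker, polarity = TRUE)
```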
When markerPolarity = FALSE, diem uses random null polarities to initiate the EM
algorithm. To fix the null polarities, markerPolarity must be a list of length equal
to the length of the files argument, where each element in the list is a logical
vector of length equal to the number of markers (rows) in the specific file.
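A minimal sketch of building such a list for two files. The marker counts are hypothetical, and in practice they would have to match the number of rows in each genotype file.

```r
# Hypothetical numbers of markers (rows) in two genotype files
n_markers <- c(3, 5)

# One logical vector per file; FALSE imports every marker as-is
null_polarity <- lapply(n_markers, function(n) rep(FALSE, n))

# Fix the first marker of the first file to be imported with flipped states
null_polarity[[1]][1] <- TRUE

str(null_polarity)
```

The list has one element per file, so it could be supplied as markerPolarity alongside a files vector of the same length.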
Ploidy needs to be given for each compartment and for each individual. For example,
for a dataset of three diploid mammal males consisting of an autosomal
compartment, an X chromosome compartment and a Y chromosome compartment, the ploidy
list would be ploidy = list(rep(2, 3), rep(1, 3), rep(1, 3)). If the dataset consisted
of one male and two females, ploidy for the sex chromosomes should be vectors reflecting
that females have two X chromosomes, but males only one, and females have no Y
chromosomes: ploidy = list(rep(2, 3), c(1, 2, 2), c(1, 0, 0)).
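The same ploidy list can be derived from a vector of sexes, which may be less error-prone for larger datasets. This is only a sketch: the sex vector and its "M"/"F" coding are assumptions made for illustration, and the compartment order (autosomes, X, Y) follows the example above.

```r
# Assumed sex of each individual, in the same order as in the genotype files
sex <- c("M", "F", "F")

ploidy <- list(
  rep(2, length(sex)),        # autosomal compartment: diploid in all individuals
  ifelse(sex == "F", 2, 1),   # X chromosome: two copies in females, one in males
  ifelse(sex == "F", 0, 1)    # Y chromosome: absent in females, one copy in males
)
ploidy
```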
When verbose = TRUE, diem will output multiple files with information
on the iterations of the EM algorithm, including tracking marker polarities and the
respective likelihood-based diagnostics. See
vignette("Understanding-genome-polarisation-output-files", package = "diemr")
for a detailed explanation of the individual output files.
Value
A list including suggested marker polarities, diagnostic indices and support for all markers, a matrix of counts of the four genomic states for all individuals, and polarity changes for the EM iterations.
Note
To ensure that the data input format of the genotype files, ploidies and individual
selection are readable for diem, first use CheckDiemFormat. Fix all errors, and run
diem only once all checks have passed.
The working directory or a folder optionally specified in the verbose argument must
have write permissions. diem will store temporary files and write the results files
in that location.
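A quick way to confirm write permission before a run is sketched below using base R only. The use of tempdir() as the output location is an assumption made so that the sketch is self-contained; substitute the intended working directory or verbose folder. Note that file.access() may not be fully reliable on every file system.

```r
# Stand-in for the directory where diem would write its files (assumption)
out_dir <- tempdir()

# file.access() returns 0 when the requested access mode (2 = write) is granted
writable <- all(file.access(out_dir, mode = 2) == 0)

if (!writable) {
  stop("No write permission for ", out_dir)
}
writable
```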
See Also
Examples
# set up input genotypes file names, ploidies and selection of individual samples
inputFile <- system.file("extdata", "data7x3.txt", package = "diemr")
ploidies <- list(c(2, 1, 2, 2, 2, 1, 2))
inds <- 1:6
# check input data
CheckDiemFormat(files = inputFile, ploidy = ploidies, ChosenInds = inds)
# File check passed: TRUE
# Ploidy check passed: TRUE
# run diem
## Not run:
# diem will write temporary files during EM iterations
# prior to running diem, set the working directory to a location with write permission
fit <- diem(files = inputFile, ChosenInds = inds, ploidy = ploidies, nCores = 1)
# run diem with fixed null polarities
fit2 <- diem(
  files = inputFile, ChosenInds = inds, ploidy = ploidies, nCores = 1,
  markerPolarity = list(c(TRUE, FALSE, TRUE))
)
## End(Not run)