gl.impute {dartR.base}    R Documentation

Imputes missing data

Description

This function imputes genotypes on a population-by-population basis, where populations can be considered panmictic, or imputes the state for presence-absence data.

Usage

gl.impute(
  x,
  method = "neighbour",
  fill.residual = TRUE,
  parallel = FALSE,
  verbose = NULL
)

Arguments

x

Name of the genlight object containing the SNP or presence-absence data [required].

method

Imputation method, either "frequency" or "HW" or "neighbour" or "random" [default "neighbour"].

fill.residual

Whether any residual missing values remaining after imputation should be set to 0, 1 or 2 at random, taking into account the global allele frequencies at the locus concerned [default TRUE].

parallel

A logical indicating whether multiple cores, if available, should be used for the computations (TRUE) or not (FALSE); requires the package parallel to be installed [default FALSE].

verbose

Verbosity: 0, silent or fatal errors; 1, begin and end; 2, progress log; 3, progress and results summary; 5, full report [default 2 or as specified using gl.set.verbosity].
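
As a rough illustration of what fill.residual describes, the base R sketch below fills the missing genotypes left at one locus by drawing 0, 1 or 2 at random from the global allele frequency at that locus. The vector locus and the binomial draw (two independent allele draws per genotype) are assumptions made for illustration; this is not the code used inside gl.impute.

```r
set.seed(42)  # reproducible draw

# Hypothetical genotypes at a single locus across all individuals
# (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate)
locus <- c(0, 0, 1, NA, 2, 0, NA, 1)

# Global frequency of the alternate allele, ignoring missing calls
q <- mean(locus, na.rm = TRUE) / 2

# Assumption: each missing genotype is two independent allele draws,
# so the genotype follows a binomial distribution with size 2
miss <- is.na(locus)
locus[miss] <- rbinom(sum(miss), size = 2, prob = q)
```

Loci where the alternate allele is rare will mostly be filled with 0 under this scheme, which is the sense in which the draw takes the global allele frequencies into account.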

Details

We recommend that imputation be performed on sampling locations, before any aggregation. Imputation is achieved by replacing missing values using one of the four methods accepted by the method argument: "frequency", "HW", "neighbour" or "random".

With method "neighbour", the nearest neighbour of a focal individual is the individual with the smallest Euclidean distance to it across the whole dataset. The advantage of this approach is that it works regardless of how many individuals are in the population to which the focal individual belongs, and the displacement of the individual is haphazard, as opposed to:

(a) Drawing the individual toward the population centroid (HW and Frequency).

(b) Drawing the individual toward the global centroid (glPCA).

Note that loci that are missing for all individuals in a population are not imputed with method "frequency" or "HW". Consider using the function gl.filter.allna with by.pop=TRUE to remove them first.
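
The nearest-neighbour idea can be sketched in base R on a plain genotype matrix. This is a minimal illustration, not the implementation used by gl.impute: the helper impute_neighbour and the toy matrix geno are hypothetical, and falling through to the next-nearest individual when the nearest one is also missing at a locus is an assumption of the sketch.

```r
# Toy genotype matrix: rows are individuals, columns are loci (0, 1, 2 or NA)
geno <- rbind(
  ind1 = c(0, 1, 2, NA, 0),
  ind2 = c(0, 1, 2, 1, 0),
  ind3 = c(2, NA, 0, 2, 2),
  ind4 = c(2, 1, 0, 2, NA)
)

# Hypothetical helper, not part of dartR
impute_neighbour <- function(m) {
  # Euclidean distances between individuals; dist() leaves out loci that are
  # missing in either individual and rescales the sum accordingly
  d <- as.matrix(dist(m))
  diag(d) <- Inf  # an individual is not its own neighbour
  out <- m
  for (i in seq_len(nrow(m))) {
    # Visit the other individuals from nearest to farthest
    for (j in order(d[i, ])) {
      miss <- is.na(out[i, ])
      if (!any(miss)) break
      # Copy the neighbour's original (non-imputed) calls into the gaps
      out[i, miss] <- m[j, miss]
    }
  }
  out
}

imputed <- impute_neighbour(geno)
```

Because the donor is chosen across the whole dataset, the sketch needs no population labels, which mirrors the point made above about small populations.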

Value

A genlight object with the missing data imputed.

Author(s)

Custodian: Luis Mijangos (Post to https://groups.google.com/d/forum/dartr)

See Also

Other data manipulation: gl.define.pop(), gl.drop.ind(), gl.drop.loc(), gl.drop.pop(), gl.edit.recode.pop(), gl.join(), gl.keep.ind(), gl.keep.loc(), gl.keep.pop(), gl.make.recode.ind(), gl.merge.pop(), gl.reassign.pop(), gl.recode.ind(), gl.recode.pop(), gl.rename.pop(), gl.sample(), gl.sim.genotypes(), gl.sort(), gl.subsample.ind(), gl.subsample.loc()

Examples

 
require("dartR.data")
# SNP genotype data
gl <- gl.filter.callrate(platypus.gl,threshold=0.95)
gl <- gl.filter.allna(gl)
gl <- gl.impute(gl,method="neighbour")
# Sequence Tag presence-absence data
gs <- gl.filter.callrate(testset.gs,threshold=0.95)
gs <- gl.filter.allna(gs)
gs <- gl.impute(gs, method="neighbour")

# Random imputation of SNP genotype data
gl <- gl.impute(platypus.gl,method="random")


[Package dartR.base version 0.65 Index]