snp.pruning {ASRgenomics}    R Documentation
Reduces the number of redundant markers in a molecular matrix M by pruning
Description
For a given molecular dataset M (in the format 0, 1 and 2), it produces a
reduced molecular matrix by eliminating "redundant" markers using pruning
techniques. This function finds and drops some of the SNPs in high linkage
disequilibrium (LD).
Usage
snp.pruning(
M = NULL,
map = NULL,
marker = NULL,
chrom = NULL,
pos = NULL,
method = c("correlation"),
criteria = c("callrate", "maf"),
pruning.thr = 0.95,
by.chrom = FALSE,
window.n = 50,
overlap.n = 5,
iterations = 10,
seed = NULL,
message = TRUE
)
Arguments
M
A matrix with marker data of full form (n × p), with n individuals in rows and p markers in columns, coded as 0, 1 and 2 (default = NULL).
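map
(Optional) A data frame with the map information, with p rows (one per marker). If NULL, a dummy map is generated considering a single chromosome and sequential positions for markers (default = NULL).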
marker
A character indicating the name of the column in data frame map with the identification of markers (default = NULL).
chrom
A character indicating the name of the column in data frame map with the identification of chromosomes (default = NULL).
pos
A character indicating the name of the column in data frame map with the position of markers (default = NULL).
method
A character indicating the method (or algorithm) used to identify redundant markers.
The only method currently available is based on correlations (default = "correlation").
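criteria
A character indicating the criterion used to choose which marker to drop
from a detected redundant pair.
Options are: "callrate" (the marker with the lower call rate is dropped) and "maf" (the marker with the lower minor allele frequency is dropped) (default = "callrate").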
pruning.thr
A threshold value used to identify redundant markers: pairs of markers with a Pearson's correlation larger than this value are considered redundant (default = 0.95).
by.chrom
If TRUE, pruning is performed independently for each chromosome; this requires map and chrom to be provided (default = FALSE).
window.n
A numeric value with the number of markers to consider in each
window when performing pruning (default = 50).
overlap.n
A numeric value with the number of markers that overlap between consecutive windows
(default = 5).
iterations
An integer indicating the number of sequential times the pruning procedure
is executed on the remaining markers.
If no markers are dropped in a given iteration, the algorithm stops (default = 10).
seed
An integer used as seed for reproducibility. When the criterion has the
same value for both markers of a redundant pair, one of them is dropped at random (default = NULL).
message
If TRUE, informative messages are printed (default = TRUE).
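The way window.n and overlap.n split the markers into consecutive, overlapping windows can be illustrated with a small base-R sketch. The helper make_windows() and the two-chromosome layout are hypothetical and not part of ASRgenomics; the exact window boundaries used internally by snp.pruning() may differ. The second part mimics by.chrom = TRUE by building windows separately within each chromosome, so that no window spans two chromosomes.

```r
# Hypothetical helper (not part of ASRgenomics): column ranges of consecutive
# windows of `window.n` markers, each sharing `overlap.n` markers with the
# previous window. The real boundaries used by snp.pruning() may differ.
make_windows <- function(p, window.n = 50, overlap.n = 5) {
  stopifnot(p >= 1, window.n > overlap.n, overlap.n >= 0)
  step <- window.n - overlap.n
  starts <- 1
  # Keep adding windows until the last one reaches marker p.
  while (starts[length(starts)] + window.n - 1 < p) {
    starts <- c(starts, starts[length(starts)] + step)
  }
  data.frame(start = starts, end = pmin(starts + window.n - 1, p))
}

# 100 markers with the default settings: windows 1-50, 46-95 and 91-100.
w100 <- make_windows(100, window.n = 50, overlap.n = 5)
print(w100)

# Illustrative by.chrom = TRUE behaviour: windows built within each chromosome
# of a made-up map (60 markers on chr1, 40 markers on chr2).
chrom <- rep(c("chr1", "chr2"), times = c(60, 40))
win_by_chrom <- lapply(split(seq_along(chrom), chrom), function(idx) {
  w <- make_windows(length(idx), window.n = 50, overlap.n = 5)
  data.frame(start = idx[w$start], end = idx[w$end])
})
print(win_by_chrom)
```

With a smaller overlap.n the windows advance faster and fewer correlations are computed, at the cost of checking fewer marker pairs that fall near window boundaries.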
Details
Pruning is recommended because redundant markers can affect the quality of matrices used in downstream analyses. The algorithm uses the Pearson's correlation between markers as a proxy for LD. When the correlation between a pair of markers is higher than the selected threshold, one of the two markers is eliminated according to the chosen criteria: call rate or minor allele frequency. In case of a tie, one marker is dropped at random.
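The pairwise decision described above can be sketched in base R for a single window of markers. This is a hypothetical illustration, not the ASRgenomics implementation: prune_window() is a made-up helper whose arguments only mirror snp.pruning(), it assumes individuals in rows and markers in columns coded 0, 1, 2 (NA for missing), and it compares the raw correlation with pruning.thr as the text above describes.

```r
# Hypothetical sketch (not the ASRgenomics implementation) of correlation-based
# pruning within ONE window. M: individuals x markers, coded 0/1/2, NA = missing.
prune_window <- function(M, pruning.thr = 0.95,
                         criteria = c("callrate", "maf"), seed = NULL) {
  criteria <- match.arg(criteria)
  if (!is.null(seed)) set.seed(seed)
  if (ncol(M) < 2) return(M)

  # Call rate: proportion of non-missing genotypes per marker.
  callrate <- colMeans(!is.na(M))
  # Frequency of the counted allele (0/1/2 coding -> mean / 2),
  # folded to the minor allele frequency.
  p <- colMeans(M, na.rm = TRUE) / 2
  maf <- pmin(p, 1 - p)
  score <- if (criteria == "callrate") callrate else maf

  # Pairwise Pearson correlations; constant markers give NA (warnings silenced).
  r <- suppressWarnings(cor(M, use = "pairwise.complete.obs"))

  keep <- rep(TRUE, ncol(M))
  for (i in seq_len(ncol(M) - 1)) {
    for (j in (i + 1):ncol(M)) {
      if (!keep[i] || !keep[j]) next
      if (is.na(r[i, j]) || r[i, j] <= pruning.thr) next
      # Redundant pair: drop the marker with the lower score; ties at random.
      if (isTRUE(score[i] > score[j])) {
        to_drop <- j
      } else if (isTRUE(score[j] > score[i])) {
        to_drop <- i
      } else {
        to_drop <- sample(c(i, j), 1)
      }
      keep[to_drop] <- FALSE
    }
  }
  M[, keep, drop = FALSE]
}

# m2 duplicates m1 but has one missing call, so it has the lower call rate.
M <- cbind(m1 = c(0, 1, 2, 2, 1, 0),
           m2 = c(0, 1, 2, 2, 1, NA),
           m3 = c(2, 0, 1, 0, 2, 1))
print(colnames(prune_window(M, pruning.thr = 0.95)))
```

In this toy matrix m1 and m2 are perfectly correlated and m2 is dropped for its lower call rate, while m3 (correlation of -0.5 with m1) is kept. Repeating such a pass over all windows, up to iterations times, gives the overall shape of the procedure.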
Filtering markers with qc.filtering before pruning is highly recommended. Poor-quality markers (e.g., monomorphic markers, which have zero variance) prevent correlations from being calculated and may affect which markers are eliminated.
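The problem with monomorphic markers can be seen directly with base R's cor(): a marker with zero variance has an undefined Pearson's correlation with any other marker. The two vectors below are made-up genotypes used only for illustration.

```r
# A monomorphic marker: every individual carries genotype 2, so its variance is 0.
mono <- c(2, 2, 2, 2, 2)
poly <- c(0, 1, 2, 1, 0)

# cor() cannot divide by a zero standard deviation; it warns and returns NA.
r <- withCallingHandlers(
  cor(mono, poly),
  warning = function(w) {
    message("cor() warned: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
print(r)  # [1] NA
```

An NA correlation can never exceed pruning.thr, so such a marker is never detected as redundant; removing it beforehand with qc.filtering avoids this.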
Value
Mpruned
: a matrix containing the pruned marker matrix M.
map
: a data frame containing the pruned map.
Examples
library(ASRgenomics)

# Read and filter genotypic data.
M.clean <- qc.filtering(
M = geno.pine655,
maf = 0.05,
marker.callrate = 0.20, ind.callrate = 0.20,
Fis = 1, heterozygosity = 0.98,
na.string = "-9",
plots = FALSE)$M.clean
# Prune correlations > 0.9.
Mpr <- snp.pruning(
M = M.clean, pruning.thr = 0.90,
by.chrom = FALSE, window.n = 40, overlap.n = 10)
head(Mpr$map)
Mpr$Mpruned[1:5, 1:5]