bed_clumping {bigsnpr} | R Documentation |
LD clumping
Description
For a bigSNP:
- snp_pruning(): LD pruning. Similar to "--indep-pairwise (size+1) 1 thr.r2" in PLINK. This function is deprecated (see this article).
- snp_clumping() (and bed_clumping()): LD clumping. If you do not provide any statistic to rank SNPs, it uses minor allele frequencies (MAFs), making clumping similar to pruning.
- snp_indLRLDR(): Get SNP indices of long-range LD regions for the human genome.
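To make the idea of clumping concrete, here is a minimal base-R sketch on an ordinary numeric matrix. It is not bigsnpr's implementation: the function name toy_clumping and the simulated genotypes are hypothetical, the window is counted in number of SNPs only, and chromosomes are ignored. SNPs are visited in decreasing order of S (MAF when S is not given); each visited SNP is kept, and any undecided neighbour within the window whose squared correlation with it exceeds thr.r2 is discarded.

```r
# Toy greedy clumping (rows = individuals, columns = SNPs coded 0/1/2).
# Hypothetical helper for illustration only; not bigsnpr's actual code.
toy_clumping <- function(X, S = NULL, thr.r2 = 0.2, size = 100 / thr.r2) {
  m <- ncol(X)
  if (is.null(S)) {
    af <- colMeans(X) / 2        # allele frequency
    S <- pmin(af, 1 - af)        # minor allele frequency
  }
  keep <- rep(FALSE, m)
  removed <- rep(FALSE, m)
  for (j in order(S, decreasing = TRUE)) {
    if (removed[j]) next
    keep[j] <- TRUE
    # neighbours within `size` SNPs that are still undecided
    win <- setdiff(max(1, j - size):min(m, j + size), j)
    win <- win[!keep[win] & !removed[win]]
    if (length(win) > 0) {
      r2 <- as.vector(cor(X[, j], X[, win, drop = FALSE]))^2
      removed[win[which(r2 > thr.r2)]] <- TRUE
    }
  }
  which(keep)
}

set.seed(1)
n <- 200
snp1 <- rbinom(n, 2, 0.3)
snp2 <- snp1                     # perfect LD with snp1
snp3 <- rbinom(n, 2, 0.4)        # simulated independently of snp1
X <- cbind(snp1, snp2, snp3)
kept <- toy_clumping(X, S = c(3, 2, 1))
kept                             # snp2 is dropped in favour of snp1
```

Passing S = c(3, 2, 1) ranks snp1 above its duplicate snp2, so snp1 is the one retained; swapping the first two values of S would retain snp2 instead.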
Usage
bed_clumping(
obj.bed,
ind.row = rows_along(obj.bed),
S = NULL,
thr.r2 = 0.2,
size = 100/thr.r2,
exclude = NULL,
ncores = 1
)
snp_clumping(
G,
infos.chr,
ind.row = rows_along(G),
S = NULL,
thr.r2 = 0.2,
size = 100/thr.r2,
infos.pos = NULL,
is.size.in.bp = NULL,
exclude = NULL,
ncores = 1
)
snp_pruning(
G,
infos.chr,
ind.row = rows_along(G),
size = 49,
is.size.in.bp = FALSE,
infos.pos = NULL,
thr.r2 = 0.2,
exclude = NULL,
nploidy = 2,
ncores = 1
)
snp_indLRLDR(infos.chr, infos.pos, LD.regions = LD.wiki34)
Arguments
obj.bed |
Object of type bed, which is the mapping of some bed file.
Use |
ind.row |
An optional vector of the row indices (individuals) that
are used. If not specified, all rows are used. |
S |
A vector of column statistics which express the importance
of each SNP (the more important the SNP, the greater
the corresponding statistic should be). |
thr.r2 |
Threshold over the squared correlation between two SNPs.
Default is 0.2. |
size |
For one SNP, window size around this SNP to compute correlations.
Default is 100/thr.r2 for clumping. |
exclude |
Vector of SNP indices to exclude anyway. For example,
can be used to exclude long-range LD regions (see Price2008). Another use
can be for thresholding with respect to p-values associated with |
ncores |
Number of cores used. Default is 1 (no parallelism). You may use nb_cores(). |
G |
A FBM.code256
(typically |
infos.chr |
Vector of integers specifying each SNP's chromosome. |
infos.pos |
Vector of integers specifying the physical position
on a chromosome (in base pairs) of each SNP. |
is.size.in.bp |
Deprecated. |
nploidy |
Number of trials, parameter of the binomial distribution.
Default is 2. |
LD.regions |
A |
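As an illustration of how S and exclude can work together, the sketch below ranks SNPs by -log10(p) and excludes those that fail a p-value threshold. The p-values are simulated purely for demonstration, and the 0.05 cutoff is an assumption; in practice both would come from your own association analysis.

```r
library(bigsnpr)

test <- snp_attachExtdata()
G <- test$genotypes

# Hypothetical per-SNP p-values, simulated only for this illustration
set.seed(1)
pval <- runif(ncol(G))

# Larger S means more important, so rank by -log10(p)
S <- -log10(pval)

# Exclude SNPs that do not pass the (assumed) p-value threshold of 0.05
ind.keep <- snp_clumping(G, infos.chr = test$map$chromosome,
                         infos.pos = test$map$physical.pos,
                         S = S, thr.r2 = 0.2,
                         exclude = which(pval > 0.05))

# Every kept SNP should pass the threshold
all(pval[ind.keep] <= 0.05)
```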
Value
- snp_clumping() (and bed_clumping()): SNP indices that are kept.
- snp_indLRLDR(): SNP indices to be used as (part of) the 'exclude' parameter of snp_clumping().
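The following sketch shows one way to feed the result of snp_indLRLDR() into the 'exclude' parameter of snp_clumping(), using the example data shipped with the package. The variable names CHR, POS and ind.excl are arbitrary, and depending on the data the set of long-range LD indices may be empty.

```r
library(bigsnpr)

test <- snp_attachExtdata()
G <- test$genotypes
CHR <- test$map$chromosome
POS <- test$map$physical.pos

# SNP indices falling in the default long-range LD regions (LD.wiki34)
ind.excl <- snp_indLRLDR(infos.chr = CHR, infos.pos = POS)

# Clump while excluding those SNPs
ind.keep <- snp_clumping(G, infos.chr = CHR, infos.pos = POS,
                         thr.r2 = 0.2, exclude = ind.excl)

# Number of excluded indices among the kept ones
length(intersect(ind.keep, ind.excl))
```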
References
Price AL, Weale ME, Patterson N, et al. Long-Range LD Can Confound Genome Scans in Admixed Populations. Am J Hum Genet. 2008;83(1):132-135. doi:10.1016/j.ajhg.2008.06.005
Examples
test <- snp_attachExtdata()
G <- test$genotypes
# clumping (prioritizing higher MAF)
ind.keep <- snp_clumping(G, infos.chr = test$map$chromosome,
                         infos.pos = test$map$physical.pos,
                         thr.r2 = 0.1)
# keep most of them -> not much LD in this simulated dataset
length(ind.keep) / ncol(G)
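A comparable call on a bed object might look like the sketch below. It assumes the example bed file shipped with bigsnpr (with its .bim and .fam files alongside) and maps it with bed(); the variable names are arbitrary. As the Usage shows, bed_clumping() has no infos.chr or infos.pos arguments, so none are passed here.

```r
library(bigsnpr)

# Example bed file shipped with bigsnpr
bedfile <- system.file("extdata", "example.bed", package = "bigsnpr")
obj.bed <- bed(bedfile)

# clumping directly on the bed file (prioritizing higher MAF, since S = NULL)
ind.keep.bed <- bed_clumping(obj.bed, thr.r2 = 0.1)

# proportion of variants kept
length(ind.keep.bed) / ncol(obj.bed)
```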