create.MK {KnockoffScreen}    R Documentation

Sequential knockoff generator for genetic data

Description

Generate one or more knockoff copies of a set of genetic variants for use in customized analyses.

Usage

create.MK(X,pos,M=5,corr_max=0.75,maxN.neighbor=Inf,maxBP.neighbor=100000,
n.AL=floor(10*nrow(X)^(1/3)*log(nrow(X))),thres.ultrarare=25,
R2.thres=1,method='shrinkage',bigmemory=T)

Arguments

X

An n*p genotype matrix, where n is the sample size and p is the number of genetic variants.

pos

A vector of length p giving the locations of the p genetic variants.

M

Number of knockoffs per variant. The default is 5.

corr_max

The correlation threshold for hierarchical clustering, such that variants from two different clusters do not have a correlation greater than corr_max. The hierarchical clustering step is a practical strategy to improve the power for tightly linked variants. The default is 0.75.
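As a rough illustration of the clustering step described above, the sketch below groups variants with base R's hclust() using single linkage on the distance 1 - |correlation| and cuts the tree at height 1 - corr_max. The choice of linkage and distance is an assumption made for illustration and may not match what create.MK does internally; the toy matrix X is simulated.

```r
# Toy data: columns 1 and 2 are tightly correlated, columns 3 and 4 are independent.
set.seed(1)
n <- 200
z <- rnorm(n)
X <- cbind(z + rnorm(n, sd = 0.1), z + rnorm(n, sd = 0.1), rnorm(n), rnorm(n))

corr_max <- 0.75

# Distance between variants: 1 - absolute correlation.
d <- as.dist(1 - abs(cor(X)))

# Single linkage (assumed here): cutting at height 1 - corr_max leaves no pair
# of variants from different clusters with |correlation| above corr_max.
fit <- hclust(d, method = "single")
clusters <- cutree(fit, h = 1 - corr_max)

# Largest absolute correlation between variants in different clusters.
r <- abs(cor(X))
max_cross <- max(r[outer(clusters, clusters, "!=")])
```

With single linkage, two variants end up in the same cluster whenever they are connected by a chain of pairs whose absolute correlation is at least corr_max, which is why the cross-cluster maximum stays below the threshold.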

maxN.neighbor

The maximum number of neighboring variables used to generate knockoffs. The default is Inf. A smaller number will improve computational efficiency, but the knockoffs will be less accurate.

maxBP.neighbor

The size of neighboring region (in base pairs) used to generate knockoffs. The default is 100000.
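To make the roles of maxBP.neighbor and maxN.neighbor concrete, here is a minimal sketch of how a neighbor set for one variant could be chosen from the position vector. The helper select_neighbors() is hypothetical and is not part of KnockoffScreen; the actual selection inside create.MK may differ.

```r
# Hypothetical helper: indices of variants within maxBP.neighbor base pairs of
# variant j, nearest first, capped at maxN.neighbor entries.
select_neighbors <- function(pos, j, maxBP.neighbor = 100000, maxN.neighbor = Inf) {
  dist_bp <- abs(pos - pos[j])
  idx <- which(dist_bp <= maxBP.neighbor & seq_along(pos) != j)
  idx <- idx[order(dist_bp[idx])]          # nearest variants first
  head(idx, min(length(idx), maxN.neighbor))
}

pos <- c(1000, 50000, 120000, 400000)
nb_all <- select_neighbors(pos, 2)                     # variants 1 and 3
nb_one <- select_neighbors(pos, 2, maxN.neighbor = 1)  # nearest one only
```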

n.AL

The sample size for the algorithmic leveraging. The default is floor(10*n^(1/3)*log(n)), where n is the sample size.
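For a sense of scale, the default can be evaluated directly from the expression in the Usage section. The function name n_AL_default is introduced here only for illustration.

```r
# Default subsample size for algorithmic leveraging, as given in Usage:
# floor(10 * n^(1/3) * log(n)), with n = nrow(X).
n_AL_default <- function(n) floor(10 * n^(1/3) * log(n))

n_AL_default(1000)   # 690
n_AL_default(10000)  # 1984
```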

thres.ultrarare

The minor allele count threshold that defines ultrarare variants. The knockoff generation for variants with minor allele counts below the threshold will be based on permutation. The default is 25.
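The sketch below shows one way to compute minor allele counts and flag ultrarare variants, followed by a simple permutation of one flagged column. It assumes genotypes coded 0/1/2 with no missing values, and the permutation shown is only an illustration of the idea, not the package's internal routine.

```r
set.seed(1)
n <- 100
# Toy genotype matrix: one common variant and one carried by only 3 samples.
G <- cbind(common = rep(c(0, 1, 2, 1), 25),
           rare   = c(rep(1, 3), rep(0, n - 3)))

thres.ultrarare <- 25

# Minor allele count: the smaller of the two allele counts (assumes 0/1/2 coding).
mac <- pmin(colSums(G), 2 * n - colSums(G))
ultrarare <- mac < thres.ultrarare

# Illustrative permutation-based copy of the ultrarare variant:
# shuffling samples keeps the allele count but breaks any association.
G_k_rare <- sample(G[, "rare"])
```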

R2.thres

The maximum R2 allowed in the auto-regressive model. More liberal values (<1) lead to higher power for tightly linked variants, but the knockoffs will be less accurate. The default is 1.

method

The method for subsampling. The default is "shrinkage", corresponding to "shrinkage algorithmic leveraging".

bigmemory

Whether "bigmemory" operations are applied. The default is TRUE.

Value

X_k

A list of M elements, where each element is an n*p matrix that is a knockoff copy of the original data.
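The following sketch shows how a result with this shape might be inspected. The object X_k here is a stand-in built from row-permuted copies of a toy matrix, not real output of create.MK, so only the list structure is meaningful.

```r
set.seed(1)
n <- 50; p <- 4; M <- 3
X <- matrix(rbinom(n * p, 2, 0.3), n, p)

# Stand-in for the returned value: a list of M matrices of size n*p.
# These are NOT valid knockoffs; they only mimic the documented shape.
X_k <- lapply(seq_len(M), function(k) X[sample(n), , drop = FALSE])

n_copies <- length(X_k)       # number of knockoff copies
dims <- t(sapply(X_k, dim))   # one row (n, p) per copy

# If an element is a big.matrix (package "bigmemory"), X_k[[k]][, ] extracts
# an ordinary R matrix.
first_copy <- X_k[[1]][, ]
```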

Examples


library(KnockoffScreen)

# load example vcf file from package "seqminer"
vcf.filename = system.file("vcf/1000g.phase1.20110521.CFH.var.anno.vcf.gz", package = "seqminer")

# this is what the actual genotype matrix from package "seqminer" looks like
# (transposed so that rows are samples and columns are variants)
example.G <- t(seqminer::readVCFToMatrixByRange(vcf.filename, "1:196621007-196716634", annoType='')[[1]])

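# filter out constant variants (zero standard deviation)
s <- apply(example.G, 2, sd)
example.G <- example.G[, s != 0]

# keep the part of each column name after the last ":" as the variant position
pos <- as.numeric(gsub("^.*:", "", colnames(example.G)))

# generate multiple knockoffs
example.G_k <- create.MK(example.G, pos, M=5, corr_max=0.75)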


[Package KnockoffScreen version 0.3.0 Index]