R: Calculate -log10(p) of each SNP-set by the score test...

score.calc.score.MC {RAINBOWR}

R Documentation

Calculate -log10(p) of each SNP-set by the score test (multi-cores)

Description

This function calculates -log10(p) of each SNP-set by the score test. First, the function calculates the score statistic without solving the multi-kernel mixed model for each SNP-set. Then it performs the score test by using the fact that the score statistic follows the chi-square distribution.
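The last step of the description can be sketched in base R. This is a minimal illustration, not RAINBOWR's internal code: the helper name `mixture_neglog10p` and its arguments are hypothetical, and it assumes the mixture null distribution a x chisq(df = 0) + (1 - a) x chisq(df = r) described under the `chi0.mixture` argument.

```r
# Hypothetical helper (not part of RAINBOWR): converts a score statistic into
# -log10(p) under the mixture a * chisq(df = 0) + (1 - a) * chisq(df = r).
mixture_neglog10p <- function(score.stat, r, a = 0.5) {
  # chisq(df = 0) is a point mass at zero, so for a positive statistic only
  # the chisq(df = r) component contributes to the upper tail.
  p <- ifelse(
    score.stat <= 0,
    1,
    (1 - a) * pchisq(score.stat, df = r, lower.tail = FALSE)
  )
  -log10(p)
}

# Example: three score statistics with r = 1 and the default mixture weight
mixture_neglog10p(c(0, 3.84, 10), r = 1)
```

With a = 0 the function reduces to an ordinary chi-square test with r degrees of freedom.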

Usage

score.calc.score.MC(
  M.now,
  y,
  X.now,
  ZETA.now,
  LL0,
  Gu,
  Ge,
  P0,
  n.core = 2,
  parallel.method = "mclapply",
  map,
  kernel.method = "linear",
  kernel.h = "tuned",
  haplotype = TRUE,
  num.hap = NULL,
  test.effect = "additive",
  window.size.half = 5,
  window.slide = 1,
  chi0.mixture = 0.5,
  weighting.center = TRUE,
  weighting.other = NULL,
  gene.set = NULL,
  min.MAF = 0.02,
  count = TRUE
)
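As a rough guide to the shapes of the main inputs, the following base R sketch builds toy versions of `M.now`, `y`, `X.now`, `ZETA.now`, `Gu`, and `Ge`. All values are simulated, and the relationship matrix used here (a scaled cross-product of the genotype matrix) is only an assumption for illustration. `LL0` and `P0` come from fitting the null model and are not constructed here.

```r
set.seed(1)
n <- 6   # number of samples (toy value)
m <- 20  # number of markers (toy value)

# n x m genotype matrix coded as -1 / 0 / 1 (simulated)
M.now <- matrix(sample(c(-1, 0, 1), n * m, replace = TRUE), nrow = n, ncol = m)

# Phenotypes (NA is allowed in y) and an intercept-only fixed-effect design
y <- rnorm(n)
X.now <- matrix(1, nrow = n, ncol = 1)

# One random effect: Z maps samples to genotypes, K is their relationship matrix.
# Here each sample is its own genotype, so Z is the identity (an assumption).
Z <- diag(n)
K <- tcrossprod(M.now) / m
ZETA.now <- list(A = list(Z = Z, K = K))  # elements must be named "Z" and "K"

# Covariance structures passed to the score test
Gu <- Z %*% K %*% t(Z)  # ZKZ'
Ge <- diag(n)           # identity matrix
```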

Arguments

`M.now`	An `n \times m` genotype matrix, where `n` is the sample size and `m` is the number of markers.
`y`	An `n \times 1` vector of phenotypic values. NA is allowed.
`X.now`	An `n \times p` matrix containing the mean vector (rep(1, n)) and any covariates. NA is not allowed.
`ZETA.now`	A list containing the variance (relationship) matrix (K; `m \times m`) and its design matrix (Z; `n \times m`) for the random effects. Only one kernel matrix can be used. For example, ZETA = list(A = list(Z = Z, K = K)). The list elements must be named "Z" and "K".
`LL0`	The log-likelihood for the null model.
`Gu`	An `n \times n` matrix. Assign `ZKZ'`, where K is the covariance (relationship) matrix and Z is its design matrix.
`Ge`	An `n \times n` matrix. Assign the identity matrix I (diag(n)).
`P0`	An `n \times n` matrix. The Moore-Penrose generalized inverse of `SV0S`, where `S = X(X'X)^{-1}X'` and `V0 = \sigma^2_u Gu + \sigma^2_e Ge`. `\sigma^2_u` and `\sigma^2_e` are the variance estimates from the null model.
`n.core`	Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'.
`parallel.method`	Method for parallel computation. Three methods are offered: "mclapply", "furrr", and "foreach". When 'parallel.method = "mclapply"', the `pbmclapply` function in the 'pbmcapply' package is used with 'count = TRUE', and the `mclapply` function in the 'parallel' package is used with 'count = FALSE'. When 'parallel.method = "furrr"', the `future_map` function in the 'furrr' package is used. With 'count = TRUE', the `progressor` function in the 'progressr' package is also used to show the progress bar, so please install the 'progressr' package from github (https://github.com/HenrikBengtsson/progressr). With 'parallel.method = "furrr"', you can perform multi-thread parallelization with shared memory, which saves memory but is considerably slower than 'parallel.method = "mclapply"'. When 'parallel.method = "foreach"', the `foreach` function in the 'foreach' package is used together with the `makeCluster` function in the 'parallel' package and the `registerDoParallel` function in the 'doParallel' package. With 'count = TRUE', the `setTxtProgressBar` and `txtProgressBar` functions in the 'utils' package are also used to show the progress bar. We recommend 'parallel.method = "mclapply"', but this parallelization method is not supported on Windows. If you are a Windows user, we recommend 'parallel.method = "foreach"'.
`map`	Data frame of map information, where the first column is the marker names, the second and third columns are the chromosome and map position, and the fourth column is -log10(p) for each marker.
`kernel.method`	Determines how the kernel is calculated. There are three methods. "gaussian": the Gaussian kernel is calculated from the distance matrix. "exponential": the exponential kernel is calculated from the distance matrix. "linear": the default for this function; the linear kernel is calculated by NOIA methods for the additive GRM.
`kernel.h`	The hyperparameter for the Gaussian or exponential kernel. If kernel.h = "tuned", this hyperparameter is calculated as the median of the off-diagonal elements of the distance matrix of the genotype data.
`haplotype`	If the number of lines of your data is large (maybe > 100), you should set haplotype = TRUE. When haplotype = TRUE, haplotype-based kernel will be used for calculating -log10(p). (So the dimension of this gram matrix will be smaller.) The result won't be changed, but the time for the calculation will be shorter.
`num.hap`	When haplotype = TRUE, you can set the number of haplotypes you expect. Similar arrays are then treated as the same haplotype, and the kernel (K.SNP) is built with dimension num.hap x num.hap. When num.hap = NULL (default), num.hap is set to the maximum number that reflects the differences between lines.
`test.effect`	Effect of each marker to test. You can choose "test.effect" from "additive", "dominance", and "additive+dominance". You can also choose more than one effect, for example, test.effect = c("additive", "additive+dominance").
`window.size.half`	This argument decides how many SNPs (around the SNP you want to test) are used to calculate K.SNP. More precisely, the number of SNPs will be 2 * window.size.half + 1.
`window.slide`	This argument determines how often you test markers. If window.slide = 1, every marker will be tested. If you want to perform SNP-set tests by bins, please set window.slide = 2 * window.size.half + 1.
`chi0.mixture`	RAINBOWR assumes that the test statistic `l1' F l1` follows the mixture a x chisq(df = 0) + (1 - a) x chisq(df = r), where l1 is the first derivative of the log-likelihood, F is the Fisher information, and r is the degrees of freedom. The argument chi0.mixture is a (0 <= a < 1); the default is 0.5.
`weighting.center`	In kernel-based GWAS, weights according to the Gaussian distribution (centered on the tested SNP) are taken into account when calculating the kernel if Rainbow = TRUE. If weighting.center = FALSE, weights are not taken into account.
`weighting.other`	You can set other weights in addition to weighting.center. The length of this argument should be equal to the number of SNPs. For example, you can assign SNP effects from the information of gene annotation.
`gene.set`	If you have gene information, you can use it to perform kernel-based GWAS. Assign your gene information to gene.set as a "data.frame" whose dimension is (the number of genes) x 2. The first column should contain the gene names, and the second column should contain the names of each marker, which correspond to the marker names of the "geno" argument.
`min.MAF`	Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score.
`count`	When count is TRUE, the progress of RGWAS is displayed as a percentage.
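The interaction between `window.size.half` and `window.slide` can be illustrated with a short base R sketch. The helper `window_indices` is hypothetical and is not a RAINBOWR function. It ignores chromosome boundaries and keeps only full windows, so the edge handling may differ from what RAINBOWR actually does.

```r
# Hypothetical helper (not part of RAINBOWR): marker indices of each SNP-set.
# Each window holds 2 * window.size.half + 1 markers centered on a tested marker.
window_indices <- function(n.marker, window.size.half = 5, window.slide = 1) {
  window.size <- 2 * window.size.half + 1
  if (n.marker < window.size) {
    return(list())  # not enough markers for a single full window
  }
  # Centers of the tested windows, stepping by window.slide
  centers <- seq(window.size.half + 1, n.marker - window.size.half,
                 by = window.slide)
  lapply(centers, function(i) (i - window.size.half):(i + window.size.half))
}

# Sliding windows: every eligible marker is tested
sliding <- window_indices(n.marker = 11, window.size.half = 2, window.slide = 1)

# Non-overlapping bins: window.slide = 2 * window.size.half + 1
bins <- window_indices(n.marker = 11, window.size.half = 2, window.slide = 5)
```

In this toy case, `bins` contains the marker sets 1 to 5 and 6 to 10, while `sliding` contains one overlapping window per eligible center marker.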

Value

-log10(p) for each SNP-set

References

Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.

Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.

[Package RAINBOWR version 0.1.35 Index]