cjamp {CJAMP}R Documentation

C-JAMP: Copula-based joint analysis of multiple phenotypes.

Description

Functions to perform C-JAMP: cjamp fits a joint model of two phenotypes conditional on one or multiple predictors; cjamp_loop uses cjamp to fit the same copula model separately for a list of multiple predictors, e.g. for a genetic association study.

Usage

cjamp(copula = "Clayton", Y1 = NULL, Y2 = NULL,
  predictors_Y1 = NULL, predictors_Y2 = NULL, scale_var = FALSE,
  optim_method = "BFGS", trace = 0, kkt2tol = 1e-16, SE_est = TRUE,
  pval_est = TRUE, n_iter_max = 10)

cjamp_loop(copula = "Clayton", Y1 = NULL, Y2 = NULL,
  predictors = NULL, covariates_Y1 = NULL, covariates_Y2 = NULL,
  scale_var = FALSE, optim_method = "BFGS", trace = 0,
  kkt2tol = 1e-16, SE_est = TRUE, pval_est = TRUE, n_iter_max = 10)

Arguments

copula

String indicating whether the joint model will be computed under the Clayton ("Clayton") or 2-parameter copula ("2param") model.

Y1

Numeric vector containing the first phenotype.

Y2

Numeric vector containing the second phenotype.

predictors_Y1

Dataframe containing the predictors of Y1 in columns (for the cjamp function).

predictors_Y2

Dataframe containing the predictors of Y2 in columns (for the cjamp function).

scale_var

Logical. Indicating whether all predictors will be centered and scaled before the analysis or not (default: FALSE).

optim_method

String passed to the optimx function. It specifies the optimization method in the optimx function that is used for maximizing the log-likelihood function. Recommended is the "BFGS" optimization method (default). For further methods, see the description of the optimx function.

trace

Integer passed to the optimx function. It specifies the tracing information on the progress of the optimization. The value 2 gives full tracing, default value 0 blocks all details. See also the optimx documentation.

kkt2tol

Numeric. Passed to the optimx function, default value is 1E-16. It specifies the tolerance for the eigenvalue ratio in the Karush-Kuhn-Tucker (KKT) test for a positive definite Hessian matrix. See also the optimx documentation.

SE_est

Logical indicator whether standard error estimates are computed for the parameters using the inverse of the observed information matrix (TRUE, default), or whether standard error estimates are not computed (FALSE).

pval_est

Logical indicator whether p-values are computed from hypothesis tests of the absence of effects of each predictor on each phenotype in the marginal models (TRUE, default), or whether p-values are not computed (FALSE). P-values are obtained from large sample Wald-type tests based on the maximum likelihood parameter estimates and the standard error estimates.

n_iter_max

Integer indicating the maximum number of optimization attempts of the log-likelihood function with different starting values, if the optimization doesn't converge (default: 10).

predictors

Dataframe containing the predictors of Y1 and Y2 in columns for which estimates are returned (for the cjamp_loop function).

covariates_Y1

Dataframe containing the covariates of Y1 in columns for which estimates are not returned (for the cjamp_loop function).

covariates_Y2

Dataframe containing the covariates of Y2 in columns for which estimates are not returned (for the cjamp_loop function).

Details

Both functions cjamp and cjamp_loop fit a joint copula model of two phenotypes using the Clayton copula or 2-parameter copula, conditional on none or multiple predictors and covariates in the marginal models. The optimx function of the optimx package is used to maximize the log-likelihood (i.e. to minimize the minus log-likelihood function minusloglik) to obtain maximum likelihood coefficient estimates of all parameters. For this, the BFGS optimization method is recommended. Standard error estimates of the coefficient estimates can be obtained by using the observed inverse information matrix. P-values from hypothesis tests of the absence of effects of each predictor on each phenotype in the marginal models can be obtained from large-sample Wald-type tests (see the vignette for details).

It should be noted that cjamp, cjamp_loop and minusloglik) assume quantitative predictors and use an additive model, i.e. for categorical predictors with more than 2 levels, dummy variables have to be created beforehand. Accordingly, if single nucleotide variants (SNVs) are included as predictors, the computation is based on an additive genetic model if SNVs are provided as 0-1-2 genotypes and on a dominant model if SNVs are provided as 0-1 genotypes.

The cjamp function returns point estimates of all parameters, standard error estimates and p-values for all marginal parameters (i.e. all parameters for predictors_Y1, predictors_Y1), the minus log-likelihood value as well as information about the convergence. The cjamp_loop function only returns point estimates, standard error estimates, and p-values for the specified predictors predictors and not the covariates covariates_Y1 and covariates_Y2, in addition to the minus log-likelihood value as well as convergence information.

It is recommended that all variables are centered and scaled before the analysis, which can be done through the scale_var parameter. Otherwise, if the scales of the variables differ, it can lead to convergence problems of the optimization.

Value

An object of class cjamp, for which the summary function summary.cjamp is implemented. The output is a list containing estimates of the copula parameters, marginal parameters, Kendall's tau (as well as the upper and lower tail dependence \lambda_l, \lambda_u if the 2-parameter copula model is fitted), the standard error estimates of all parameters, p-values of hypothesis tests of the marginal parameters (i.e. of the absence of predictor effects on the phenotypes), the convergence code of the log-likelihood maximization (from the optimx function, where 0 indicates successful convergence), the KKT conditions 1 and 2 of the convergence, and the maximum log-likelihood value.

Examples


# Data generation
set.seed(10)
genodata <- generate_genodata(n_SNV = 20, n_ind = 100)
phenodata <- generate_phenodata_2_copula(genodata = genodata$SNV1,
                                         MAF_cutoff = 1, prop_causal = 1,
                                         tau = 0.2, b1 = 0.3, b2 = 0.3)
predictors <- data.frame(X1 = phenodata$X1, X2 = phenodata$X2,
                         genodata[, 1:3])

## Not run. When executing, the following takes about 2 minutes running time.
## Example 1: Analysis of multiple SNVs as predictors in one model
#cjamp(copula = "Clayton", Y1 = phenodata$Y1, Y2 = phenodata$Y2,
#      predictors_Y1 = predictors, predictors_Y2 = predictors,
#      optim_method = "BFGS", trace = 0, kkt2tol = 1E-16, SE_est = TRUE,
#      pval_est = TRUE, n_iter_max = 10)
#cjamp(copula = "2param", Y1 = phenodata$Y1, Y2 = phenodata$Y2,
#      predictors_Y1 = predictors, predictors_Y2 = predictors,
#      optim_method = "BFGS", trace = 0, kkt2tol = 1E-16, SE_est = TRUE,
#      pval_est = TRUE, n_iter_max = 10)
#
## Example 2: Analysis of multiple SNVs in separate models
#covariates <- data.frame(X1 = phenodata$X1, X2 = phenodata$X2)
#predictors <- genodata
#cjamp_loop(copula = "Clayton", Y1 = phenodata$Y1, Y2 = phenodata$Y2,
#           predictors = predictors, covariates_Y1 = covariates,
#           covariates_Y2 = covariates, optim_method = "BFGS", trace = 0,
#           kkt2tol = 1E-16, SE_est = TRUE, pval_est = TRUE,
#           n_iter_max = 10)


[Package CJAMP version 0.1.1 Index]