R: Association Analysis Between Methylation Beta Values and...

cpg.assoc {CpGassoc}

R Documentation

Association Analysis Between Methylation Beta Values and Phenotype of Interest

Description

Association Analysis Between Methylation Beta Values and Phenotype of Interest.

Usage

cpg.assoc(beta.val, indep, covariates = NULL, data = NULL, logit.transform = FALSE, 
chip.id = NULL, subset = NULL, random = FALSE, fdr.cutoff = 0.05, large.data = FALSE,
fdr.method = "BH", logitperm = FALSE,return.data=FALSE)

Arguments

`beta.val`	A vector, matrix, or data frame containing the beta values of interest (1 row per CpG site, 1 column per individual).
`indep`	A vector containing the variable to be tested for association. `cpg.assoc` will evaluate the association between the beta values (dependent variable) and indep (independent variable).
`covariates`	A data frame consisting of additional covariates to be included in the model. covariates can also be specified as a matrix if it takes the form of a model matrix with no intercept column, or can be specified as a vector if there is only one covariate of interest. Can also be a formula(e.g. `~cov1+cov2`).
`data`	an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from the environment from which cpg.assoc is called.
`logit.transform`	logical. If `TRUE`, the logit transform of the beta values log(beta.val/(1-beta.val)) will be used. Any values equal to zero or one will be set to the next smallest or next largest value respectively; values <0 or >1 will be set to NA.
`chip.id`	An optional vector containing chip, batch identities, or other categorical factor of interest to the researcher. If specified, chip id will be included as a factor in the model.
`subset`	An optional logical vector specifying a subset of observations to be used in the fitting process.
`random`	Logical. If `TRUE`, chip.id will be included in the model as a random effect, and a random intercept model will be fitted. If `FALSE`, chip.id will be included in the model as an ordinary categorical covariate, for a much faster analysis.
`fdr.cutoff`	The desired FDR threshold. The default setting is .05. The set of CpG sites with FDR < fdr.cutoff will be labeled as significant.
`large.data`	Logical. Enables analyses of large datasets. When `large.data=TRUE`, `cpg.assoc` avoids memory problems by performing the analysis in chunks. Note: this option no longer works within windows systems. Based on reading max memory allowable by the system. Defaults to False.
`fdr.method`	Character. Method used to calculate False Discovery Rate. Choices include any of the methods available in `p.adjust()`. The default method is "BH" for the Benjamini & Hochberg method.
`logitperm`	Logical. For internal use only.
`return.data`	Logical. cpg.assoc can return dataframes containing the the variable of interest, covariates, and the chip id (if present). Defaults to FALSE. Set to TRUE if plan on using the downstream scatterplot functions).

Details

cpg.assoc is designed to test for association between an independent variable and methylation at a number of CpG sites, with the option to include additional covariates and factors.
cpg.assoc assesses significance with the Holm (step-down Bonferroni) and FDR methods.

If class(indep)='factor', cpg.assoc will perform an ANOVA test of the variable conditional on the covariates specified. Covariates, if entered, should be in the form of a data frame, matrix, or vector. For example, covariates=data.frame(weight,age,factor(city)). The data frame can also be specified prior to calling cpg.assoc. The covariates should either be vectors or columns of a matrix or data.frame.
cpg.assoc is also designed to deal with large data sets. Setting large.data=TRUE will make cpg.assoc split up the data in chunks. Other option is to use cpg.combine and split up oneself.

Value

cpg.assoc will return an object of class "cpg". The functions summary and plot can be called to get a summary of results and to create QQ plots.

`results`	A data frame consisting of the t or F statistics and P-values for each CpG site, as well as indicators of Holm and FDR significance. CpG sites will be in the same order as the original input, but the `sort()` function can be used directly on the `cpg.assoc` object to sort CpG sites by p-value.
`Holm.sig`	A list of sites that met criteria for Holm significance.
`FDR.sig`	A data.frame of the CpG sites that were significant by the FDR method specified.
`info`	A data frame consisting of the minimum P-value observed, the FDR method that was used, the phenotype of interest, the number of covariates in the model, the name of the matrix or data frame the methylation beta values were taken from, the FDR cutoff value and whether a mixed effects analysis was performed.
`indep`	If `return.data=T`, the independent variable that was tested for association.
`covariates`	If `return.data=T`, data.frame or matrix of covariates, if specified (otherwise `NULL`).
`chip`	If `return.data=T`, chip.id vector, if specified (otherwise `NULL`).
`coefficients`	A data frame consisting of the degrees of freedom, and if object is continous the intercept effect adjusted for possible covariates in the model, the estimated effect size, and the standard error. The degrees of freedom is used in `plot.cpg` to compute the genomic inflation factors.

Author(s)

Barfield, R.; Conneely, K.; Kilaru,V.
Maintainer: R. Barfield: <barfieldrichard8@gmail.com>

Examples

 # Sample output from CpGassoc
data(samplecpg,samplepheno,package="CpGassoc")
results<-cpg.assoc(samplecpg,samplepheno$weight,large.data=FALSE)
results
#Analysis with covariates. There are multiple ways to do this. One can define the
#dataframe prior or do it in the function call.
test<-cpg.assoc(samplecpg,samplepheno$weight,data.frame(samplepheno$Distance,
samplepheno$Dose),large.data=FALSE)
# or
covar<-data.frame(samplepheno$Distance,samplepheno$Dose)
test2<-cpg.assoc(samplecpg,samplepheno$weight,covar,large.data=FALSE)


#Doing a mixed effects model. This does take more time, so we will do a subset of
#the samplecpg
randtest<-cpg.assoc(samplecpg[1:10,],samplepheno$weight,chip.id=samplepheno$chip,
random=TRUE,large.data=FALSE)

[Package CpGassoc version 2.70 Index]