analysisPheWAS {EHR} | R Documentation |
Statistical Analysis for PheWAS
Description
Implements three commonly used statistical methods for analyzing data in a Phenome-Wide Association Study (PheWAS).
Usage
analysisPheWAS(
  method = c("firth", "glm", "lr"),
  adjust = c("PS", "demo", "PS.demo", "none"),
  Exposure,
  PS,
  demographics,
  phenotypes,
  data
)
Arguments
method |
the statistical analysis method, one of 'firth', 'glm', and 'lr'. 'firth': Firth's penalized-likelihood logistic regression; 'glm': logistic regression with Wald test; 'lr': logistic regression with likelihood ratio test. |
adjust |
the adjustment method, one of 'PS', 'demo', 'PS.demo', and 'none'. 'PS': adjustment for PS only; 'demo': adjustment for demographics only; 'PS.demo': adjustment for PS and demographics; 'none': no adjustment. |
Exposure |
the name of the exposure variable. |
PS |
the name of the propensity score variable. |
demographics |
the names of the demographic variables. |
phenotypes |
the names of the phenotypes to be analyzed. |
data |
the data set containing the exposure, propensity score, demographic, and phenotype variables. |
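As a rough illustration of what the adjust options mean, the sketch below builds one logistic-regression formula per option. The helper build_formula() and the phenotype name pheno1 are hypothetical; this is not the package's internal code, only one plausible way the covariates could enter the model.

```r
## Hypothetical helper (not part of the EHR package): map an 'adjust'
## option to the right-hand side of a logistic-regression formula.
build_formula <- function(phenotype, Exposure, PS, demographics,
                          adjust = c("PS", "demo", "PS.demo", "none")) {
  adjust <- match.arg(adjust)
  covars <- switch(adjust,
    PS      = PS,                    # propensity score only
    demo    = demographics,          # demographics only
    PS.demo = c(PS, demographics),   # both
    none    = character(0)           # exposure alone
  )
  ## the exposure is always in the model; covariates depend on 'adjust'
  reformulate(c(Exposure, covars), response = phenotype)
}

demo.vars <- c("age", "race", "gender")
f.psdemo <- build_formula("pheno1", "exposure", "PS", demo.vars, "PS.demo")
f.none   <- build_formula("pheno1", "exposure", "PS", demo.vars, "none")
print(f.psdemo)
print(f.none)
```

Under this reading, the exposure coefficient is the quantity of interest in every case, and adjust only changes which covariates accompany it.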
Details
Implements three commonly used statistical methods to analyze the associations between an exposure (e.g., drug exposure, genotypes) and various phenotypes in PheWAS. Firth's penalized-likelihood logistic regression is the default method; it avoids the problem of separation in logistic regression, which often arises when analyzing sparse binary outcomes and exposures. Logistic regression with a likelihood ratio test and conventional logistic regression with a Wald test can also be performed.
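The separation problem can be seen with base R alone. The sketch below uses a small made-up data set (not from the EHR package) in which the phenotype never occurs among unexposed subjects, then compares the Wald test with a likelihood ratio test computed from the difference in deviances. All object names are illustrative. Firth's method is not shown because it needs an add-on package.

```r
## Made-up data: a sparse phenotype that never occurs in the unexposed group
## (quasi-complete separation). 40 unexposed, 10 exposed, 4 events in total.
dat <- data.frame(
  exposure = rep(c(0, 1), times = c(40, 10)),
  pheno    = c(rep(0, 40), rep(c(1, 0), times = c(4, 6)))
)

## Ordinary logistic regression; glm() warns about fitted probabilities
## of 0 or 1 here, which is the usual symptom of separation.
fit.full <- suppressWarnings(
  glm(pheno ~ exposure, family = binomial, data = dat)
)
fit.null <- glm(pheno ~ 1, family = binomial, data = dat)

## Wald test: estimate divided by its standard error
wald     <- coef(summary(fit.full))["exposure", ]
wald.est <- unname(wald["Estimate"])
wald.se  <- unname(wald["Std. Error"])
wald.p   <- unname(wald["Pr(>|z|)"])

## Likelihood ratio test: difference in deviance between nested models
lr.stat <- deviance(fit.null) - deviance(fit.full)
lr.p    <- pchisq(lr.stat, df = 1, lower.tail = FALSE)

print(c(estimate = wald.est, stdError = wald.se, wald.p = wald.p, lr.p = lr.p))
```

With data like these, the maximum-likelihood estimate drifts toward infinity and its standard error inflates, so the Wald p-value is uninformative even though the likelihood ratio test still detects the association. A penalized-likelihood fit keeps the estimate finite.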
Value
estimate |
the estimate of the log odds ratio for the exposure. |
stdError |
the standard error. |
statistic |
the test statistic. |
pvalue |
the p-value. |
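A common next step is to turn the log odds ratios into odds ratios and to correct the p-values for the number of phenotypes tested. The sketch below assumes the results can be arranged as a data frame with one row per phenotype and the columns named above. The data frame res, its phenotype labels, and its numbers are made up for illustration and are not output from analysisPheWAS.

```r
## Made-up results table with the documented column names
res <- data.frame(
  phenotype = c("pheA", "pheB", "pheC"),
  estimate  = c(0.85, -0.10, 1.40),   # log odds ratios
  stdError  = c(0.30, 0.25, 0.70),
  pvalue    = c(0.005, 0.690, 0.045)
)

## Odds ratio with a Wald-type 95% interval
## (an approximation; it may not match intervals from Firth's method)
z <- qnorm(0.975)
res$OR       <- exp(res$estimate)
res$OR.lower <- exp(res$estimate - z * res$stdError)
res$OR.upper <- exp(res$estimate + z * res$stdError)

## Multiple-testing corrections across phenotypes
res$p.bonf <- p.adjust(res$pvalue, method = "bonferroni")
res$p.fdr  <- p.adjust(res$pvalue, method = "BH")

## Most significant phenotypes first
res <- res[order(res$pvalue), ]
print(res)
```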
Author(s)
Leena Choi <leena.choi@vanderbilt.edu> and Cole Beck <cole.beck@vumc.org>
Examples
## use small datasets to run this example
data(dataPheWASsmall)
## make dd.base with subset of covariates from baseline data (dd.baseline.small)
## or select covariates with upper code as shown below
upper.code.list <- unique(sub("[.][^.]*(.).*", "", colnames(dd.baseline.small)) )
upper.code.list <- intersect(upper.code.list, colnames(dd.baseline.small))
dd.base <- dd.baseline.small[, upper.code.list]
## perform regularized logistic regression to obtain propensity score (PS)
## to adjust for potential confounders at baseline
phenos <- setdiff(colnames(dd.base), c('id', 'exposure'))
data.x <- as.matrix(dd.base[, phenos])
glmnet.fit <- glmnet::cv.glmnet(x=data.x, y=dd.base[,'exposure'],
                                family="binomial", standardize=TRUE,
                                alpha=0.1)
dd.base$PS <- c(predict(glmnet.fit, data.x, s='lambda.min'))
data.ps <- dd.base[,c('id', 'PS')]
dd.all.ps <- merge(data.ps, dd.small, by='id')
demographics <- c('age', 'race', 'gender')
phenotypeList <- setdiff(colnames(dd.small), c('id','exposure','age','race','gender'))
## run with a subset of phenotypeList to get quicker results
phenotypeList.sub <- sample(phenotypeList, 5)
results.sub <- analysisPheWAS(method='firth', adjust='PS', Exposure='exposure',
                              PS='PS', demographics=demographics,
                              phenotypes=phenotypeList.sub, data=dd.all.ps)
## run with the full list of phenotype outcomes (i.e., phenotypeList)
results <- analysisPheWAS(method='firth', adjust='PS', Exposure='exposure',
                          PS='PS', demographics=demographics,
                          phenotypes=phenotypeList, data=dd.all.ps)