atlas {ludic} | R Documentation |
Association testing by combining several matching thresholds
Description
Computes association test p-values from a generalized linear model for each considered threshold, and computes a p-value for the combination of all the envisioned thresholds through Fisher's method using perturbation resampling.
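As a rough illustration of the combination step described above, the following sketch computes Fisher's statistic from a set of per-threshold p-values. It is not the package's internal code. The p-values, the helper name `fisher_stat`, and the uniform draws standing in for perturbed p-values are all hypothetical. The chi-square reference is shown only for contrast: it assumes independent p-values, which p-values computed on the same data at different thresholds generally are not, hence the resampling-based alternative.

```r
# Fisher's combination statistic: -2 * sum(log(p)) over the per-threshold p-values.
fisher_stat <- function(pvals) {
  -2 * sum(log(pvals))
}

# Hypothetical per-threshold p-values (illustrative values only).
pvals <- c(0.04, 0.10, 0.20, 0.03, 0.50)
stat <- fisher_stat(pvals)

# Reference p-value that is valid only if the p-values are independent
# (chi-square with 2 * k degrees of freedom).
p_indep <- stats::pchisq(stat, df = 2 * length(pvals), lower.tail = FALSE)

# Resampling-based alternative (assumed form): compare the observed statistic
# with statistics computed from perturbed p-values. The uniform draws below are
# placeholders; in practice they would come from the perturbation procedure.
set.seed(123)
nb_perturb <- 200
ptbed <- matrix(stats::runif(length(pvals) * nb_perturb), nrow = length(pvals))
ptbed_stats <- apply(ptbed, 2, fisher_stat)
p_resampled <- mean(ptbed_stats >= stat)
```

Here `p_resampled` is the proportion of perturbed statistics at least as large as the observed one, so its resolution is limited by `nb_perturb`.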
Usage
atlas(
match_prob,
y,
x,
covar = NULL,
thresholds = seq(from = 0.1, to = 0.9, by = 0.2),
nb_perturb = 200,
dist_family = c("gaussian", "binomial"),
impute_strategy = c("weighted average", "best")
)
Arguments
match_prob |
matching probabilities matrix (e.g. obtained through |
y |
response variable of length |
x |
a |
covar |
a |
thresholds |
a vector (possibly of length |
nb_perturb |
the number of perturbations used for the p-value combination. Default is 200. |
dist_family |
a character string indicating the distribution family for the glm.
Currently, only |
impute_strategy |
a character string indicating which strategy to use to impute x
from the matching probabilities |
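To make the two `impute_strategy` options more concrete, here is a minimal sketch of how x could be imputed from a matching probabilities matrix at a single threshold. The exact rules used by `atlas` are not stated on this page, so everything below is an assumption: that probabilities under the threshold are treated as non-matches, that "weighted average" normalizes the remaining probabilities within each row, and that "best" keeps the single most probable record. All object names are hypothetical.

```r
set.seed(1)
n_y <- 5  # number of response records (rows of match_prob)
n_x <- 4  # number of covariate records (columns of match_prob)
x <- matrix(stats::rnorm(n_x * 2), nrow = n_x, ncol = 2)
match_prob <- matrix(stats::rbeta(n_y * n_x, 1, 2), nrow = n_y, ncol = n_x)
threshold <- 0.3

# Assumption: probabilities below the threshold are treated as non-matches.
mp <- match_prob
mp[mp < threshold] <- 0
row_tot <- rowSums(mp)

# "weighted average" (assumed form): probability-weighted mean of the rows of x.
# Dividing a matrix by a vector of length nrow rescales each row.
w <- mp / ifelse(row_tot > 0, row_tot, 1)
x_wavg <- w %*% x
x_wavg[row_tot == 0, ] <- NA  # no candidate match above the threshold

# "best" (assumed form): keep the row of x with the highest matching probability.
best_idx <- max.col(mp, ties.method = "first")
x_best <- x[best_idx, , drop = FALSE]
x_best[row_tot == 0, ] <- NA
```

Both results have one row per response record, so either can be passed to a regression on `y`; rows without any match above the threshold are left as `NA` in this sketch.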
Value
a list containing the following:
- influencefn_pvals: p-values obtained from influence function perturbations, with the covariates as columns and the thresholds as rows, with an additional row at the top for the combination
- wald_pvals: a matrix containing the p-values obtained from the Wald test, with the covariates as columns and the thresholds as rows
- ptbed_pvals: a list containing, for each covariate, a matrix with the nb_perturb perturbed p-values, with the different thresholds as rows
- theta_impute: a matrix of the estimated coefficients from the glm when imputing the weighted average for covariates (as columns), with the thresholds as rows
- sd_theta: a matrix of the estimated SD (from the influence function) of the coefficients from the glm when imputing the weighted average for covariates (as columns), with the thresholds as rows
- ptbed_theta_impute: a list containing, for each covariate, a matrix with the nb_perturb perturbed estimated coefficients from the glm when imputing the weighted average for covariates, with the different thresholds as rows
- impute_strategy: a character string indicating which impute strategy was used (either "weighted average" or "best")
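For readers unfamiliar with the Wald test behind `wald_pvals`, the sketch below shows how Wald p-values are obtained from a fitted glm in base R. It is an illustration on simulated data, not the package's code: `x_imp` is a hypothetical stand-in for an imputed covariate matrix at one threshold.

```r
set.seed(42)
n <- 103
# Hypothetical imputed covariates (2 columns) and a binary response.
x_imp <- matrix(stats::rnorm(n * 2), ncol = 2,
                dimnames = list(NULL, c("x1", "x2")))
y <- stats::rbinom(n, size = 1, prob = 0.5)

# Logistic regression of y on the imputed covariates.
fit <- stats::glm(y ~ x_imp, family = stats::binomial())
coefs <- summary(fit)$coefficients

# Wald test: z = estimate / standard error, two-sided normal p-value.
z <- coefs[, "Estimate"] / coefs[, "Std. Error"]
wald_p <- 2 * stats::pnorm(abs(z), lower.tail = FALSE)
```

These values match the `Pr(>|z|)` column reported by `summary(fit)`; repeating the fit for each threshold would give one row of p-values per threshold.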
References
Zhang HG, Hejblum BP, Weber G, Palmer N, Churchill S, Szolovits P, Murphy S, Liao KP, Kohane I and Cai T, ATLAS: An automated association test using probabilistically linked health records with application to genetic studies, JAMIA, in press (2021). doi: 10.1101/2021.05.02.21256490.
Examples
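#rm(list=ls())
# Number of simulation replicates (5000 in the commented-out full run)
n_sims <- 1 #5000

# One replicate: simulate data with y generated independently of x,
# then return the influence-function p-values from atlas()
mysim <- function(i){
  # 99 covariate records with 2 covariates each
  x <- matrix(ncol=2, nrow=99, stats::rnorm(n=99*2))
  #plot(density(rbeta(n=1000, 1,2)))
  # Matching probabilities between the 103 responses and the 99 covariate records
  match_prob <- matrix(rbeta(n=103*99, 1, 2), nrow=103, ncol=99)
  # Gaussian alternative:
  #y <- rnorm(n=103, mean = 1, sd = 0.5)
  #return(atlas(match_prob, y, x, dist_family="gaussian")$influencefn_pvals)
  y <- rbinom(n=103, size = 1, prob=0.5)
  return(atlas(match_prob, y, x, dist_family="binomial")$influencefn_pvals)
}
# Parallel alternative:
#res <- pbapply::pblapply(1:n_sims, mysim, cl = parallel::detectCores()-1)
res <- lapply(1:n_sims, mysim)

# Rejection rate at the 0.05 level across replicates, per row,
# for every column except the last two
size <- sapply(1:(ncol(res[[1]])-2),
               FUN = function(i){
                 rowMeans(sapply(res, function(m){m[, i]<0.05}), na.rm = TRUE)
               }
)
rownames(size) <- rownames(res[[1]])
# Drop the names of the last two columns to match the columns kept above
colnames(size) <- colnames(res[[1]])[-(-1:0 + ncol(res[[1]]))]
size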