| atlas {ludic} | R Documentation | 
Association testing by combining several matching thresholds
Description
Computes association test p-values from a generalized linear model for each threshold considered, then combines them across all thresholds into a single p-value using Fisher's method with perturbation resampling.
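As a rough illustration of the combination step, the sketch below computes Fisher's statistic from a set of per-threshold p-values. The p-values are made-up numbers, not output from atlas. The chi-square reference distribution shown here is only valid for independent p-values; p-values computed at different thresholds on the same data are correlated, which is presumably why atlas calibrates the statistic by perturbation resampling instead. That resampling scheme is not reproduced here.

```r
# Hypothetical per-threshold p-values (illustrative numbers, not package output)
pvals <- c(0.04, 0.10, 0.03, 0.20, 0.08)

# Fisher's combination statistic: -2 times the sum of the log p-values
fisher_stat <- -2 * sum(log(pvals))

# Naive reference distribution, valid ONLY for independent p-values:
# chi-square with 2 * k degrees of freedom, k = number of p-values
naive_combined_p <- stats::pchisq(fisher_stat,
                                  df = 2 * length(pvals),
                                  lower.tail = FALSE)

fisher_stat
naive_combined_p
```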
Usage
atlas(
  match_prob,
  y,
  x,
  covar = NULL,
  thresholds = seq(from = 0.1, to = 0.9, by = 0.2),
  nb_perturb = 200,
  dist_family = c("gaussian", "binomial"),
  impute_strategy = c("weighted average", "best")
)
Arguments
| match_prob | matching probabilities matrix (e.g. obtained through  | 
| y | response variable of length  | 
| x | a  | 
| covar | a  | 
| thresholds | a vector (possibly of length  | 
| nb_perturb | the number of perturbations used for the p-value combination. Default is 200. | 
| dist_family | a character string indicating the distribution family for the glm. 
Currently, only  | 
| impute_strategy | a character string indicating which strategy to use to impute x 
from the matching probabilities  | 
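To make the two impute_strategy options more concrete, here is a minimal sketch of one plausible reading of them. It assumes that matching probabilities below a threshold are set to zero, that "weighted average" imputes each row of x as the probability-weighted mean of the candidate records, and that "best" takes the single most probable candidate. These are assumptions for illustration only; the internal implementation of atlas may differ, and all object names below are hypothetical.

```r
set.seed(1)
n1 <- 5  # records in the outcome data set (rows of match_prob)
n2 <- 4  # records in the covariate data set (columns of match_prob)
match_prob <- matrix(stats::rbeta(n1 * n2, 1, 2), nrow = n1, ncol = n2)
x <- matrix(stats::rnorm(n2 * 2), nrow = n2, ncol = 2)
threshold <- 0.3

# Assumption: candidate matches below the threshold are discarded
w <- match_prob
w[w < threshold] <- 0
row_tot <- rowSums(w)

# "weighted average" (assumed): probability-weighted mean of the candidates' x
x_weighted <- (w %*% x) / row_tot
x_weighted[row_tot == 0, ] <- NA  # no candidate above the threshold

# "best" (assumed): x of the single most probable candidate
best_idx <- max.col(match_prob, ties.method = "first")
x_best <- x[best_idx, , drop = FALSE]
x_best[row_tot == 0, ] <- NA      # best candidate is below the threshold

x_weighted
x_best
```

Rows with no candidate above the threshold are left as NA in this sketch; how atlas treats such records is not stated in this page.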
Value
a list containing the following:
-  influencefn_pvals: p-values obtained from influence function perturbations, with the covariates as columns and the thresholds as rows, plus an additional row at the top for the combination
-  wald_pvals: a matrix containing the p-values obtained from the Wald test, with the covariates as columns and the thresholds as rows
-  ptbed_pvals: a list containing, for each covariate, a matrix of the nb_perturb perturbed p-values with the different thresholds as rows
-  theta_impute: a matrix of the estimated coefficients from the glm when imputing the weighted average, with the covariates as columns and the thresholds as rows
-  sd_theta: a matrix of the estimated SD (from the influence function) of the coefficients from the glm when imputing the weighted average, with the covariates as columns and the thresholds as rows
-  ptbed_theta_impute: a list containing, for each covariate, a matrix of the nb_perturb perturbed estimated coefficients from the glm when imputing the weighted average, with the different thresholds as rows
-  impute_strategy: a character string indicating which impute strategy was used (either "weighted average" or "best")
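For readers unfamiliar with the Wald p-values mentioned above, the following sketch shows how such p-values arise from a standard glm fit in base R. It uses simulated data and a hypothetical x_imputed matrix standing in for the imputed covariates; it is not the code used inside atlas and does not reproduce the influence-function variance.

```r
set.seed(42)
n <- 100
# Hypothetical imputed covariates and a binary outcome simulated under the null
x_imputed <- matrix(stats::rnorm(n * 2), ncol = 2,
                    dimnames = list(NULL, c("x1", "x2")))
y <- stats::rbinom(n, size = 1, prob = 0.5)

fit <- stats::glm(y ~ x_imputed, family = stats::binomial())
coef_table <- summary(fit)$coefficients

# Wald statistic: estimate divided by its standard error,
# compared with a standard normal distribution (two-sided)
wald_z <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
wald_p <- 2 * stats::pnorm(abs(wald_z), lower.tail = FALSE)

wald_p
```

For a binomial family these values match the Pr(>|z|) column reported by summary(); for a gaussian family summary() reports t-based p-values instead.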
References
Zhang HG, Hejblum BP, Weber G, Palmer N, Churchill S, Szolovits P, Murphy S, Liao KP, Kohane I and Cai T, ATLAS: An automated association test using probabilistically linked health records with application to genetic studies, JAMIA, in press (2021). doi: 10.1101/2021.05.02.21256490.
Examples
# Number of simulation replicates (kept at 1 for speed; e.g. 5000 for a size study)
n_sims <- 1
# One replicate: simulate covariates, matching probabilities and a null outcome,
# then return the influence-function p-values from atlas()
mysim <- function(i) {
  x <- matrix(stats::rnorm(n = 99 * 2), nrow = 99, ncol = 2)
  match_prob <- matrix(stats::rbeta(n = 103 * 99, 1, 2), nrow = 103, ncol = 99)
  # Gaussian alternative:
  # y <- stats::rnorm(n = 103, mean = 1, sd = 0.5)
  # return(atlas(match_prob, y, x, dist_family = "gaussian")$influencefn_pvals)
  y <- stats::rbinom(n = 103, size = 1, prob = 0.5)
  return(atlas(match_prob, y, x, dist_family = "binomial")$influencefn_pvals)
}
# Parallel alternative:
# res <- pbapply::pblapply(1:n_sims, mysim, cl = parallel::detectCores() - 1)
res <- lapply(seq_len(n_sims), mysim)
# Empirical rejection rate at the 5% level, for every column except the last two
size <- sapply(seq_len(ncol(res[[1]]) - 2),
               FUN = function(i) {
                 rowMeans(sapply(res, function(m) m[, i] < 0.05), na.rm = TRUE)
               })
rownames(size) <- rownames(res[[1]])
colnames(size) <- colnames(res[[1]])[seq_len(ncol(res[[1]]) - 2)]
size