
atlas {ludic}

R Documentation

Association testing by combining several matching thresholds

Description

Computes association test p-values from a generalized linear model for each considered threshold, and computes a p-value combining all the considered thresholds through Fisher's method using perturbation resampling.
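As a rough sketch of the combination step, assume we have the observed p-values at each threshold and a matrix of perturbed p-values that approximates their joint null distribution, with thresholds as rows (the layout described for the matrices in `ptbed_pvals`). Fisher's statistic is computed for the observed set and for each perturbed set. The combined p-value is then the proportion of perturbed statistics at least as large as the observed one. The helper `fisher_combine_ptb` is hypothetical and is not the package's internal code.

```r
# Hypothetical helper (not part of ludic): Fisher's method across thresholds,
# calibrated by perturbation resampling instead of the chi-squared reference.
#   pvals     : observed p-values, one per threshold (length K)
#   ptb_pvals : K x B matrix of perturbed p-values, one column per perturbation,
#               assumed to approximate the joint null distribution of pvals
fisher_combine_ptb <- function(pvals, ptb_pvals) {
  fisher_stat <- function(p) -2 * sum(log(p))
  t_obs <- fisher_stat(pvals)
  t_ptb <- apply(ptb_pvals, 2, fisher_stat)  # B perturbed Fisher statistics
  # proportion of perturbed statistics at least as extreme as the observed one
  mean(t_ptb >= t_obs)
}

# Example: 3 thresholds, 1000 perturbations of independent uniform p-values
set.seed(1)
ptb <- matrix(stats::runif(3 * 1000), nrow = 3)
fisher_combine_ptb(c(0.01, 0.04, 0.03), ptb)
```

Resampling is used here rather than the usual chi-squared reference with 2K degrees of freedom, because p-values computed at several thresholds on the same data are correlated. The chi-squared reference is only valid for independent p-values.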

Usage

atlas(
  match_prob,
  y,
  x,
  covar = NULL,
  thresholds = seq(from = 0.1, to = 0.9, by = 0.2),
  nb_perturb = 200,
  dist_family = c("gaussian", "binomial"),
  impute_strategy = c("weighted average", "best")
)

Arguments

`match_prob`	matching probabilities matrix (e.g. obtained through `recordLink`) of dimensions `n1 x n2`.
`y`	response variable of length `n1`. Only binary phenotypes are supported at the moment.
`x`	a `matrix` or a `data.frame` of predictors of dimensions `n2 x p`. An intercept is automatically added within the function.
`covar`	a `matrix` or a `data.frame` of dimensions `n3 x p` containing the variables to adjust for in the test. Default is `NULL`, in which case there is no adjustment.
`thresholds`	a vector (possibly of length `1`) containing the different thresholds used to call a match. Default is `seq(from = 0.1, to = 0.9, by = 0.2)`.
`nb_perturb`	the number of perturbations used for the p-value combination. Default is `200`.
`dist_family`	a character string indicating the distribution family for the `glm`. Currently, only `'gaussian'` and `'binomial'` are supported. Default is `'gaussian'`.
`impute_strategy`	a character string indicating which strategy to use to impute `x` from the matching probabilities `match_prob`. Either `"best"` (in which case the most probable match above the threshold is imputed) or `"weighted average"` (in which case a weighted mean is imputed for each individual who has at least one match with a posterior probability above the threshold). Default is `"weighted average"`.
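The two imputation strategies can be illustrated with a hypothetical helper `impute_x` (not part of ludic). It maps the `n2 x p` predictors onto the `n1` records of `y` for a single threshold. The sketch assumes three things: a match requires a probability strictly above the threshold; the weighted average uses only the candidate matches above the threshold, weighted by their matching probabilities; and records with no candidate are left as `NA`.

```r
# Hypothetical helper (not part of ludic): impute predictors for the n1 records
# of y from an n1 x n2 matrix of matching probabilities and n2 x p predictors x.
impute_x <- function(match_prob, x, threshold,
                     strategy = c("weighted average", "best")) {
  strategy <- match.arg(strategy)
  x <- as.matrix(x)
  out <- matrix(NA_real_, nrow = nrow(match_prob), ncol = ncol(x),
                dimnames = list(NULL, colnames(x)))
  for (i in seq_len(nrow(match_prob))) {
    cand <- which(match_prob[i, ] > threshold)  # candidate matches for record i
    if (length(cand) == 0) next                  # no match: row stays NA
    p <- match_prob[i, cand]
    if (strategy == "best") {
      # predictors of the single most probable candidate
      out[i, ] <- x[cand[which.max(p)], ]
    } else {
      # probability-weighted mean of the candidates' predictors
      out[i, ] <- colSums(x[cand, , drop = FALSE] * p) / sum(p)
    }
  }
  out
}

# Example: 2 records of y, 3 records of x with 2 predictors
mp <- matrix(c(0.9, 0.1, 0.6,
               0.2, 0.3, 0.1), nrow = 2, byrow = TRUE)
xs <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2)
impute_x(mp, xs, threshold = 0.5, strategy = "weighted average")
impute_x(mp, xs, threshold = 0.5, strategy = "best")
```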

Value

a list containing the following:

`influencefn_pvals`	p-values obtained from influence function perturbations, with the covariates as columns and the thresholds as rows, plus an additional top row for the combination across thresholds.
`wald_pvals`	a matrix of the p-values obtained from the Wald test, with the covariates as columns and the thresholds as rows.
`ptbed_pvals`	a list containing, for each covariate, a matrix of the `nb_perturb` perturbed p-values, with the thresholds as rows.
`theta_impute`	a matrix of the coefficients estimated by the `glm` when imputing the weighted average, with the covariates as columns and the thresholds as rows.
`sd_theta`	a matrix of the standard deviations of these coefficients, estimated from the influence function, with the covariates as columns and the thresholds as rows.
`ptbed_theta_impute`	a list containing, for each covariate, a matrix of the `nb_perturb` perturbed coefficients estimated by the `glm` when imputing the weighted average, with the thresholds as rows.
`impute_strategy`	a character string indicating which imputation strategy was used (either `"weighted average"` or `"best"`).
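To give a feel for how an influence-function SD and a set of perturbed coefficients relate, here is a minimal sketch of the generic influence-function perturbation technique. It is not ludic's implementation. It assumes the `'gaussian'` family with identity link (ordinary least squares), fully observed predictors (no record linkage), and standard normal multipliers. Each perturbed estimate is the original estimate plus `(1/n) * sum_i IF_i * zeta_i`. The helper `if_perturb_ols` is hypothetical.

```r
# Hypothetical helper (not part of ludic): influence-function perturbation for
# ordinary least squares, i.e. the gaussian family with identity link.
if_perturb_ols <- function(X, y, nb_perturb = 200) {
  X <- cbind(1, as.matrix(X))                    # add an intercept column
  n <- nrow(X)
  theta <- solve(crossprod(X), crossprod(X, y))  # OLS estimate (p x 1)
  resid <- as.vector(y - X %*% theta)
  # influence function of each observation, (X'X / n)^{-1} x_i e_i, as rows
  IF <- (X * resid) %*% solve(crossprod(X) / n)
  # each perturbation: theta + (1/n) * sum_i IF_i * zeta_i, zeta_i ~ N(0, 1)
  zeta <- matrix(stats::rnorm(n * nb_perturb), nrow = nb_perturb, ncol = n)
  ptb_theta <- sweep(zeta %*% IF / n, 2, as.vector(theta), "+")
  # analytic SD implied by the influence function (sandwich-type SE)
  sd_theta <- sqrt(colSums(IF^2)) / n
  list(theta = as.vector(theta), sd_theta = sd_theta, ptb_theta = ptb_theta)
}

# Example on simulated data
set.seed(2)
x_sim <- stats::rnorm(200)
y_sim <- 1 + 0.3 * x_sim + stats::rnorm(200)
fit <- if_perturb_ols(x_sim, y_sim)
# Wald-type p-values built from the influence-function SD
2 * stats::pnorm(-abs(fit$theta / fit$sd_theta))
```

With a large number of perturbations, the empirical SD of each column of `ptb_theta` should be close to the corresponding `sd_theta`.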

References

Zhang HG, Hejblum BP, Weber G, Palmer N, Churchill S, Szolovits P, Murphy S, Liao KP, Kohane I and Cai T, ATLAS: An automated association test using probabilistically linked health records with application to genetic studies, JAMIA, in press (2021). doi: 10.1101/2021.05.02.21256490.

Examples

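library(ludic)
set.seed(123)

# Number of simulated datasets (increase, e.g. to 5000, to estimate the test size)
n_sims <- 1

# Simulate one dataset with no association between y and x, and
# return the influence-function p-values from atlas()
mysim <- function(i){
  # predictors x for n2 = 99 records (2 variables)
  x <- matrix(stats::rnorm(n = 99*2), nrow = 99, ncol = 2)
  # matching probabilities between n1 = 103 and n2 = 99 records;
  # Beta(1, 2) draws are skewed towards 0
  match_prob <- matrix(stats::rbeta(n = 103*99, 1, 2), nrow = 103, ncol = 99)
  # binary response for the n1 = 103 records, independent of x
  y <- stats::rbinom(n = 103, size = 1, prob = 0.5)
  # for a continuous response, use instead:
  # y <- stats::rnorm(n = 103, mean = 1, sd = 0.5) with dist_family = "gaussian"
  atlas(match_prob, y, x, dist_family = "binomial")$influencefn_pvals
}
# res <- pbapply::pblapply(seq_len(n_sims), mysim, cl = parallel::detectCores() - 1)
res <- lapply(seq_len(n_sims), mysim)

# Empirical rejection rate at level 0.05 for each row of influencefn_pvals
# (combination and thresholds), over its first ncol - 2 columns
keep <- seq_len(ncol(res[[1]]) - 2)
size <- sapply(keep, function(i){
  rowMeans(sapply(res, function(m){ m[, i] < 0.05 }), na.rm = TRUE)
})
rownames(size) <- rownames(res[[1]])
colnames(size) <- colnames(res[[1]])[keep]
size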

[Package ludic version 0.2.0 Index]