ps_paf {graphPAF}R Documentation

Estimate pathway specific population attributable fractions

Description

Estimate pathway specific population attributable fractions

Usage

ps_paf(
  response_model,
  mediator_models,
  riskfactor,
  refval,
  data,
  prev = NULL,
  ci = FALSE,
  boot_rep = 50,
  ci_level = 0.95,
  ci_type = c("norm"),
  weight_vec = NULL,
  verbose = TRUE
)

Arguments

response_model

A R model object for a binary outcome that involves a risk factor, confounders and mediators of the risk factor outcome relationship. Note that a weighted model should be used for case control data. Non-linear effects should be specified via ns(x, df=y), where ns is the natural spline function from the splines library.

mediator_models

A list of fitted models describing the risk factor/mediator relationship (the predictors in the model will be the risk factor and any confounders) Note a weighted model should be fit when data arise from a case control study. Models can be specified for linear responses (lm), binary responses (glm) and ordinal factors (through polr). Non-linear effects should be specified via ns(x, df=y), where ns is the natural spline function from the splines library.

riskfactor

character. Represents the name of the risk factor

refval

For factor valued risk factors, the reference level of the risk factor. If the risk factor is numeric, the reference level is assumed to be 0.

data

dataframe. A dataframe (with no missing values) containing the data used to fit the mediator and response models. You can run data_clean to the input dataset if the data has missing values as a pre-processing step

prev

numeric. A value between 0 and 1 specifying the prevalence of disease: only relevant to set if data arises from a case control study.

ci

logical. If TRUE a confidence interval is calculated using Bootstrap

boot_rep

Integer. Number of bootstrap replications (Only necessary to specify if ci=TRUE). Note that at least 50 replicates are recommended to achieve stable estimates of standard error. In the examples below, values of boot_rep less than 50 are sometimes used to limit run time.

ci_level

Numeric. Default 0.95. A number between 0 and 1 specifying the confidence level (only necessary to specify when ci=TRUE)

ci_type

Character. Default norm. A vector specifying the types of confidence interval desired. "norm", "basic", "perc" and "bca" are the available methods

weight_vec

An optional vector of inverse sampling weights for survey data (note that variance will not be calculated correctly if sampling isn't independent). Note that this will be ignored if prev is specified and calculation_method="D", in which case the weights will be constructed so the empirical re-weighted prevalence of disease is equal to prev

verbose

A logical indicator for whether extended output is produced when ci=TRUE, default TRUE

Value

A numeric vector (if ci=FALSE), or object (or class pspaf) (if CI=TRUE) with estimated PS-PAF for each mediator referred to in mediator_models, together with estimated direct PS-PAF and possibly confidence intervals.

References

Pathway specific Population attributable fractions. O’Connell, M.M. and Ferguson, J.P., 2022. IEA. International Journal of Epidemiology, 1, p.13. Accessible at: https://academic.oup.com/ije/advance-article/doi/10.1093/ije/dyac079/6583255?login=true

Examples

library(splines)
library(survival)
library(parallel)
options(boot.parallel="snow")
# User could set the next option to number of cores on machine:
options(boot.ncpus=2)
# Direct and pathway specific attributable fractions estimated
# on simulated case control stroke data:
# Note that the models here are weighted regressions (based on a column in the
# dataframe named 'weights') to rebalance the case control structure to make it
# representative over the population, according to the prev argument.
# Unweighted regression is fine to use if the data arises from cohort or
# cross sectional studies, in which case prev should be set to NULL
response_model <- glm(case ~ region * ns(age, df = 5) + sex * ns(age, df = 5) +
 education + exercise + ns(diet, df = 3) + smoking + alcohol + stress +
  ns(lipids, df = 3) + ns(waist_hip_ratio, df = 3) + high_blood_pressure,
  data=stroke_reduced,family='binomial', weights=weights)
mediator_models <- list(glm(high_blood_pressure ~ region * ns(age, df = 5) +
 sex * ns(age, df = 5) + education   +exercise + ns(diet, df = 3) + smoking +
 alcohol + stress,data=stroke_reduced,family='binomial',weights=weights),
 lm(lipids ~ region * ns(age, df = 5) + sex * ns(age, df = 5) +education +
  exercise + ns(diet, df = 3) + smoking + alcohol + stress, weights=weights,
   data=stroke_reduced),lm(waist_hip_ratio ~ region * ns(age, df = 5) +
   sex * ns(age, df = 5) + education + exercise + ns(diet, df = 3) +
    smoking + alcohol + stress, weights=weights, data=stroke_reduced))
# point estimate
ps_paf(response_model=response_model, mediator_models=mediator_models ,
riskfactor="exercise",refval=0,data=stroke_reduced,prev=0.0035, ci=FALSE)
# confidence intervals

ps_paf(response_model=response_model, mediator_models=mediator_models ,
riskfactor="exercise",refval=0,data=stroke_reduced,prev=0.0035, ci=TRUE,
boot_rep=100,ci_type="norm")


[Package graphPAF version 2.0.0 Index]