R: Quantile Decomposition: parametric version

cfd.quantile {cfdecomp}

R Documentation

Description

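Decompose the difference between groups in a quantile (e.g., the median) of some outcome Y, estimating how much of that difference is due to group differences in a mediator M, using parametric models for M and Y.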
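The idea behind the decomposition can be sketched in base R on simulated data. This is a minimal illustration under stated assumptions, not the cfdecomp implementation. The data, the variables `group`, `age`, `med` and `y`, the helper `simulate_y()`, and the choice to add Gaussian noise with the residual standard deviation are all hypothetical. Mediator and outcome models are fitted with `glm`, the natural course is simulated by Monte Carlo integration, and the counterfactual gives group 2 the mediator distribution predicted for group 1 before the medians of Y are compared.

```r
# Minimal sketch of a parametric quantile decomposition.
# Hypothetical simulated data; this is NOT the cfdecomp implementation.
set.seed(1)
n <- 2000
sim <- data.frame(group = rep(1:2, each = n / 2), age = rnorm(n, 50, 10))
# the mediator is higher in group 2; the outcome depends on group and mediator
sim$med <- 1 + 0.8 * (sim$group == 2) + 0.02 * sim$age + rnorm(n)
sim$y <- 2 + 0.5 * (sim$group == 2) + 1.5 * sim$med + 0.01 * sim$age + rnorm(n)

# parametric models for the mediator and the outcome (both Gaussian here)
fit.m <- glm(med ~ group + age, family = gaussian, data = sim)
fit.y <- glm(y ~ group + med + age, family = gaussian, data = sim)
sd.m <- sigma(fit.m)
sd.y <- sigma(fit.y)

# hypothetical helper: draw the mediator as if each row belonged to
# 'med.group', then draw the outcome; repeat mc.size times (Monte Carlo)
simulate_y <- function(d, med.group, mc.size = 20) {
  d.m <- d
  d.m$group <- med.group  # group used only in the mediator model
  draws <- replicate(mc.size, {
    d$med <- predict(fit.m, newdata = d.m) + rnorm(nrow(d), 0, sd.m)
    predict(fit.y, newdata = d) + rnorm(nrow(d), 0, sd.y)
  })
  as.vector(draws)
}

q <- 0.5  # quantile of interest, playing the role of 'probs'
g1 <- sim[sim$group == 1, ]
g2 <- sim[sim$group == 2, ]
nc.1 <- quantile(simulate_y(g1, med.group = 1), q)  # natural course, group 1
nc.2 <- quantile(simulate_y(g2, med.group = 2), q)  # natural course, group 2
cf.2 <- quantile(simulate_y(g2, med.group = 1), q)  # group 2, mediator as in group 1

# proportion of the difference in the median removed by equalizing the mediator
prop.mediated <- unname(1 - (cf.2 - nc.1) / (nc.2 - nc.1))
print(round(prop.mediated, 2))
```

Because Y depends on the mediator in this toy setup, equalizing the mediator distribution shrinks the group difference in the median, and `prop.mediated` plays the role of the `mediation` value. The function itself additionally repeats the computation over bootstrap samples (`bs.size`) to obtain confidence intervals.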

Usage

cfd.quantile(
  formula.y,
  formula.m,
  mediator,
  group,
  data,
  family.y = "binomial",
  family.m = "binomial",
  bs.size = 1000,
  mc.size = 50,
  alpha = 0.05,
  probs = 0.5,
  cluster.sample = FALSE,
  cluster.name = NA,
  sample.resid.y = FALSE,
  sample.resid.m = FALSE,
  print.iteration = FALSE
)

Arguments

`formula.y`	the `formula` for the multivariable model (see `glm`) for the outcome Y.
`formula.m`	the `formula` for the multivariable model (see `glm`) for the mediator M.
`mediator`	the column name of the mediator M.
`group`	the column name of the variable containing the group identifier.
`data`	a data frame containing the variables in the model.
`family.y`	a description of the error distribution to be used in the model, see `family` for details. For the outcome variable any member of the `glm` family can be used.
`family.m`	a description of the error distribution to be used in the model, see `family` for details. For the mediator, currently `gaussian`, `binomial` and `poisson` are supported.
`bs.size`	the number of bootstrap iterations to be performed.
`mc.size`	the number of Monte Carlo iterations to be performed (more iterations reduce the Monte Carlo error).
`alpha`	the alpha level used to construct confidence intervals (e.g., 0.05 gives a 95 percent confidence interval).
`probs`	the quantile(s) of Y to be decomposed, given as values between 0 and 1 (e.g., 0.5 for the median).
`cluster.sample`	set to TRUE if data are clustered in the long format (i.e. multiple rows per individual or other cluster).
`cluster.name`	the name (as a character string) of the column containing the cluster identifiers.
`sample.resid.y`	if the outcome is Gaussian, should the simulation (Monte Carlo integration) sample from the residuals of the linear regression model for the outcome, to approximate the empirical distribution of the outcome (`TRUE`), or from a Gaussian distribution with the standard deviation of the outcome (`FALSE`)? If the true distribution of the continuous outcome is far from Gaussian, the former may be preferred.
`sample.resid.m`	if the mediator is Gaussian, should the simulation (Monte Carlo integration) sample from the residuals of the linear regression model for the mediator, to approximate the empirical distribution of the mediator (`TRUE`), or from a Gaussian distribution with the standard deviation of the mediator (`FALSE`)? If the true distribution of the continuous mediator is far from Gaussian, the former may be preferred.
`print.iteration`	if `TRUE`, print the number of each bootstrap iteration as it runs.
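The difference between the two settings of `sample.resid.y` and `sample.resid.m` can be illustrated with a single linear model. The sketch below is hypothetical (simulated data with skewed errors and a helper `skew()`), and it assumes the Gaussian option uses the residual standard deviation of the fitted model; it is not taken from the package source. It shows that resampling residuals preserves the skewness of the errors, whereas Gaussian draws remove it.

```r
# Hypothetical illustration of residual resampling vs. Gaussian draws
# (simulated data; assumes the Gaussian option uses the residual SD)
set.seed(2)
n <- 1000
x <- runif(n)
y <- 1 + 2 * x + (rexp(n) - 1)   # right-skewed, mean-zero errors
fit <- lm(y ~ x)
pred <- fitted(fit)

# option 1 (sample.resid = TRUE): add residuals resampled with replacement
y.resid <- pred + sample(residuals(fit), n, replace = TRUE)
# option 2 (sample.resid = FALSE): add Gaussian noise with the residual SD
y.gauss <- pred + rnorm(n, mean = 0, sd = sigma(fit))

# sample skewness: close to 0 for Gaussian-like data
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3
sk <- c(observed = skew(y), resid = skew(y.resid), gauss = skew(y.gauss))
print(round(sk, 2))
```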

Value

out_nc_m returns the mean level of the mediator under the natural course for each group; this value should be close to the empirically observed mean of the mediator in that group. out_nc_quantile provides the alpha/2 and 1-alpha/2 bootstrap quantiles for this mean (i.e., a bootstrap percentile confidence interval). out_nc_y and out_nc_quantile_y provide the corresponding values for the outcome Y (for this function, the value of Y at the quantile(s) given in probs).

Similarly, out_cf_m, out_cf_quantile_m, out_cf_y and out_cf_quantile_y provide the corresponding values for the counterfactual scenario in which the mediator distributions of the groups are equalized.

mediation returns the proportion mediated, i.e. the proportion of the group difference that is removed by setting the intervened-on mediator equal in level to that of the reference group; mediation_quantile returns its 1-alpha confidence interval.

mc_conv_info_m and mc_conv_info_y provide information that can help determine the number of Monte Carlo and bootstrap iterations needed to obtain stable estimates. See the Examples for more information.
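The relation between bootstrap draws, the proportion mediated and its percentile confidence interval can be sketched as follows. The objects below are hypothetical stand-ins laid out like `out_nc_y` in the Examples (one row per bootstrap iteration, one column per group); the numbers are simulated, not package output, and the variable names `nc.y`, `cf.y2` and `prop` are illustrative only. The formula mirrors the one computed by hand in the Examples.

```r
# Hypothetical bootstrap draws (simulated, not cfdecomp output):
# rows = bootstrap iterations, columns = groups, as with out_nc_y in the Examples
set.seed(3)
bs.size <- 250
alpha <- 0.05
nc.y <- cbind(rnorm(bs.size, mean = 10, sd = 0.2),   # group 1 (reference)
              rnorm(bs.size, mean = 12, sd = 0.2))   # group 2
# group 2 under the counterfactual (mediator equalized to group 1)
cf.y2 <- rnorm(bs.size, mean = 10.8, sd = 0.2)

# proportion of the group difference removed, per bootstrap iteration
prop <- 1 - (cf.y2 - nc.y[, 1]) / (nc.y[, 2] - nc.y[, 1])

# point estimate and bootstrap percentile confidence interval
est <- mean(prop)
ci <- quantile(prop, probs = c(alpha / 2, 1 - alpha / 2))
print(round(c(estimate = est, ci), 3))
```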

Examples

set.seed(100)
# the decomposition functions in our package are computationally intensive
# to make the example run quickly, we perform it on a subsample (n=125) of the data:
cfd.example.sample <- cfd.example.data[sample(nrow(cfd.example.data), 125),]
quantile.results.1 <- cfd.quantile(formula.y='out.gauss ~ SES + med.gauss + med.binom + age',
                                  formula.m='med.gauss ~ SES + age',
                                  mediator='med.gauss',
                                  group='SES',
                                  data=cfd.example.sample,
                                  family.y='gaussian',
                                  family.m='gaussian',
                                  bs.size=50,
                                  mc.size=10,
                                  alpha=0.05,
                                  probs=0.50)
# also note that normally we would recommend a bs.size of 250+
# and an mc.size of 50+
# let's interpret the output of this function:
# under the natural course, the difference in the median of Y between SES groups 1 and 2 is:
mean(quantile.results.1$out_nc_y[,2] - quantile.results.1$out_nc_y[,1])
# after giving SES group 2 the distribution of the Gaussian mediator in group 1,
# the difference becomes:
mean(quantile.results.1$out_cf_y[,2] - quantile.results.1$out_nc_y[,1])
# so the proportion of the difference in the median of Y between the two SES groups
# that is due to differences in the Gaussian mediator is:
mean(1 - (quantile.results.1$out_cf_y[,2] - quantile.results.1$out_nc_y[,1]) /
       (quantile.results.1$out_nc_y[,2] - quantile.results.1$out_nc_y[,1]))
# we can also get this number, and the one from the comparison of the other SES group
# with group 1, straight from the object
quantile.results.1$mediation
# and we can get the 1-alpha CI for each:
quantile.results.1$mediation_quantile
# see README.md for a more detailed description of the functions in this package.

[Package cfdecomp version 0.4.0]