means_diff {quest}

R Documentation

Mean differences across two independent groups (independent two-samples t-tests)

Description

means_diff tests for mean differences across two independent groups with independent two-samples t-tests. The function also calculates the descriptive statistics for each group and the standardized mean differences (i.e., Cohen's d) based on the pooled standard deviations. means_diff is simply a wrapper for t.test plus some extra calculations.

Usage

means_diff(
  data,
  vrb.nm,
  bin.nm,
  lvl = levels(as.factor(data[[bin.nm]])),
  var.equal = TRUE,
  d.ci.type = "unbiased",
  ci.level = 0.95,
  check = TRUE
)

Arguments

`data`	data.frame of data.
`vrb.nm`	character vector of colnames specifying the variables in `data` to conduct the independent two-sample t-tests for.
`bin.nm`	character vector of length 1 specifying the binary variable in `data`. It identifies the two groups with two (and only two) unique values (other than missing values).
`lvl`	character vector with length 2 specifying the unique values for the two groups. If `data[[bin.nm]]` is a factor, then `lvl` should be the factor levels rather than the underlying integer codes. This argument allows you to specify the direction of the mean difference. `means_diff` calculates the mean differences as `data[[vrb.nm]][data[[bin.nm]] == lvl[2], ]` - `data[[vrb.nm]][data[[bin.nm]] == lvl[1], ]` such that it is group 2 - group 1. By changing which group is group 1 vs. group 2, the direction of the mean difference can be changed. See details.
`var.equal`	logical vector of length 1 specifying whether the variances of the groups are assumed to be equal (TRUE) or not (FALSE). If TRUE, a traditional independent two-samples t-test is computed; if FALSE, Welch's t-test is computed. The two tests always differ in how the degrees of freedom are computed and, when the group sizes and variances are unequal, also in their standard errors, t-values, and p-values.
`d.ci.type`	character vector with length 1 specifying the type of confidence intervals to compute for the standardized mean difference (i.e., Cohen's d). There are currently three options: 1) "unbiased", which calculates the unbiased standard error of Cohen's d based on formula 25 in Viechtbauer (2007). A symmetrical confidence interval is then calculated based on the standard error. 2) "tdist", which calculates the confidence intervals based on the t-distribution using the function `cohen.d.ci`. 3) "classic", which calculates the confidence interval of Cohen's d based on the confidence interval of the mean difference itself. The lower and upper confidence bounds are divided by the pooled standard deviation. Technically, this confidence interval is biased because it does not take into account the uncertainty of the standard deviations. No standard error is calculated for this option and NA is returned for "d_se" in the return object.
`ci.level`	numeric vector of length 1 specifying the confidence level. `ci.level` must be between 0 and 1.
`check`	logical vector of length 1 specifying whether the input arguments should be checked for errors. For example, the checks catch cases where `data[[bin.nm]]` has more than 2 unique values (other than missing values) or where `bin.nm` is not a colname in `data`. This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
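To make the effect of `var.equal` concrete, the following is a minimal base R sketch that calls t.test directly rather than means_diff. It assumes the built-in mtcars data with vs as the binary variable; the helper `summarize_ttest` is a hypothetical name introduced only for this illustration.

```r
# Base R sketch (not the quest source): Student's vs. Welch's t-test
# on mtcars$mpg split by the binary variable vs.
student <- t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)
welch   <- t.test(mpg ~ vs, data = mtcars, var.equal = FALSE)

# Hypothetical helper: collect the pieces of an htest object in one row.
summarize_ttest <- function(tt, label) {
  est <- unname(tt$estimate[1] - tt$estimate[2])  # t.test: group 1 - group 2
  data.frame(
    test = label,
    est  = est,
    se   = est / unname(tt$statistic),            # recover SE from est and t
    t    = unname(tt$statistic),
    df   = unname(tt$parameter),
    p    = tt$p.value
  )
}

comparison <- rbind(
  summarize_ttest(student, "student"),
  summarize_ttest(welch, "welch")
)
print(comparison)
```

The mean difference estimate is the same in both rows; only the quantities that depend on the variance assumption change.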

Details

means_diff calculates the mean differences as data[[vrb.nm]][data[[bin.nm]] == lvl[2], ] - data[[vrb.nm]][data[[bin.nm]] == lvl[1], ] such that it is group 2 - group 1. By default, group 1 corresponds to the first factor level of data[[bin.nm]] (after being coerced to a factor) and group 2 corresponds to the second factor level. This was set up to handle dummy coded treatment variables in a desirable way. For example, if data[[bin.nm]] is a numeric vector with values 0 and 1, the default factor coercion will have the first factor level be "0" and the second factor level be "1". The result will then correspond to 1 - 0. However, if the first factor level of data[[bin.nm]] is "treatment" and the second factor level is "control", the result will correspond to control - treatment. If the opposite is desired (e.g., treatment - control), this can be reversed within the function by specifying the lvl argument as c("control","treatment"). Note, means_diff diverges from t.test by calculating the mean difference as group 2 - group 1 (as opposed to the group 1 - group 2 that t.test does). However, group 2 - group 1 is the convention that psych::cohen.d uses as well.
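The sign convention can be checked by hand. The sketch below uses only base R and the built-in mtcars data (it does not call means_diff); the variable names are illustrative.

```r
# Base R sketch of the sign convention (not the quest source).
lvl <- c("0", "1")                  # lvl[1] = group 1, lvl[2] = group 2
grp <- as.character(mtcars$vs)

mean_g1 <- mean(mtcars$mpg[grp == lvl[1]])
mean_g2 <- mean(mtcars$mpg[grp == lvl[2]])

# Convention described above: group 2 - group 1
diff_g2_minus_g1 <- mean_g2 - mean_g1

# t.test() reports its estimates so that the difference is group 1 - group 2
tt <- t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)
diff_ttest <- unname(tt$estimate[1] - tt$estimate[2])

# Reversing lvl swaps which group is subtracted, so the sign flips
lvl_rev <- rev(lvl)
diff_rev <- mean(mtcars$mpg[grp == lvl_rev[2]]) -
  mean(mtcars$mpg[grp == lvl_rev[1]])

print(c(g2_minus_g1 = diff_g2_minus_g1,
        t.test      = diff_ttest,
        reversed    = diff_rev))
```

The first value is the negative of the other two, which is the same reversal that the lvl argument produces.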

means_diff calculates the pooled standard deviation in a different way than cohen.d. Therefore, the Cohen's d estimates (and confidence intervals if d.ci.type == "tdist") differ from those in cohen.d. means_diff uses the total degrees of freedom in the denominator while cohen.d uses the total sample size in the denominator, based on the notation in McGrath & Meyer (2006). However, almost every introduction to statistics textbook uses the total degrees of freedom in the denominator and that is what makes more sense to me. See examples.
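The two denominators can be compared directly. The following base R sketch assumes that both approaches share the same pooled sum of squares in the numerator and differ only in the denominator, as the paragraph above describes; it does not reproduce the internals of either means_diff or psych::cohen.d.

```r
# Base R sketch: pooled SD with df vs. N in the denominator.
x1 <- mtcars$mpg[mtcars$vs == 0]    # group 1
x2 <- mtcars$mpg[mtcars$vs == 1]    # group 2
n1 <- length(x1)
n2 <- length(x2)

# Pooled sum of squares (assumed to be common to both versions)
ss_pooled <- (n1 - 1) * var(x1) + (n2 - 1) * var(x2)

sd_pooled_df <- sqrt(ss_pooled / (n1 + n2 - 2))  # total degrees of freedom
sd_pooled_n  <- sqrt(ss_pooled / (n1 + n2))      # total sample size

mean_diff <- mean(x2) - mean(x1)                 # group 2 - group 1
d_df <- mean_diff / sd_pooled_df
d_n  <- mean_diff / sd_pooled_n

print(c(d_df = d_df, d_n = d_n))
```

Under this assumption the two estimates differ by the constant factor sqrt((n1 + n2) / (n1 + n2 - 2)), so the gap shrinks as the total sample size grows.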

Value

list of data.frames containing statistical information about the mean differences (the rownames of each data.frame are vrb.nm): 1) nhst = independent two-samples t-test stat info in a data.frame, 2) desc = descriptive statistics stat info in a data.frame, 3) std = standardized mean difference stat info in a data.frame

1) nhst = independent two-samples t-test stat info in a data.frame

est: mean difference estimate (i.e., group 2 - group 1)
se: standard error
t: t-value
df: degrees of freedom
p: two-sided p-value
lwr: lower bound of the confidence interval
upr: upper bound of the confidence interval

2) desc = descriptive statistics stat info in a data.frame

mean_'lvl[2]': mean of group 2
mean_'lvl[1]': mean of group 1
sd_'lvl[2]': standard deviation of group 2
sd_'lvl[1]': standard deviation of group 1
n_'lvl[2]': sample size of group 2
n_'lvl[1]': sample size of group 1

3) std = standardized mean difference stat info in a data.frame

d_est: Cohen's d estimate
d_se: Cohen's d standard error
d_lwr: Cohen's d lower bound of the confidence interval
d_upr: Cohen's d upper bound of the confidence interval
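As an illustration of how d_est, d_lwr, and d_upr relate under d.ci.type = "classic", here is a base R sketch that follows the argument description: the confidence bounds of the mean difference are divided by the pooled standard deviation. Whether means_diff performs exactly these steps internally is an assumption; the sketch calls only t.test.

```r
# Base R sketch of the "classic" interval (not the quest source).
x1 <- mtcars$mpg[mtcars$vs == 0]    # group 1
x2 <- mtcars$mpg[mtcars$vs == 1]    # group 2
n1 <- length(x1)
n2 <- length(x2)

# Passing x2 first makes t.test() report group 2 - group 1
tt <- t.test(x2, x1, var.equal = TRUE, conf.level = 0.95)

# Pooled SD with the total degrees of freedom in the denominator
sd_pooled <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))

d_est <- (mean(x2) - mean(x1)) / sd_pooled
d_lwr <- tt$conf.int[1] / sd_pooled  # lower bound of mean difference, rescaled
d_upr <- tt$conf.int[2] / sd_pooled  # upper bound of mean difference, rescaled

print(c(d_est = d_est, d_lwr = d_lwr, d_upr = d_upr))
```

Because the pooled standard deviation is treated as a fixed constant here, the interval is symmetric around d_est and ignores the sampling uncertainty of the standard deviations.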

References

McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological Methods, 11(4), 386-401.

Viechtbauer, W. (2007). Approximate confidence intervals for standardized effect sizes in the two-independent and two-dependent samples design. Journal of Educational and Behavioral Statistics, 32(1), 39-60.

Examples


# independent two-samples t-tests
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs")
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
   d.ci.type = "classic")
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
   lvl = c("1","0")) # signs are reversed
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
   lvl = c(1,0)) # can provide numeric levels for dummy variables

# compare to psych::cohen.d()
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
   d.ci.type = "tdist")
tmp_nm <- c("mpg","cyl","disp","vs") # so that Roxygen2 doesn't freak out
cohend_obj <- psych::cohen.d(mtcars[tmp_nm], group = "vs")
as.data.frame(cohend_obj[["cohen.d"]]) # different estimate of cohen's d
   # of course, this also leads to different confidence interval bounds as well

# same as simple linear regression on the binary variable when var.equal = TRUE
means_diff(data = mtcars, vrb.nm = "mpg", bin.nm = "vs")
lm_obj <- lm(mpg ~ vs, data = mtcars)
coef(summary(lm_obj))

# if levels are not unique values in data[[bin.nm]]
## Not run: 
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
   lvl = c("zero", "1")) # an error message is returned
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
   lvl = c("0", "one")) # an error message is returned

## End(Not run)

[Package quest version 0.2.0 Index]