mean_diff {quest} R Documentation

Mean difference across two independent groups (independent two-samples t-test)

Description

mean_diff tests for mean differences across two independent groups with an independent two-samples t-test. The function also calculates the descriptive statistics for each group and the standardized mean difference (i.e., Cohen's d) based on the pooled standard deviation. mean_diff is simply a wrapper for t.test plus some extra calculations.

Usage

mean_diff(
  x,
  bin,
  lvl = levels(as.factor(bin)),
  var.equal = TRUE,
  d.ci.type = "unbiased",
  ci.level = 0.95,
  check = TRUE
)

Arguments

x

numeric vector.

bin

atomic vector (e.g., factor) the same length as x that is a binary variable. It identifies the two groups with two (and only two) unique values (other than missing values).

lvl

character vector with length 2 specifying the unique values for the two groups. If bin is a factor, then lvl should be the factor levels rather than the underlying integer codes. This argument allows you to specify the direction of the mean difference. mean_diff calculates the mean difference as mean(x[bin == lvl[2]]) - mean(x[bin == lvl[1]]) such that it is group 2 - group 1. By changing which group is group 1 vs. group 2, the direction of the mean difference can be changed. See Details.

var.equal

logical vector of length 1 specifying whether the variances of the groups are assumed to be equal (TRUE) or not (FALSE). If TRUE, a traditional independent two-samples t-test is computed; if FALSE, Welch's t-test is computed. The two tests can differ in their standard errors and degrees of freedom, and therefore in their p-values.

d.ci.type

character vector with length 1 specifying the type of confidence interval to compute for the standardized mean difference (i.e., Cohen's d). There are currently three options: 1) "unbiased", which calculates the unbiased standard error of Cohen's d based on formula 25 in Viechtbauer (2007). A symmetrical confidence interval is then calculated based on the standard error. 2) "tdist", which calculates the confidence interval based on the t-distribution using the function psych::cohen.d.ci. 3) "classic", which calculates the confidence interval of Cohen's d based on the confidence interval of the mean difference itself. The lower and upper confidence bounds of the mean difference are divided by the pooled standard deviation. Technically, this confidence interval is biased because it does not take into account the uncertainty of the standard deviations. No standard error is calculated for this option and NA is returned for "d_se" in the return object.

ci.level

numeric vector of length 1 specifying the confidence level. ci.level must be between 0 and 1.

check

logical vector of length 1 specifying whether the input arguments should be checked for errors. For example, the checks test whether bin has more than 2 unique values (other than missing values) or whether bin has a length different from the length of x. This is a tradeoff between computational efficiency (FALSE) and more useful error messages (TRUE).
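
The "classic" option of d.ci.type described above can be sketched in base R. This is only an illustration of dividing the confidence bounds of the mean difference by the pooled standard deviation; it does not call mean_diff, and the object names (grp, sd_pooled, d_ci) are made up for the example.

```r
# Sketch of the "classic" confidence interval for Cohen's d (base R only).
x   <- mtcars$mpg
# List "1" first so that t.test() reports mean("1") - mean("0"),
# i.e., group 2 - group 1 under the default lvl of c("0", "1").
grp <- factor(mtcars$vs, levels = c("1", "0"))

tt <- t.test(x ~ grp, var.equal = TRUE, conf.level = 0.95)

# Pooled standard deviation with the total degrees of freedom in the denominator
n <- tapply(x, grp, length)
v <- tapply(x, grp, var)
sd_pooled <- sqrt(sum((n - 1) * v) / (sum(n) - 2))

# Cohen's d and its "classic" interval: mean-difference bounds / pooled SD
d_est <- unname(tt$estimate[1] - tt$estimate[2]) / sd_pooled
d_ci  <- as.numeric(tt$conf.int) / sd_pooled
```

Because the bounds are rescaled by a fixed number, the interval ignores the sampling variability of sd_pooled, which is the bias noted above.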

Details

mean_diff calculates the mean difference as mean(x[bin == lvl[2]]) - mean(x[bin == lvl[1]]) such that it is group 2 - group 1. By default, group 1 corresponds to the first factor level of bin (after being coerced to a factor) and group 2 corresponds to the second factor level of bin. This was set up to handle dummy-coded treatment variables in a desirable way. For example, if bin is a numeric vector with values 0 and 1, the default factor coercion will make the first factor level "0" and the second factor level "1", so the result will correspond to 1 - 0. However, if the first factor level of bin is "treatment" and the second factor level is "control", the result will correspond to control - treatment. If the opposite is desired (i.e., treatment - control), this can be reversed within the function by specifying the lvl argument as c("control","treatment"). Note that mean_diff departs from t.test by calculating the mean difference as group 2 - group 1 (as opposed to the group 1 - group 2 that t.test calculates). However, group 2 - group 1 is also the convention that psych::cohen.d uses.
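
The direction convention can be checked by hand with base R. The following is a minimal sketch that does not call mean_diff; the names diff_21, diff_12, and diff_rev are made up for the illustration.

```r
# Sketch of the group 2 - group 1 convention (base R only).
x   <- mtcars$mpg
bin <- mtcars$vs
lvl <- levels(as.factor(bin))   # "0" "1": group 1 is "0", group 2 is "1"

# group 2 - group 1, the direction described above
diff_21 <- mean(x[bin == lvl[2]]) - mean(x[bin == lvl[1]])

# t.test() orders the groups by factor level; subtracting its two
# estimates in that order gives group 1 - group 2
tt <- t.test(x ~ as.factor(bin), var.equal = TRUE)
diff_12 <- unname(tt$estimate[1] - tt$estimate[2])

# Reversing lvl swaps which group is subtracted and flips the sign
lvl_rev  <- rev(lvl)
diff_rev <- mean(x[bin == lvl_rev[2]]) - mean(x[bin == lvl_rev[1]])
```

Here diff_21 and diff_12 have the same magnitude and opposite signs, and diff_rev equals diff_12.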

mean_diff calculates the pooled standard deviation in a different way than cohen.d. Therefore, the Cohen's d estimates (and confidence intervals if d.ci.type == "tdist") differ from those in cohen.d. mean_diff uses the total degrees of freedom in the denominator, while cohen.d uses the total sample size in the denominator (based on the notation in McGrath & Meyer, 2006). However, almost every introductory statistics textbook uses the total degrees of freedom in the denominator, and that is what makes more sense to me. See Examples.
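
The two denominators can be compared directly in base R. This sketch only reproduces the arithmetic described above; it calls neither mean_diff nor psych::cohen.d, and the object names are made up for the illustration.

```r
# Sketch of the two pooled standard deviation definitions (base R only).
x  <- mtcars$mpg
g1 <- x[mtcars$vs == 0]   # group 1
g2 <- x[mtcars$vs == 1]   # group 2
n1 <- length(g1)
n2 <- length(g2)

# Pooled sum of squared deviations around each group's own mean
ss <- (n1 - 1) * var(g1) + (n2 - 1) * var(g2)

sd_df <- sqrt(ss / (n1 + n2 - 2))   # total degrees of freedom in the denominator
sd_n  <- sqrt(ss / (n1 + n2))       # total sample size in the denominator

# Cohen's d under each definition (group 2 - group 1)
d_df <- (mean(g2) - mean(g1)) / sd_df
d_n  <- (mean(g2) - mean(g1)) / sd_n
```

Dividing by the total sample size gives a smaller pooled standard deviation, so d_n is larger in magnitude than d_df by a factor of sqrt((n1 + n2) / (n1 + n2 - 2)). The gap shrinks as the sample size grows.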

Value

list of three numeric vectors (nhst, desc, and std) containing statistical information about the mean difference:

1) nhst = independent two-samples t-test stat info in a numeric vector

est

mean difference estimate (i.e., group 2 - group 1)

se

standard error

t

t-value

df

degrees of freedom

p

two-sided p-value

lwr

lower bound of the confidence interval

upr

upper bound of the confidence interval

2) desc = descriptive statistics stat info in a numeric vector

mean_'lvl[2]'

mean of group 2

mean_'lvl[1]'

mean of group 1

sd_'lvl[2]'

standard deviation of group 2

sd_'lvl[1]'

standard deviation of group 1

n_'lvl[2]'

sample size of group 2

n_'lvl[1]'

sample size of group 1

3) std = standardized mean difference stat info in a numeric vector

d_est

Cohen's d estimate

d_se

Cohen's d standard error

d_lwr

Cohen's d lower bound of the confidence interval

d_upr

Cohen's d upper bound of the confidence interval

References

McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological Methods, 11(4), 386-401.

Viechtbauer, W. (2007). Approximate confidence intervals for standardized effect sizes in the two-independent and two-dependent samples design. Journal of Educational and Behavioral Statistics, 32(1), 39-60.

See Also

t.test, the workhorse for mean_diff; means_diff for multiple variables across the same two groups; cohen.d for another standardized mean difference function; mean_change for the dependent two-samples t-test; mean_test for the one-sample t-test.

Examples


# independent two-samples t-test
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs")
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", lvl = c("1","0"))
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", lvl = c(1, 0)) # levels don't have to be character
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", d.ci.type = "classic")

# compare to psych::cohen.d()
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", d.ci.type = "tdist")
tmp_nm <- c("mpg","vs") # because otherwise Roxygen2 gets upset
cohend_obj <- psych::cohen.d(mtcars[tmp_nm], group = "vs")
as.data.frame(cohend_obj[["cohen.d"]]) # different estimate of cohen's d
   # this also leads to different confidence interval bounds

# same as the slope test in a simple linear regression on the binary predictor when var.equal = TRUE
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", d.ci.type = "tdist")
lm_obj <- lm(mpg ~ vs, data = mtcars)
coef(summary(lm_obj))

# errors
## Not run: 
mean_diff(x = mtcars$"mpg",
   bin = attitude$"rating") # `bin` has length different than `x`
mean_diff(x = mtcars$"mpg",
   bin = mtcars$"gear") # `bin` has more than two unique values (other than missing values)

## End(Not run)


[Package quest version 0.2.0 Index]