mean_diff {quest} | R Documentation |
Mean difference across two independent groups (independent two-samples t-test)
Description
mean_diff tests for mean differences across two independent groups with an independent two-samples t-test. The function also calculates the descriptive statistics for each group and the standardized mean difference (i.e., Cohen's d) based on the pooled standard deviation. mean_diff is simply a wrapper for t.test plus some extra calculations.
Usage
mean_diff(
x,
bin,
lvl = levels(as.factor(bin)),
var.equal = TRUE,
d.ci.type = "unbiased",
ci.level = 0.95,
check = TRUE
)
Arguments
x |
numeric vector. |
bin |
atomic vector (e.g., factor) the same length as |
lvl |
character vector with length 2 specifying the unique values for
the two groups. If |
var.equal |
logical vector of length 1 specifying whether the variances of the groups are assumed to be equal (TRUE) or not (FALSE). If TRUE, a traditional independent two-samples t-test is computed; if FALSE, Welch's t-test is computed. The two tests share the same mean difference estimate but can differ in their standard errors, t-values, degrees of freedom, and p-values. |
d.ci.type |
character vector with length 1 specifying the type of
confidence intervals to compute for the standardized mean difference (i.e.,
Cohen's d). There are currently three options: 1) "unbiased" which
calculates the unbiased standard error of Cohen's d based on formula 25 in
Viechtbauer (2007). A symmetrical confidence interval is then calculated
based on the standard error. 2) "tdist" which calculates the confidence
intervals based on the t-distribution using the function
|
ci.level |
numeric vector of length 1 specifying the confidence level.
|
check |
logical vector of length 1 specifying whether the input
arguments should be checked for errors. For example, if |
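The distinction controlled by var.equal can be seen directly with stats::t.test, which mean_diff wraps. The following is a minimal base-R sketch that does not call mean_diff itself; the object names (g0, g1, student, welch) are illustrative only.

```r
# Split mpg by engine shape (vs: 0 = V-shaped, 1 = straight)
g0 <- mtcars$mpg[mtcars$vs == 0]
g1 <- mtcars$mpg[mtcars$vs == 1]

# Traditional (pooled-variance) t-test: df = n0 + n1 - 2
student <- t.test(g1, g0, var.equal = TRUE)

# Welch's t-test: df from the Welch-Satterthwaite approximation
welch <- t.test(g1, g0, var.equal = FALSE)

# The group means are identical across the two tests; the degrees of
# freedom (and hence the p-values) are not
c(student_df = unname(student$parameter), welch_df = unname(welch$parameter))
```

The Welch degrees of freedom are generally fractional and no larger than the pooled-variance degrees of freedom.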
Details
mean_diff calculates the mean difference as the mean of x[bin == lvl[2] ] minus the mean of x[bin == lvl[1] ] such that it is group 2 - group 1. Group 1 corresponds to the first factor level of bin (after being coerced to a factor). Group 2 corresponds to the second factor level of bin (after being coerced to a factor). This was set up to handle dummy coded treatment variables in a desirable way. For example, if bin is a numeric vector with values 0 and 1, the default factor coercion will have the first factor level be "0" and the second factor level "1". The result will then correspond to 1 - 0. However, if the first factor level of bin is "treatment" and the second factor level is "control", the result will correspond to control - treatment. If the opposite is desired (e.g., treatment - control), this can be reversed within the function by specifying the lvl argument as c("control","treatment"). Note, mean_diff diverges from t.test by calculating the mean difference as group 2 - group 1 (as opposed to the group 1 - group 2 that t.test does). However, group 2 - group 1 is the convention that psych::cohen.d uses as well.
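The direction of the difference can be checked by hand. The following is a minimal base-R sketch of the group 2 - group 1 convention described above, set against the group 1 - group 2 estimate from t.test; it is not the internal code of mean_diff, and the object names (diff_21, diff_12) are illustrative only.

```r
x   <- mtcars$mpg
bin <- mtcars$vs
lvl <- levels(as.factor(bin))  # "0" "1": group 1 = "0", group 2 = "1"

# group 2 - group 1, the direction described above
diff_21 <- mean(x[bin == lvl[2]]) - mean(x[bin == lvl[1]])

# t.test() with a formula reports group 1 - group 2
tt <- t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)
diff_12 <- unname(tt$estimate[1] - tt$estimate[2])

# Same magnitude, opposite sign
c(group2_minus_group1 = diff_21, group1_minus_group2 = diff_12)
```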
mean_diff calculates the pooled standard deviation in a different way than cohen.d. Therefore, the Cohen's d estimates (and confidence intervals if d.ci.type == "tdist") differ from those in cohen.d. mean_diff uses the total degrees of freedom in the denominator while cohen.d uses the total sample size in the denominator, based on the notation in McGrath & Meyer (2006). However, almost every introduction to statistics textbook uses the total degrees of freedom in the denominator and that is what makes more sense to me. See examples.
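The two denominators can be compared side by side. The following is a minimal base-R sketch, assuming the description above maps onto the two usual pooled standard deviation formulas; it reproduces neither package's internal code, and the object names (sd_df, sd_n) are illustrative only.

```r
g1 <- mtcars$mpg[mtcars$vs == 0]  # group 1
g2 <- mtcars$mpg[mtcars$vs == 1]  # group 2
n1 <- length(g1)
n2 <- length(g2)

# Sum of squared deviations within each group
ss <- sum((g1 - mean(g1))^2) + sum((g2 - mean(g2))^2)

sd_df <- sqrt(ss / (n1 + n2 - 2))  # total degrees of freedom in the denominator
sd_n  <- sqrt(ss / (n1 + n2))      # total sample size in the denominator

# Standardized mean difference (group 2 - group 1) under each pooled SD
mean_d <- mean(g2) - mean(g1)
c(d_df = mean_d / sd_df, d_n = mean_d / sd_n)
```

Dividing by the total sample size yields a smaller pooled standard deviation and therefore a larger Cohen's d in absolute value; the gap shrinks as the sample size grows.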
Value
list of numeric vectors containing statistical information about the mean difference, with three elements:
1) nhst = independent two-samples t-test stat info in a numeric vector
- est
mean difference estimate (i.e., group 2 - group 1)
- se
standard error
- t
t-value
- df
degrees of freedom
- p
two-sided p-value
- lwr
lower bound of the confidence interval
- upr
upper bound of the confidence interval
2) desc = descriptive statistics stat info in a numeric vector
- mean_'lvl[2]'
mean of group 2
- mean_'lvl[1]'
mean of group 1
- sd_'lvl[2]'
standard deviation of group 2
- sd_'lvl[1]'
standard deviation of group 1
- n_'lvl[2]'
sample size of group 2
- n_'lvl[1]'
sample size of group 1
3) std = standardized mean difference stat info in a numeric vector
- d_est
Cohen's d estimate
- d_se
Cohen's d standard error
- d_lwr
Cohen's d lower bound of the confidence interval
- d_upr
Cohen's d upper bound of the confidence interval
References
McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological Methods, 11(4), 386-401.
Viechtbauer, W. (2007). Approximate confidence intervals for standardized effect sizes in the two-independent and two-dependent samples design. Journal of Educational and Behavioral Statistics, 32(1), 39-60.
See Also
t.test the workhorse for mean_diff,
means_diff for multiple variables across the same two groups,
cohen.d for another standardized mean difference function,
mean_change for dependent two-sample t-test,
mean_test for one-sample t-test
Examples
# independent two-samples t-test
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs")
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", lvl = c("1","0"))
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", lvl = c(1, 0)) # levels don't have to be character
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", d.ci.type = "classic")
# compare to psych::cohen.d()
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", d.ci.type = "tdist")
tmp_nm <- c("mpg","vs") # because otherwise Roxygen2 gets upset
cohend_obj <- psych::cohen.d(mtcars[tmp_nm], group = "vs")
as.data.frame(cohend_obj[["cohen.d"]]) # different estimate of cohen's d
# of course, this also leads to different confidence interval bounds as well
# same as simple linear regression on the binary predictor when var.equal = TRUE
mean_diff(x = mtcars$"mpg", bin = mtcars$"vs", d.ci.type = "tdist")
lm_obj <- lm(mpg ~ vs, data = mtcars)
coef(summary(lm_obj))
# errors
## Not run:
mean_diff(x = mtcars$"mpg",
bin = attitude$"rating") # `bin` has a different length than `x`
mean_diff(x = mtcars$"mpg",
bin = mtcars$"gear") # `bin` has more than two unique values (other than missing values)
## End(Not run)