means_diff {quest}    R Documentation
Mean differences across two independent groups (independent two-samples t-tests)
Description
means_diff tests for mean differences across two independent groups with independent two-samples t-tests. The function also calculates the descriptive statistics for each group and the standardized mean differences (i.e., Cohen's d) based on the pooled standard deviations. means_diff is essentially a wrapper for t.test plus some extra calculations.
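The wrapper idea can be sketched in base R. This is a minimal sketch under stated assumptions, not the quest implementation: the helper means_diff_sketch is hypothetical, it only reproduces the nhst component, and it relies on t.test(x, y) estimating mean(x) - mean(y), so passing group 2 first gives group 2 - group 1. It also uses the stderr component that recent versions of R include in t.test results.

```r
# means_diff_sketch() is a HYPOTHETICAL helper, not part of quest. It loops
# stats::t.test() over vrb.nm and returns one row per variable, oriented as
# group 2 - group 1 (i.e., lvl[2] minus lvl[1]).
means_diff_sketch <- function(data, vrb.nm, bin.nm,
                              lvl = levels(as.factor(data[[bin.nm]])),
                              var.equal = TRUE, ci.level = 0.95) {
  grp <- as.character(data[[bin.nm]])
  rows <- lapply(vrb.nm, function(v) {
    x2 <- data[[v]][which(grp == lvl[2])]  # group 2
    x1 <- data[[v]][which(grp == lvl[1])]  # group 1
    # t.test(x, y) estimates mean(x) - mean(y), so group 2 goes first
    res <- t.test(x2, x1, var.equal = var.equal, conf.level = ci.level)
    data.frame(est = unname(res$estimate[1] - res$estimate[2]),
               se  = unname(res$stderr),
               t   = unname(res$statistic),
               df  = unname(res$parameter),
               p   = res$p.value,
               lwr = res$conf.int[1],
               upr = res$conf.int[2])
  })
  out <- do.call(rbind, rows)
  rownames(out) <- vrb.nm  # one row per variable, as in the nhst component
  out
}

means_diff_sketch(data = mtcars, vrb.nm = c("mpg", "cyl", "disp"), bin.nm = "vs")
```

Passing lvl = c("1", "0") to the sketch flips every sign, mirroring the lvl example in the Examples section.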
Usage
means_diff(
data,
vrb.nm,
bin.nm,
lvl = levels(as.factor(data[[bin.nm]])),
var.equal = TRUE,
d.ci.type = "unbiased",
ci.level = 0.95,
check = TRUE
)
Arguments
data
data.frame of data.
vrb.nm
character vector of colnames specifying the variables in data.
bin.nm
character vector of length 1 specifying the binary variable in data.
lvl
character vector with length 2 specifying the unique values for the two groups. If
var.equal
logical vector of length 1 specifying whether the variances of the groups are assumed to be equal (TRUE) or not (FALSE). If TRUE, a traditional independent two-samples t-test is computed; if FALSE, Welch's t-test is computed. These two tests differ in their degrees of freedom and p-values and, when the group sizes are unequal, generally also in their standard errors and t-values; the mean difference estimate is the same.
d.ci.type
character vector with length 1 specifying the type of confidence intervals to compute for the standardized mean difference (i.e., Cohen's d). There are currently three options: 1) "unbiased" which calculates the unbiased standard error of Cohen's d based on formula 25 in Viechtbauer (2007). A symmetrical confidence interval is then calculated based on the standard error. 2) "tdist" which calculates the confidence intervals based on the t-distribution using the function
ci.level
numeric vector of length 1 specifying the confidence level.
check
logical vector of length 1 specifying whether the input arguments should be checked for errors. For example, if
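To see what var.equal changes, the sketch below runs both versions of stats::t.test on mtcars (mpg by vs) and recomputes the Welch-Satterthwaite degrees of freedom by hand. It illustrates base R behavior only and does not call means_diff; the variable names are illustrative.

```r
x1 <- mtcars$mpg[mtcars$vs == 0]  # group 1 (first level, "0")
x2 <- mtcars$mpg[mtcars$vs == 1]  # group 2 (second level, "1")
n1 <- length(x1); n2 <- length(x2)

pooled <- t.test(x2, x1, var.equal = TRUE)   # traditional t-test
welch  <- t.test(x2, x1, var.equal = FALSE)  # Welch's t-test

# Welch-Satterthwaite degrees of freedom computed by hand
v1 <- var(x1) / n1
v2 <- var(x2) / n2
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

c(df_pooled = unname(pooled$parameter),  # n1 + n2 - 2
  df_welch  = unname(welch$parameter),
  df_hand   = df_welch)
```

Both calls report the same group means; only the standard error, t-value, degrees of freedom, and p-value depend on var.equal.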
Details
means_diff calculates each mean difference as the mean of the variable among rows where data[[bin.nm]] == lvl[2] minus its mean among rows where data[[bin.nm]] == lvl[1], such that it is group 2 - group 1. Group 1 corresponds to the first factor level of data[[bin.nm]] (after being coerced to a factor). Group 2 corresponds to the second factor level of data[[bin.nm]] (after being coerced to a factor). This was set up to handle dummy-coded treatment variables in a desirable way. For example, if data[[bin.nm]] is a numeric vector with values 0 and 1, the default factor coercion makes "0" the first factor level and "1" the second factor level, so the result corresponds to 1 - 0. However, if the first factor level of data[[bin.nm]] is "treatment" and the second factor level is "control", the result will correspond to control - treatment. (Because coercion sorts the unique values of a character vector alphabetically, this ordering only arises when data[[bin.nm]] is already a factor with its levels in that order.) If the opposite is desired (e.g., treatment - control), this can be reversed within the function by specifying the lvl argument as c("control","treatment").

Note that means_diff deviates from t.test by calculating the mean difference as group 2 - group 1 (as opposed to the group 1 - group 2 that t.test does). However, group 2 - group 1 is also the convention that psych::cohen.d uses.
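The level ordering and sign conventions described above can be checked in base R. This sketch relies only on as.factor, factor, and stats::t.test, and reproduces the group 2 - group 1 orientation by hand rather than calling means_diff.

```r
# Default used for lvl: levels(as.factor(data[[bin.nm]]))
levels(as.factor(c(1, 0, 1, 0)))              # "0" "1": numeric codes sort numerically
levels(as.factor(c("treatment", "control")))  # "control" "treatment": alphabetical
trt <- factor(c("treatment", "control"), levels = c("treatment", "control"))
levels(as.factor(trt))                        # "treatment" "control": existing order kept

# t.test() reports group 1 - group 2; means_diff reports group 2 - group 1
tt <- t.test(mpg ~ vs, data = mtcars, var.equal = TRUE)
diff_ttest <- unname(tt$estimate[1] - tt$estimate[2])  # mean(vs == 0) - mean(vs == 1)
diff_g2_g1 <- mean(mtcars$mpg[mtcars$vs == 1]) - mean(mtcars$mpg[mtcars$vs == 0])
c(t.test = diff_ttest, group2_minus_group1 = diff_g2_g1)
```

The two differences have the same magnitude and opposite signs.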
means_diff calculates the pooled standard deviation in a different way than cohen.d. Therefore, the Cohen's d estimates (and confidence intervals if d.ci.type == "tdist") differ from those in cohen.d. means_diff uses the total degrees of freedom (n1 + n2 - 2) in the denominator, while cohen.d uses the total sample size (n1 + n2) in the denominator, based on the notation in McGrath & Meyer (2006). However, almost every introductory statistics textbook uses the total degrees of freedom in the denominator, and that is what makes more sense to me. See examples.
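The two denominators described above can be sketched by hand for mpg by vs. Assumption: "total degrees of freedom" and "total sample size" are read as the divisor of the pooled within-group sum of squares (n1 + n2 - 2 versus n1 + n2). This mirrors the description in this section and is not taken from the source code of either package.

```r
x1 <- mtcars$mpg[mtcars$vs == 0]  # group 1
x2 <- mtcars$mpg[mtcars$vs == 1]  # group 2
n1 <- length(x1); n2 <- length(x2)

# pooled within-group sum of squares
ss_within <- (n1 - 1) * var(x1) + (n2 - 1) * var(x2)

sd_df <- sqrt(ss_within / (n1 + n2 - 2))  # total degrees of freedom in the denominator
sd_n  <- sqrt(ss_within / (n1 + n2))      # total sample size in the denominator

diff_21 <- mean(x2) - mean(x1)            # group 2 - group 1
d_df <- diff_21 / sd_df
d_n  <- diff_21 / sd_n
c(d_df = d_df, d_n = d_n)
```

Because the sample-size version divides by a larger number, its pooled standard deviation is smaller and its d is larger in absolute value, by a factor of sqrt((n1 + n2) / (n1 + n2 - 2)). The degrees-of-freedom version also satisfies d = t * sqrt(1/n1 + 1/n2), where t is the equal-variance t-value for group 2 - group 1.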
Value
list of data.frames containing statistical information about the mean differences (the rownames of each data.frame are vrb.nm):
1) nhst = independent two-samples t-test stat info in a data.frame,
2) desc = descriptive statistics stat info in a data.frame,
3) std = standardized mean difference stat info in a data.frame
1) nhst = independent two-samples t-test stat info in a data.frame
- est
mean difference estimate (i.e., group 2 - group 1)
- se
standard error
- t
t-value
- df
degrees of freedom
- p
two-sided p-value
- lwr
lower bound of the confidence interval
- upr
upper bound of the confidence interval
2) desc = descriptive statistics stat info in a data.frame
- mean_'lvl[2]'
mean of group 2
- mean_'lvl[1]'
mean of group 1
- sd_'lvl[2]'
standard deviation of group 2
- sd_'lvl[1]'
standard deviation of group 1
- n_'lvl[2]'
sample size of group 2
- n_'lvl[1]'
sample size of group 1
3) std = standardized mean difference stat info in a data.frame
- d_est
Cohen's d estimate
- d_se
Cohen's d standard error
- d_lwr
Cohen's d lower bound of the confidence interval
- d_upr
Cohen's d upper bound of the confidence interval
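As an illustration of the desc layout, the sketch below builds the analogous one-row data.frame for mpg by hand. Assumption: the placeholders 'lvl[2]' and 'lvl[1]' in the column names are replaced by the level values themselves (e.g., mean_1 and mean_0 when bin.nm = "vs"); check the names of an actual means_diff result before relying on this.

```r
lvl <- levels(as.factor(mtcars$vs))  # "0" "1"
grp <- as.character(mtcars$vs)
x2 <- mtcars$mpg[grp == lvl[2]]      # group 2
x1 <- mtcars$mpg[grp == lvl[1]]      # group 1

desc <- data.frame(mean(x2), mean(x1), sd(x2), sd(x1), length(x2), length(x1))
# ASSUMED naming: statistic prefix followed by the group's level value
names(desc) <- paste0(rep(c("mean_", "sd_", "n_"), each = 2), lvl[c(2, 1)])
rownames(desc) <- "mpg"
desc
```

The column order follows the list above: group 2 before group 1 for each statistic.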
References
McGrath, R. E., & Meyer, G. J. (2006). When effect sizes disagree: the case of r and d. Psychological Methods, 11(4), 386-401.
Viechtbauer, W. (2007). Approximate confidence intervals for standardized effect sizes in the two-independent and two-dependent samples design. Journal of Educational and Behavioral Statistics, 32(1), 39-60.
See Also
mean_diff
for independent two-sample t-test of a single variable,
t.test
the workhorse for mean_diff,
cohen.d
for another standardized mean difference function,
means_change
for dependent two-sample t-tests,
means_test
for one-sample t-tests.
Examples
# independent two-samples t-tests
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs")
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
d.ci.type = "classic")
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
lvl = c("1","0")) # signs are reversed
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
lvl = c(1,0)) # can provide numeric levels for dummy variables
# compare to psych::cohen.d()
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
d.ci.type = "tdist")
tmp_nm <- c("mpg","cyl","disp","vs") # so that Roxygen2 doesn't freak out
cohend_obj <- psych::cohen.d(mtcars[tmp_nm], group = "vs")
as.data.frame(cohend_obj[["cohen.d"]]) # different estimate of Cohen's d
# of course, this also leads to different confidence interval bounds
# same as a regression with the dummy-coded group variable as the only predictor when var.equal = TRUE
means_diff(data = mtcars, vrb.nm = "mpg", bin.nm = "vs")
lm_obj <- lm(mpg ~ vs, data = mtcars)
coef(summary(lm_obj))
# if lvl contains values that are not found in data[[bin.nm]]
## Not run:
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
lvl = c("zero", "1")) # an error message is returned
means_diff(data = mtcars, vrb.nm = c("mpg","cyl","disp"), bin.nm = "vs",
lvl = c("0", "one")) # an error message is returned
## End(Not run)