DurgaDiff {Durga}    R Documentation

Estimate group mean differences

Description

Estimates differences between groups in preparation for plotting by DurgaPlot.

Usage

DurgaDiff(x, ...)

## Default S3 method:
DurgaDiff(
  x,
  data.col,
  group.col,
  id.col,
  groups,
  contrasts = "*",
  effect.type = "mean",
  R = 1000,
  boot.params = list(),
  ci.conf = 0.95,
  boot.ci.params = list(),
  na.rm = FALSE,
  ...
)

Arguments

x

A data frame (or similar) containing values to be compared, or a formula (see DurgaDiff.formula).

...

Ignored

data.col

Name (character) or index (numeric) of the column within x containing the measurement data.

group.col

Name or index of the column within x containing the values to group by. May be a vector of column names/indices, in which case values from each column are concatenated to define groups.

id.col

Specify for paired data/repeated measures/within-subject comparisons only. Name or index of the ID column. Observations for the same individual must have the same ID. For non-paired data, do not specify id.col (or use id.col = NA).

groups

Vector of group names. Defaults to all groups in x in natural order. If groups is a named vector, the names are used as group labels for plotting or printing. If data.col and group.col are not specified, x is assumed to be in wide format, and groups must be a list of column names identifying the group/treatment data (see example).

contrasts

Specify the pairs of groups to be compared. By default, all pairwise differences are generated. May be a single string, a vector of strings, or a matrix. Specify NULL to avoid calculating any contrasts. See Details for more information.

effect.type

Type of group difference to be estimated. Values cannot be abbreviated. See Details for further information.

R

The number of bootstrap replicates. R should be larger than your sample size, so the default value of 1000 may need to be increased for large sample sizes. If R <= nrow(x), an error such as "Error in bca.ci... estimated adjustment 'a' is NA" will be thrown. Additionally, warnings such as "In norm.inter(t, adj.alpha) : extreme order statistics used as endpoints" may be avoided by increasing R. Specify R = NA if you do not wish to calculate any CIs, either for group means or for effect sizes. This may be useful if Durga is only being used for plotting large data sets.

boot.params

Optional list of additional named parameters to pass to the boot function.

ci.conf

Numeric confidence level of the required confidence interval, e.g. ci.conf = 0.95 specifies that 95% confidence intervals be calculated. Applies to both CI of effect sizes and CI of group means.

boot.ci.params

Optional list of additional named parameters to pass to the boot.ci function.

na.rm

A logical value (TRUE or FALSE) indicating whether NA values should be stripped before the computation proceeds. If TRUE for "paired" data (i.e. id.col is specified), all rows (observations) for IDs with missing data are stripped.

Details

Data format

x may be a formula; see DurgaDiff.formula.

If x is a data.frame (or similar) it may be in either long or wide format. In long format, one column (data.col) contains the measurement or value to be compared, and another column (group.col) contains the group identity. Repeated measures/paired data/within-subject comparisons in long format require a subject identity column (id.col).

Wide format contains different measurements in different columns of the same row, and is well-suited for repeated measures/paired/within-subject comparison data. To pass wide format data, do not specify the arguments data.col or group.col. Instead, you must explicitly specify the groups to be compared in the groups argument. Each group must be the name of a column in x. For paired data, you may specify id.col, although it is not required, as wide format data is assumed to be paired. The id.col can be a column that already exists and uniquely identifies each specimen, or it can be the name of a column to be created, in which case the specimen ID will be a generated integer sequence. Unpaired data can be in wide format, but it is necessary to inform Durga by passing id.col = NULL. Wide format data will be internally converted to long format, then processing continues as for long format input.
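
As an illustration of the two layouts, the sketch below builds a small wide-format data frame and reshapes it to long format with base R. The data values and the column names (id, before, after, group, value) are invented for this example, and the reshaping shown is not necessarily how Durga performs its internal conversion.

```r
# Hypothetical wide-format data: one row per subject, one column per group
wide <- data.frame(id     = 1:4,
                   before = c(5.1, 6.0, 5.5, 6.3),
                   after  = c(4.8, 5.2, 5.6, 5.9))

# Equivalent long format: one row per observation
group_cols <- c("before", "after")
long <- data.frame(
  id    = rep(wide$id, times = length(group_cols)),
  group = rep(group_cols, each = nrow(wide)),
  value = unlist(wide[group_cols], use.names = FALSE)
)
print(long)
```

Following the argument descriptions above, data like these would be passed as DurgaDiff(wide, groups = c("before", "after")) in wide format, or as DurgaDiff(long, "value", "group", "id") in long format.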

Contrasts

The pairs of groups to be compared are defined by the parameter contrasts. An asterisk ("*", the default) creates contrasts for all possible pairs of groups. A single string has a format such as "group1 - group2, group3 - group4". A single string such as ". - control" compares all groups against the "control" group, i.e. the "." expands to all groups except the named group. A vector of strings looks like c("group1 - group2", "group3 - group4"). If a matrix is specified, it must have a column for each contrast, with the first group in row 1 and the second in row 2.
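
The accepted forms can be compared side by side. The following base R sketch writes the same two contrasts as a single string, a vector of strings and a matrix; the group names (control, low, high) are hypothetical. The final line only illustrates the matrix layout for every pair of groups and makes no claim about the direction Durga chooses for its default "*" contrasts.

```r
groups <- c("control", "low", "high")   # hypothetical group names

# Single string: comma-separated pairs
contrasts_string <- "low - control, high - control"

# Vector of strings: one pair per element
contrasts_vector <- c("low - control", "high - control")

# Splitting the single string on commas yields the vector form
split_string <- trimws(strsplit(contrasts_string, ",")[[1]])

# Matrix: one column per contrast, first group in row 1, second in row 2
contrasts_matrix <- matrix(c("low",  "control",
                             "high", "control"), nrow = 2)

# Every pair of groups, arranged in the same two-row layout
all_pairs <- combn(groups, 2)
print(contrasts_matrix)
print(all_pairs)
```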

Effect types

The effect.type parameter determines the effect size measure to be calculated. Our terminology generally follows Lakens (2013), with d meaning a biased estimate and g meaning a bias-corrected estimate. Some writers reverse this usage or use alternative terminology. Cumming (2012) recommends always using a bias-corrected estimate (although bias correction is unnecessary for large sample sizes). Durga applies Hedges' exact method for bias correction.
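
To give a feel for the size of the bias correction, the sketch below evaluates the usual gamma-function form of Hedges' exact correction factor alongside a commonly used approximation. The function names are hypothetical, and the degrees of freedom that Durga uses for each effect type are not stated here, so df is simply treated as an input.

```r
# Exact correction factor:
#   J(df) = gamma(df / 2) / (sqrt(df / 2) * gamma((df - 1) / 2))
# evaluated on the log scale so that large df does not overflow
hedges_correction <- function(df) {
  exp(lgamma(df / 2) - 0.5 * log(df / 2) - lgamma((df - 1) / 2))
}

# Commonly used approximation, for comparison
hedges_approx <- function(df) {
  1 - 3 / (4 * df - 1)
}

df <- c(5, 10, 50, 500)
print(round(cbind(df,
                  exact  = hedges_correction(df),
                  approx = hedges_approx(df)), 4))
```

A bias-corrected estimate is the uncorrected estimate multiplied by this factor. The factor approaches 1 as df grows, which is why the correction matters little for large samples.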

The effect type we call Cohen's $d$ for unpaired data is called Cohen's $d_s^*$ by Delacre et al. (2021). For paired data, our Cohen's $d$ is identical to Cohen's $d$ for unpaired data (Delacre et al. 2021); it is called $d_{av}$ by Cumming (2012; equation 11.10). For further details, refer to Khan and McLean (2023).

The set of possible values for the effect.type argument, and their meanings, is described below.

Unpaired effect types
| Code | Label | Effect type | Standardiser |
|---|---|---|---|
| mean | Mean difference | Unstandardised difference of group means | NA |
| cohens d | Cohen's $d$ | Difference in means standardised by non-pooled average SD (Delacre et al. 2021) | $\sqrt{(s_1^2 + s_2^2)/2}$ |
| hedges g | Hedges' $g$ | Bias-corrected Cohen's $d$ (Delacre et al. 2021) | $\sqrt{(s_1^2 + s_2^2)/2}$ |
| cohens d_s | Cohen's $d_s$ | Difference in means standardised by the pooled standard deviation (Lakens 2013, equation 1) | $\sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$ |
| hedges g_s | Hedges' $g_s$ | Bias-corrected Cohen's $d_s$ (Lakens 2013, equation 4) | $\sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$ |
| glass delta_pre | Glass's $\Delta_{pre}$ | Difference in means standardised by the standard deviation of the pre-measurement group (which is the 2nd group in a contrast). Lakens (2013) recommends using Glass's $\Delta$ whenever standard deviations differ substantially between conditions | $s_2$ |
| glass delta_post | Glass's $\Delta_{post}$ | Difference in means standardised by the standard deviation of the post-measurement group (which is the 1st group in a contrast) | $s_1$ |
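
The unpaired standardisers can be checked by hand. The sketch below computes the uncorrected effect sizes from the table above for two small invented samples using base R only; it is an illustration of the formulas, not Durga's own code, and it omits the bias-corrected variants.

```r
# Two hypothetical independent samples; x1 is the 1st group in the contrast
x1 <- c(12.1, 14.3, 13.8, 15.2, 13.0, 14.9)
x2 <- c(10.4, 11.9, 12.6, 10.8, 11.1)
n1 <- length(x1)
n2 <- length(x2)
s1 <- sd(x1)
s2 <- sd(x2)
mean_diff <- mean(x1) - mean(x2)

# Standardisers from the unpaired effect types table
sd_nonpooled <- sqrt((s1^2 + s2^2) / 2)
sd_pooled    <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))

effects <- c("mean"             = mean_diff,
             "cohens d"         = mean_diff / sd_nonpooled,
             "cohens d_s"       = mean_diff / sd_pooled,
             "glass delta_pre"  = mean_diff / s2,
             "glass delta_post" = mean_diff / s1)
print(round(effects, 3))
```

When the two groups have equal sample sizes, the pooled and non-pooled standardisers coincide, so "cohens d" and "cohens d_s" differ only for unbalanced designs.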

Paired effect types

| Code | Label | Effect type | Standardiser |
|---|---|---|---|
| mean | Mean difference | Unstandardised mean of group differences | NA |
| cohens d | Cohen's $d$ | Similar to Cohen's $d_{av}$ except that the normaliser is non-pooled average SD rather than mean SD, as recommended by Cumming (2012, eqn 11.9) | $\sqrt{(s_1^2 + s_2^2)/2}$ |
| hedges g | Hedges' $g$ | Bias-corrected Cohen's $d$ | $\sqrt{(s_1^2 + s_2^2)/2}$ |
| cohens d_z | Cohen's $d_z$ | Mean of differences, standardised by the standard deviation of the differences (Lakens 2013, equation 6). Cumming (2012) recommends against using Cohen's $d_z$, preferring Cumming's $d_{av}$ | $\sqrt{\frac{\sum (X_{diff} - M_{diff})^2}{n - 1}}$ |
| hedges g_z | Hedges' $g_z$ | Bias-corrected Cohen's $d_z$ | $\sqrt{\frac{\sum (X_{diff} - M_{diff})^2}{n - 1}}$ |
| cohens d_av | Cohen's $d_{av}$ | Difference in means standardised by the average standard deviation of the groups (Lakens 2013, equation 10) | $\dfrac{s_1 + s_2}{2}$ |
| hedges g_av | Hedges' $g_{av}$ | Bias-corrected Cohen's $d_{av}$ | $\dfrac{s_1 + s_2}{2}$ |
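
The paired standardisers can be illustrated in the same way. This base R sketch uses invented before/after measurements on the same subjects and computes the uncorrected paired effect sizes directly from the formulas in the paired table; the variable names are hypothetical and the code is not taken from Durga.

```r
# Hypothetical repeated measures: element i of each vector is the same subject
before <- c(8.2, 7.9, 9.1, 8.8, 7.5, 8.4)
after  <- c(7.1, 7.6, 8.0, 8.9, 6.8, 7.7)

diffs     <- after - before
n         <- length(diffs)
mean_diff <- mean(diffs)      # mean of the within-subject differences
s1 <- sd(after)               # 1st group in the contrast
s2 <- sd(before)              # 2nd group in the contrast

# Standard deviation of the differences, written out as in the table
sd_diff <- sqrt(sum((diffs - mean_diff)^2) / (n - 1))

paired_effects <- c("mean"        = mean_diff,
                    "cohens d"    = mean_diff / sqrt((s1^2 + s2^2) / 2),
                    "cohens d_z"  = mean_diff / sd_diff,
                    "cohens d_av" = mean_diff / ((s1 + s2) / 2))
print(round(paired_effects, 3))
```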

As a simple rule of thumb, if you want a standardised effect type and you don't know which one to use, use "hedges g" for either paired or unpaired data, as it is recommended by Delacre et al. (2021) for unpaired data and Cumming (2012) for paired data.

Additional effect types can be applied by passing a function for effect.type. The function must accept two parameters and return a single numeric value, the effect size. Each parameter is a vector of values from one of the two groups to be compared (group 2 and group 1).
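
A custom effect type might look like the sketch below, which estimates a difference in medians. The function name median_diff is hypothetical. The argument order assumes, per the description above, that the first parameter receives the values for group 2 and the second the values for group 1, and the sign is chosen to match a group 1 minus group 2 difference. The call to DurgaDiff only runs if the Durga package is installed.

```r
# Difference in medians (group 1 - group 2).
# Assumed argument order: group 2 values first, then group 1 values
median_diff <- function(group2, group1) {
  median(group1) - median(group2)
}

# Quick check on made-up vectors
print(median_diff(c(1, 2, 3), c(4, 5, 9)))

# Pass the function as effect.type (requires the Durga package)
if (requireNamespace("Durga", quietly = TRUE)) {
  d <- Durga::DurgaDiff(Durga::petunia, 1, 2, effect.type = median_diff)
  print(d)
}
```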

Confidence intervals

Confidence intervals for the estimate are determined using bootstrap resampling, using the adjusted bootstrap percentile (BCa) method (see boot and boot.ci). Additional arguments can be passed to boot (boot.ci) by passing a named list of values as the argument boot.params (boot.ci.params).
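
For readers unfamiliar with the boot package, the following sketch shows a BCa interval for a difference in group means computed directly with boot and boot.ci. The simulated data, the statistic function and the use of strata are choices made for this example; Durga's own calls to boot may be configured differently.

```r
library(boot)

set.seed(1)
# Simulated long-format data: two groups of 20 observations each
x <- data.frame(value = c(rnorm(20, mean = 10), rnorm(20, mean = 11)),
                group = rep(c("a", "b"), each = 20))

# Statistic: difference in group means (b - a) for a resampled data set
mean_diff <- function(data, indices) {
  d <- data[indices, ]
  mean(d$value[d$group == "b"]) - mean(d$value[d$group == "a"])
}

# Resample within groups so that group sizes are preserved;
# R is larger than the number of rows, as required for the BCa method
b  <- boot(x, mean_diff, R = 1000, strata = factor(x$group))
ci <- boot.ci(b, conf = 0.95, type = "bca")

# Elements 4 and 5 of the bca component are the interval endpoints
ci_limits <- ci$bca[4:5]
print(b$t0)
print(ci_limits)
```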

Value

A DurgaDiff object, which is a list containing:

group.statistics

Matrix with a row for each group, columns are: mean, median, sd (standard deviation), se (standard error of the mean), CI.lower and CI.upper (lower and upper bootstrapped confidence intervals of the mean, confidence level as set by the ci.conf parameter) and n (group sample size). If there are fewer than 3 distinct values in the group, or if R is NA, the confidence interval will not be calculated and CI.lower and CI.upper will be NA.

group.differences

List of DurgaGroupDiff objects, which are boot objects with added confidence interval information. See boot and boot.ci. This element will be missing if contrasts is empty or NULL.

groups

Vector of group names

group.names

Labels used to identify groups

effect.type

Value of effect.type parameter

effect.name

Name of the effect type; may include formatting such as subscripts

effect.name.print

Text-only version of effect.name for printing; subscripts are indicated by "_"

data.col

Value of data.col parameter; may be an index or a name

data.col.name

Name of the data.col column

group.col

Value of group.col parameter; may be an index or a name

group.col.name

Name of the group.col column

id.col

Value of id.col parameter. May be NULL

paired.data

TRUE if paired differences were estimated

data

The input data frame (x), or the reshaped (long format) data frame if the input data set was in wide format

call

How this function was called

A DurgaGroupDiff object is a boot object (as returned by boot) with added bootci components (as returned by boot.ci) and components identifying the groups used to estimate the difference. Particularly relevant members are:

t0

The observed value of the statistic

bca[4]

The lower endpoint of the confidence interval

bca[5]

The upper endpoint of the confidence interval

groups

The difference is estimated on groups[1] - groups[2]
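
As a sketch of how these members might be used, the hypothetical helper below collects the estimate and confidence limits of every contrast into a data frame. It relies only on the members listed above (group.differences, groups, t0 and bca), and the call to DurgaDiff runs only if the Durga package is installed.

```r
# Tabulate the estimate and CI for each DurgaGroupDiff in a DurgaDiff object
summarise_differences <- function(d) {
  rows <- lapply(d$group.differences, function(gd) {
    data.frame(contrast = paste(gd$groups[1], "-", gd$groups[2]),
               estimate = gd$t0,      # observed value of the statistic
               ci.lower = gd$bca[4],  # lower endpoint of the CI
               ci.upper = gd$bca[5])  # upper endpoint of the CI
  })
  do.call(rbind, rows)
}

if (requireNamespace("Durga", quietly = TRUE)) {
  d <- Durga::DurgaDiff(Durga::insulin, "sugar", "treatment", "id")
  print(summarise_differences(d))
}
```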

References

See Also

DurgaDiff.formula, boot, boot.ci, DurgaPlot

Examples


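# Paired data in long format: "sugar" is the measurement column,
# "treatment" the group column and "id" the subject ID column
d <- DurgaDiff(insulin, "sugar", "treatment", "id")
print(d)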

# Change group order and displayed group labels, reverse the
# direction of one of the contrasts from the default
d <- DurgaDiff(petunia, 1, 2,
               groups = c("Self-fertilised" = "self_fertilised",
                          "Intercrossed" = "inter_cross",
                          "Westerham-crossed" = "westerham_cross"),
               contrasts = c("Westerham-crossed - Self-fertilised",
                             "Westerham-crossed - Intercrossed",
                             "Intercrossed - Self-fertilised"))

# Wide format data
d <- DurgaDiff(insulin.wide, groups = c("sugar.before", "sugar.after"))


[Package Durga version 2.0 Index]