t_neat {neatStats}    R Documentation

Difference of Two Means and Area Under the Curve

Description


Welch's t-test results, including Cohen's d with confidence interval (CI), Bayes factor (BF), and area under the receiver operating characteristic curve (AUC). For the nonparametric version, Wilcoxon test results are given instead: the Mann–Whitney U test (also known as the "Wilcoxon rank-sum test") for independent samples, and the Wilcoxon signed-rank test for paired samples. These include the nonparametric "location difference estimate" (see stats::wilcox.test), along with the corresponding rank-based BFs as per van Doorn et al. (2020).

Usage


t_neat(
  var1,
  var2,
  pair = FALSE,
  nonparametric = FALSE,
  greater = NULL,
  norm_tests = "latent",
  norm_plots = FALSE,
  ci = NULL,
  bf_added = FALSE,
  bf_rscale = sqrt(0.5),
  bf_sample = 1000,
  auc_added = FALSE,
  cutoff = NULL,
  r_added = TRUE,
  for_table = FALSE,
  test_title = NULL,
  round_descr = 2,
  round_auc = 3,
  auc_greater = "1",
  cv_rep = FALSE,
  cv_fold = 10,
  hush = FALSE,
  plots = FALSE,
  rug_size = 4,
  aspect_ratio = 1,
  y_label = "density estimate",
  x_label = "\nvalues",
  factor_name = NULL,
  var_names = c("1", "2"),
  reverse = FALSE
)

Arguments



var1: Numeric vector; values of the first variable.


var2: Numeric vector; values of the second variable.


pair: Logical. If TRUE, all tests (t, BF, AUC) are conducted for paired samples. If FALSE (default), they are conducted for independent samples.


nonparametric: Logical (FALSE by default). If TRUE, uses nonparametric (rank-based, "Wilcoxon") tests in place of t-tests (including BFs; see Notes).


greater: NULL or string (or number); optionally specifies one-sided tests (t and BF): either "1" (var1 mean expected to be greater than var2 mean) or "2" (var2 mean expected to be greater than var1 mean). If NULL (default), the test is two-sided.


norm_tests: Normality tests. Any or all of the following character inputs are accepted (as a single string or a character vector; case-insensitive): "W" (Shapiro-Wilk), "K2" (D'Agostino), "A2" (Anderson-Darling), "JB" (Jarque-Bera); see Notes. Two other options are "all" (same as TRUE; selects all four previous tests at once) and "latent" (default; prints all tests only if nonparametric is set to FALSE and any of the four tests gives a p value below .05). Each normality test is performed on the difference values between the two variables in case of paired samples, or on each of the two variables in case of unpaired samples. Set to "none" to disable (i.e., not to perform any normality tests).


norm_plots: If TRUE, displays density, histogram, and Q-Q plots (and scatter plots for paired tests) for each of the two variables (and for the differences between pairwise observations, in case of paired samples).


ci: Numeric; confidence level for returned CIs for Cohen's d and AUC.


bf_added: Logical (FALSE by default, as per the usage above). If TRUE, the Bayes factor is calculated and displayed.


bf_rscale: The scale of the prior distribution (0.707 by default).


bf_sample: Number of samples used to estimate the Bayes factor (1000 by default). More samples (e.g., 10000) take longer but give a more stable BF.


auc_added: Logical (FALSE by default). If TRUE, AUC is calculated and displayed. This includes the TPR and TNR (true positive and true negative rates, i.e., sensitivity and specificity) at the optimal threshold value, i.e., the one that provides maximal TPR and TNR. These values may be cross-validated: see cv_rep. (Note that what is designated as "positive" or "negative" depends on the scenario: this function always assumes var1 as positive and var2 as negative. If your scenario or preference differs, you can simply switch the names or values when reporting the results.)


cutoff: Numeric. Custom cutoff value for the AUC TPR and TNR, also depicted in the plot. If multiple values are given, the first is used for calculations, but all are depicted in the plot.


r_added: Logical. If TRUE (default), the Pearson correlation is calculated and displayed for paired comparisons.


for_table: Logical. If TRUE, omits the confidence level display from the printed text.


test_title: NULL or string. If not NULL, it is simply printed before the statistics. (Useful, e.g., to distinguish several different comparisons inside a function or a for loop.)


round_descr: Number of decimal places to which the descriptive statistics (means and SDs) are rounded.


round_auc: Number of decimal places to which the AUC and its CI are rounded.


auc_greater: String (or number); specifies which variable is expected to have greater values for 'cases' as opposed to 'controls': "1" (default; var1 expected to be greater for 'cases' than var2) or "2" (var2 expected to be greater for 'cases' than var1). Not to be confused with one-sided tests; see Details.


cv_rep: FALSE (default), TRUE, or numeric. If TRUE or numeric, a cross-validation is performed for the calculation of TPRs and TNRs. A numeric value specifies the number of repetitions; TRUE defaults to 100 repetitions. In each repetition, the data is divided into k random parts ("folds"; see cv_fold), and the optimal accuracy is obtained k times from a training set of k-1 folds (var1 and var2 truncated to equal length, if needed, in each case within each repetition), and the TPR and TNR are calculated from the remaining test set (different each time).


cv_fold: Numeric. The number of folds into which the data is divided for cross-validation (default: 10).


hush: Logical. If TRUE, prevents printing any details to the console.


plots: Logical (or NULL). If TRUE, creates a combined density plot (i.e., Gaussian kernel density estimates) from the two variables. Includes dashed vertical lines to indicate the means of each of the two variables. If nonparametric is set to TRUE, medians are used for these dashed lines instead of means. When auc_added is TRUE (and the AUC is at least .5), the best threshold value for classification (maximal differentiation accuracy using Youden's index) is added to the plot as a solid vertical line. (In case of multiple best thresholds with identical overall accuracy, all are added.) If NULL, same as TRUE except that a histogram is added to the background.


rug_size: Numeric (4 by default): size of the rug ticks below the density plot. Set to 0 (zero) to omit rug plotting.


aspect_ratio: Aspect ratio of the plots: 1 (1/1) by default. (Set to NULL for dynamic aspect ratio.)


y_label: String or NULL; the label for the y axis. (Default: "density estimate".)


x_label: String or NULL; the label for the x axis. (Default: "values".)


factor_name: String or NULL; factor legend title. (Default: NULL.)


var_names: A vector of two strings; the variable names to be displayed in the legend. (Default: c("1", "2").)


reverse: Logical. If TRUE, reverses the order of variable names displayed in the legend.

Details


The Bayes factor (BF) supporting the null hypothesis is denoted as BF01, while that supporting the alternative hypothesis is denoted as BF10. When the BF is smaller than 1 (i.e., supports the null hypothesis), the reciprocal is reported (hence, BF10 = BF, but BF01 = 1/BF). When the BF is greater than or equal to 10000, scientific (exponential) form is reported for readability. (The original full BF number is available in the returned named vector as bf.)
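
The reporting convention above can be sketched in base R as follows. The helper name describe_bf is hypothetical and not part of neatStats; it only illustrates the reciprocal rule, not the package's actual printing code.

```r
# Hypothetical helper (not part of neatStats): label a Bayes factor
# following the convention described above.
describe_bf <- function(bf) {
  if (bf >= 1) {
    # evidence for the alternative hypothesis: reported as is
    list(label = "BF10", value = bf)
  } else {
    # evidence for the null hypothesis: the reciprocal is reported
    list(label = "BF01", value = 1 / bf)
  }
}

bf_alt  <- describe_bf(4)     # labelled BF10, value 4
bf_null <- describe_bf(0.25)  # labelled BF01, value 1 / 0.25 = 4
```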

For simplicity, Cohen's d is reported for nonparametric tests too; however, you may want to consider reporting alternative effect sizes in this case.

The original pROC::auc function, by default, always returns an AUC greater than (or equal to) .5, assuming that the prediction based on values in the expected direction works correctly at least at chance level. This, however, may be confusing. Consider an example where we measure the heights of persons in a specific small sample and expect that greater height predicts masculine gender. The results are, say, 169, 175, 167, 164 (cm) for one gender, and 176, 182, 179, 165 for the other. If the expectation is correct (the second, greater values are for males), the AUC is .812. However, if in this particular population females are actually taller than males, the AUC is in fact .188. To keep things clear, the t_neat function always makes an assumption about which variable is expected to be greater for correct classification ("1" by default, i.e., var1; to be specified as auc_greater = "2" for var2 to be expected as greater). For this example, if the first (smaller) values are given as var1 for females, and the second (larger) values are given as var2 for males, we have to specify auc_greater = "2" to indicate the expectation of larger values for males. (Or, easier, just give the expected larger values as var1.)
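
The two AUC values in the height example can be checked with a small base R sketch. It does not call pROC; it uses the pairwise definition of the AUC (the proportion of case-control pairs in which the case has the greater value, with ties counted as one half). The function name auc_pairs is an assumption made for this illustration.

```r
# Heights (cm) from the example above
females <- c(169, 175, 167, 164)
males   <- c(176, 182, 179, 165)

# AUC as the proportion of (case, control) pairs in which the case
# has the greater value; ties count as one half
auc_pairs <- function(cases, controls) {
  diffs <- outer(cases, controls, "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

auc_males_taller   <- auc_pairs(males, females)  # 13 of 16 pairs: 0.8125
auc_females_taller <- auc_pairs(females, males)  # 3 of 16 pairs: 0.1875
```

The two values sum to 1, which is why the assumed direction has to be fixed in advance for the AUC to be interpretable.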


Value

Prints t-test statistics (including Cohen's d with CI, BF, and AUC, as specified via the corresponding parameters) in APA style. Furthermore, when assigned, returns a list that contains a named vector 'stats' with the following elements: t (t value), p (p value), d (Cohen's d), bf (Bayes factor), auc (AUC), accuracy (overall accuracy using the optimal classification threshold), and youden (Youden's index: specificity + sensitivity - 1). The latter three are NULL when auc_added is FALSE. When auc_added is TRUE, the list has two or three additional elements. One is 'roc_obj', a roc object that can be used, e.g., with the roc_neat function. Another is 'best_thresholds', which contains the best threshold value(s) for classification, along with the corresponding specificity and sensitivity. The third, 'cv_results', contains the results, if any, of the cross-validation of TPRs and TNRs (means per repetition). Finally, if plots is TRUE (or NULL), the plot is displayed and also returned as a ggplot object named t_plot.
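
To make the accuracy and youden elements concrete, here is a minimal base R sketch of choosing a threshold by Youden's index. It is not the package's implementation (which relies on pROC); the data are illustrative, and the use of midpoints between neighbouring observed values as candidate thresholds is an assumption of this sketch.

```r
# Illustrative data: var1 is treated as "positive", var2 as "negative"
var1 <- c(176, 182, 179, 165)
var2 <- c(169, 175, 167, 164)

# Candidate thresholds: midpoints between neighbouring observed values
vals <- sort(unique(c(var1, var2)))
thresholds <- (head(vals, -1) + tail(vals, -1)) / 2

# Sensitivity (TPR), specificity (TNR), and Youden's index per threshold
tpr <- sapply(thresholds, function(th) mean(var1 > th))
tnr <- sapply(thresholds, function(th) mean(var2 < th))
youden <- tpr + tnr - 1

# Threshold with the maximal Youden's index
best <- which.max(youden)
best_threshold <- thresholds[best]
best_youden <- youden[best]

# Overall accuracy: proportion of all observations classified correctly
best_accuracy <- (sum(var1 > best_threshold) + sum(var2 < best_threshold)) /
  (length(var1) + length(var2))
```

With these data, the selected threshold is 175.5, giving a sensitivity of .75, a specificity of 1, a Youden's index of .75, and an overall accuracy of .875.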


Note

Welch's t-test is calculated via stats::t.test.

Normality tests are all calculated via fBasics::NormalityTests, selected based on the recommendation of Lakens (2015), quoting Yap and Sim (2011, p. 2153): "If the distribution is symmetric with low kurtosis values (i.e. symmetric short-tailed distribution), then the D'Agostino and Shapiro-Wilkes tests have good power. For symmetric distribution with high sample kurtosis (symmetric long-tailed), the researcher can use the JB, Shapiro-Wilkes, or Anderson-Darling test." See https://github.com/Lakens/perfect-t-test for more details.

Cohen's d and its confidence interval are calculated, using the t value, via MBESS::ci.smd for independent samples (as standardized mean difference) and via MBESS::ci.sm for paired samples (as standardized mean).
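
The relation between the t value and Cohen's d can be illustrated in base R. This is only a sketch of the standard conversions for the point estimate; the CI computation in MBESS is not reproduced. Note the assumption: for independent samples, the identity with the pooled-SD standardized mean difference holds for the pooled-variance (Student's) t value, and the source does not state which t value t_neat passes on.

```r
v1 <- c(191, 115, 129, 43, 523, -4, 34, 28, 33, -1, 54)
v2 <- c(4, -2, 23, 13, 32, 16, 3, 29, 37, -4, 65)
n1 <- length(v1)
n2 <- length(v2)

# Independent samples: d = t * sqrt(1/n1 + 1/n2),
# with t taken from the pooled-variance test (assumption of this sketch)
t_ind <- unname(t.test(v1, v2, var.equal = TRUE)$statistic)
d_ind <- t_ind * sqrt(1 / n1 + 1 / n2)

# Same value obtained directly as mean difference over pooled SD
sd_pooled <- sqrt(((n1 - 1) * var(v1) + (n2 - 1) * var(v2)) / (n1 + n2 - 2))
d_ind_direct <- (mean(v1) - mean(v2)) / sd_pooled

# Paired samples: d = t / sqrt(n), i.e., the standardized mean
# of the pairwise differences
t_pair <- unname(t.test(v1, v2, paired = TRUE)$statistic)
d_pair <- t_pair / sqrt(n1)
d_pair_direct <- mean(v1 - v2) / sd(v1 - v2)
```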

The parametric Bayes factor is calculated via BayesFactor::ttestBF. The nonparametric (rank-based) Bayes factor is a contribution by Johnny van Doorn; the original source code is available via https://osf.io/gny35/.

The correlation and its CI are calculated via stats::cor.test, and are always two-sided, always with a 95 percent CI. For more, use corr_neat.

The AUC and its CI are calculated via pROC::auc, and the accuracy at optimal threshold via pROC::coords (x = "best"); both using the object pROC::roc.

References


Delacre, M., Lakens, D., & Leys, C. (2017). Why psychologists should by default use Welch's t-test instead of Student's t-test. International Review of Social Psychology, 30(1). doi:10.5334/irsp.82

Kelley, K. (2007). Methods for the behavioral, educational, and social sciences: An R package. Behavior Research Methods, 39(4), 979-984. doi:10.3758/BF03192993

Lakens, D. (2015). The perfect t-test (version 1.0.0). Retrieved from https://github.com/Lakens/perfect-t-test. doi:10.5281/zenodo.17603

Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J. C., & Muller, M. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC bioinformatics, 12(1), 77. doi:10.1186/1471-2105-12-77

van Doorn, J., Ly, A., Marsman, M., & Wagenmakers, E.-J. (2020). Bayesian rank-based hypothesis testing for the rank sum test, the signed rank test, and Spearman’s rho. Journal of Applied Statistics, 1–23. doi:10.1080/02664763.2019.1709053

Yap, B. W., & Sim, C. H. (2011). Comparisons of various types of normality tests. Journal of Statistical Computation and Simulation, 81(12), 2141–2155. doi:10.1080/00949655.2010.520163

See Also

corr_neat, roc_neat

Examples


# assign two variables (numeric vectors)
v1 = c(191, 115, 129, 43, 523, -4, 34, 28, 33, -1, 54)
v2 = c(4, -2, 23, 13, 32, 16, 3, 29, 37, -4, 65)

t_neat(v1, v2) # prints results as independent samples
t_neat(v1, v2, pair = TRUE) # as paired samples (r added by default)
t_neat(v1, v2, pair = TRUE, greater = "1") # one-sided
t_neat(v1, v2, pair = TRUE, auc_added = TRUE) # AUC included

# print results and assign returned list
results = t_neat(v1, v2, pair = TRUE)

results$stats['bf'] # get precise BF value

[Package neatStats version 1.13.3 Index]