statcheck {statcheck}    R Documentation
Extract statistics and recompute p-values
Description
statcheck extracts Null Hypothesis Significance Testing (NHST) results from strings and returns the extracted values, the reported p-values and the recomputed p-values.
Usage
statcheck(
texts,
stat = c("t", "F", "cor", "chisq", "Z", "Q"),
OneTailedTests = FALSE,
alpha = 0.05,
pEqualAlphaSig = TRUE,
pZeroError = TRUE,
OneTailedTxt = FALSE,
AllPValues = FALSE,
messages = TRUE
)
Arguments
texts
A vector of strings.
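stat
Specify which test types you want to extract: "t" to extract t-values, "F" to extract F-values, "cor" to extract correlations, "chisq" to extract χ²-values, "Z" to extract Z-values, and "Q" to extract Q-values. By default, all test types are extracted.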
OneTailedTests
Logical. Do you want to assume that all reported tests are one-tailed (TRUE) or two-tailed (FALSE, default)?
alpha
Assumed level of significance in the scanned texts. Defaults to .05.
pEqualAlphaSig
Logical. If TRUE, statcheck counts p <= alpha as significant (default); if FALSE, statcheck counts p < alpha as significant.
pZeroError
Logical. If TRUE, statcheck counts p = .000 as an error (because a p-value is never exactly zero and should be reported as < .001); if FALSE, statcheck does not automatically count p = .000 as an error.
OneTailedTxt
Logical. If TRUE, statcheck searches the text for "one-sided", "one-tailed", and "directional" to identify the possible use of one-sided tests. If one or more of these strings is found in the text AND the result would have been correct if it were a one-sided test, the result is assumed to be one-sided and is counted as correct.
AllPValues
Logical. If TRUE, the output is a data frame with all detected p-values, including those that were not part of a full result reported in APA format.
messages
Logical. If TRUE, statcheck prints a progress bar while it is extracting statistics from the text.
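The alpha, pEqualAlphaSig and OneTailedTxt arguments describe simple decision rules. The sketch below restates them in base R; is_significant() and mentions_one_tailed() are hypothetical helpers written for illustration, not functions exported by statcheck, and the case-insensitive keyword search is an assumption of this sketch.

```r
# Hypothetical helpers, not exported by statcheck.

# Significance rule controlled by alpha and pEqualAlphaSig
is_significant <- function(p, alpha = 0.05, pEqualAlphaSig = TRUE) {
  if (pEqualAlphaSig) p <= alpha else p < alpha
}

# Keyword search in the spirit of OneTailedTxt; case-insensitive matching
# is an assumption of this sketch
mentions_one_tailed <- function(text) {
  grepl("one-sided|one-tailed|directional", text, ignore.case = TRUE)
}

is_significant(0.05)                          # p = alpha counts as significant
is_significant(0.05, pEqualAlphaSig = FALSE)  # p = alpha is not significant
mentions_one_tailed("We used a one-tailed t-test.")
```

Because is_significant() compares with <= or <, it also works on a vector of p-values.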
Details
statcheck roughly works in three steps.
1. Scan text for statistical results
statcheck uses regular expressions to recognize statistical results from t-tests, F-tests, χ²-tests, Z-tests, Q-tests, and correlations.
statcheck can only recognize these results if the results are reported
exactly according to the APA guidelines:
- t(df) = value, p = value
- F(df1, df2) = value, p = value
- r(df) = value, p = value
- χ²(df, N = value) = value, p = value (N is optional)
- Z = value, p = value
- Q(df) = value, p = value (statcheck can distinguish between Q, Qw / Q-within, and Qb / Q-between)
statcheck takes into account that test statistics and p-values may be reported exactly (=) or inexactly (< or >). Differences in spacing are also taken into account.
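To make the scanning step concrete, here is a minimal sketch of regular-expression extraction for the t(df) = value, p = value pattern only, tolerating exact or inexact comparisons and arbitrary spacing. extract_t() is a hypothetical helper written with base-R regex functions; it is not statcheck's own pattern, which covers more test types and reporting variants.

```r
# Hypothetical sketch, not statcheck's own code: extract APA-style t-test
# results with base-R regular expressions ("t(df) = value, p = value" only).
extract_t <- function(text) {
  ws   <- "[[:space:]]*"                     # allow any (or no) spacing
  udec <- "[[:digit:]]*\\.?[[:digit:]]+"     # e.g. 28, 2.20, .036
  num  <- paste0("-?", udec)                 # test values may be negative
  pattern <- paste0(
    "t", ws, "\\(", ws, "(", udec, ")", ws, "\\)",    # t(df)
    ws, "([<>=])", ws, "(", num, ")",                 # = value
    ws, ",", ws, "p", ws, "([<>=])", ws, "(", udec, ")"  # , p = value
  )
  hits <- regmatches(text, gregexpr(pattern, text))[[1]]
  if (length(hits) == 0) return(NULL)
  # regexec() returns the full match followed by each capture group
  groups <- regmatches(hits, regexec(pattern, hits))
  field <- function(i) vapply(groups, `[`, character(1), i)
  data.frame(
    df         = as.numeric(field(2)),
    test_comp  = field(3),
    test_value = as.numeric(field(4)),
    p_comp     = field(5),
    reported_p = as.numeric(field(6)),
    raw        = hits,
    stringsAsFactors = FALSE
  )
}

extract_t("the effect was significant, t(28) = 2.20, p = .036")
```

The POSIX character classes keep the pattern valid for R's default (extended) regex engine, so no perl = TRUE is needed.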
2. Recompute p-value
statcheck uses the reported test statistic and degrees of freedom to recompute the p-value. By default, the recomputed p-value is two-sided.
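The recomputation step maps each test type onto a reference distribution. A minimal sketch with base-R distribution functions follows; recompute_p() is hypothetical, passing single degrees of freedom (t, r, χ², Q) in df1 is a convention of this sketch only, and treating Q as χ²-distributed follows the usual convention for Cochran's Q. The t, r and Z p-values are two-sided; F, χ² and Q use the upper tail.

```r
# Hypothetical helper, not statcheck's own code: p-values recomputed with
# base-R distribution functions (two-sided for t, r and Z; upper tail for
# F, chisq and Q). Single-df tests take their df in df1 in this sketch.
recompute_p <- function(test_type, value, df1 = NA, df2 = NA) {
  switch(test_type,
    t   = 2 * pt(abs(value), df = df1, lower.tail = FALSE),
    F   = pf(value, df1 = df1, df2 = df2, lower.tail = FALSE),
    cor = {
      # convert r to t; for r(df), df = n - 2
      t_value <- value * sqrt(df1 / (1 - value^2))
      2 * pt(abs(t_value), df = df1, lower.tail = FALSE)
    },
    chisq = ,  # falls through: Q is also referred to a chi-square distribution
    Q     = pchisq(value, df = df1, lower.tail = FALSE),
    Z     = 2 * pnorm(abs(value), lower.tail = FALSE),
    stop("unsupported test type: ", test_type)
  )
}

recompute_p("t", 2.20, df1 = 28)
recompute_p("F", 4, df1 = 1, df2 = 28)  # equals the t(28) = 2 p-value
```

The F(1, df) = t² identity in the last line is a handy sanity check for this kind of helper.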
3. Compare reported and recomputed p-value
This comparison takes into account how the results were reported; for example, p < .05 is treated differently from p = .05. Incongruent p-values are marked as an error. If the reported result is significant and the recomputed result is not, or vice versa, the result is marked as a decision_error.
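The comparison logic can be sketched in base R for a single result, ignoring rounding of the test statistic. check_result() is a hypothetical helper, and its reading of inexact reports (treating "p < .10" or "p > .01" as undetermined at alpha = .05) is a simplifying assumption of this sketch, not statcheck's documented behaviour.

```r
# Hypothetical helper, not statcheck's own code: flag an inconsistent p-value
# (error) and a changed significance decision (decision_error) for a single
# result. Rounding of the test statistic is ignored in this sketch.
check_result <- function(reported_p, p_comp, computed_p, alpha = 0.05) {
  p_num <- as.numeric(reported_p)
  # Decimals in the reported p-value, e.g. ".036" -> 3, "0.05" -> 2
  decimals <- nchar(sub("^[^.]*\\.?", "", reported_p))

  # Is the recomputed p-value compatible with the reported comparison?
  consistent <- switch(p_comp,
    "=" = abs(round(computed_p, decimals) - p_num) < 1e-12,
    "<" = computed_p < p_num,
    ">" = computed_p > p_num,
    stop("unknown comparison: ", p_comp)
  )
  error <- !consistent

  # Simplified reading of the reported decision; NA means it cannot be
  # determined from the report (e.g. "p > .01" or "p < .10" at alpha = .05)
  reported_sig <- switch(p_comp,
    "=" = p_num <= alpha,
    "<" = if (p_num <= alpha) TRUE else NA,
    ">" = if (p_num >= alpha) FALSE else NA
  )
  computed_sig <- computed_p <= alpha
  decision_error <- error && !is.na(reported_sig) && reported_sig != computed_sig

  list(error = error, decision_error = decision_error)
}

# The example from this page: t(100) = 1, p < 0.001
check_result("0.001", "<", 2 * pt(1, 100, lower.tail = FALSE))
```

In this sketch a decision_error is only raised for results that are already an error, so the two flags stay nested.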
Correct rounding is taken into account. For instance, a reported t-value of 2.35 could correspond to an actual value anywhere from 2.345 up to (but not including) 2.355, with a corresponding range of p-values that can deviate slightly from the recomputed p-value. statcheck does not count such cases as errors.
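The rounding argument can be made concrete for a t-test: every unrounded t-value that rounds to the reported one yields a p-value, and together they form an interval. p_range_t() is a hypothetical helper, and the 28 degrees of freedom are an arbitrary value chosen for illustration; the t-value of 2.35 is the one from the paragraph above.

```r
# Hypothetical helper, not statcheck's own code: range of two-sided p-values
# compatible with a t-value rounded to the reported number of decimals.
p_range_t <- function(t_reported, df, decimals) {
  half_unit <- 0.5 * 10^(-decimals)        # e.g. 0.005 for 2 decimals
  t_low  <- abs(t_reported) - half_unit    # smallest value that rounds up to it
  t_high <- abs(t_reported) + half_unit    # exclusive upper bound
  # a larger t gives a smaller p, so the bounds swap
  c(p_min = 2 * pt(t_high, df, lower.tail = FALSE),
    p_max = 2 * pt(t_low,  df, lower.tail = FALSE))
}

# t = 2.35 with an assumed df of 28
p_range_t(2.35, df = 28, decimals = 2)
```

A reported p-value that falls inside this interval is consistent with some correctly rounded test statistic, which is why such cases are not flagged.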
Note that when statcheck flags an error or decision_error, it implicitly assumes that the p-value is the inconsistent value, but the test statistic or the degrees of freedom could just as well contain the reporting error. statcheck merely detects whether a set of numbers is internally consistent.
Value
A data frame containing, for each extracted statistic:
- source
Name of the file from which the statistic is extracted
- test_type
Character indicating the statistic that is extracted
- df1
First degree of freedom (if applicable)
- df2
Second degree of freedom
- test_comp
Reported comparison of the test statistic; when importing from PDF, this will often not be converted properly
- test_value
Reported value of the statistic
- p_comp
Reported comparison of the p-value; when importing from PDF, this might not be converted properly
- reported_p
The reported p-value, or NA if the reported value was n.s.
- computed_p
The recomputed p-value
- raw
Raw string of the statistical reference that is extracted
- error
The computed p-value is not congruent with the reported p-value
- decision_error
The reported result is significant whereas the recomputed result is not, or vice versa.
- one_tailed_in_txt
Logical. Does the text contain the string "sided", "tailed", and/or "directional"?
- apa_factor
What proportion of all detected p-values was part of a fully APA reported result?
See Also
For more details, see the online manual.
Examples
library(statcheck)
# t(100) = 1 implies p of about .32, so expect an error and a decision error
txt <- "blablabla the effect was very significant (t(100)=1, p < 0.001)"
statcheck(txt)
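Why the example should be flagged can be checked by hand in base R, without statcheck. This sketch assumes, in line with the rounding rule under Details, that a reported t of 1 (no decimals) is compatible with any unrounded value in [0.5, 1.5); that interval is an illustration, not statcheck's internal computation.

```r
# Recompute the example result by hand with base R (statcheck not needed).
dof <- 100
p_point <- 2 * pt(1, dof, lower.tail = FALSE)   # roughly .32
# A reported "1" has no decimals, so the unrounded t could lie in [0.5, 1.5)
p_bounds <- 2 * pt(c(1.5, 0.5), dof, lower.tail = FALSE)
# Even the smallest compatible p-value exceeds both .001 and alpha = .05
all(p_bounds > 0.05)
```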