z.score {corpora} | R Documentation |
The z-score statistic for frequency counts (corpora)
Description
This function computes a z-score statistic for frequency counts, based on a normal approximation to the correct binomial distribution under the random sampling model.
Usage
z.score(k, n, p = 0.5, correct = TRUE)
Arguments
k |
frequency of a type in the corpus (or an integer vector of frequencies) |
n |
number of tokens in the corpus, i.e. sample size (or an integer vector specifying the sizes of different samples) |
p |
null hypothesis, giving the assumed proportion of this type in the population (or a vector of proportions for different types and/or different populations) |
correct |
if TRUE, apply Yates' continuity correction (default) |
Details
The z statistic is given by
z := \dfrac{k - np}{\sqrt{n p (1-p)}}
When Yates' continuity correction is enabled, the absolute value of the numerator d := k - np is reduced by 1/2, but clamped to a non-negative value.
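The formula and the continuity correction described above can be sketched in a few lines of R. This is an illustrative transcription only, not the package's actual source: the name z_score_sketch is hypothetical, and the assumption that the corrected numerator keeps the sign of d is inferred from the wording "the absolute value ... is reduced".

```r
# Hypothetical helper (not part of the corpora package): a direct
# transcription of the formula above, vectorised over k, n and p.
z_score_sketch <- function(k, n, p = 0.5, correct = TRUE) {
  d <- k - n * p                     # numerator: observed minus expected frequency
  if (correct) {
    # Yates' correction: reduce |d| by 1/2, clamp at zero, keep the sign of d
    d <- sign(d) * pmax(abs(d) - 0.5, 0)
  }
  d / sqrt(n * p * (1 - p))          # divide by the binomial standard deviation
}

z_score_sketch(30, 100, p = 0.15)                   # (30 - 15 - 0.5) / sqrt(12.75)
z_score_sketch(30, 100, p = 0.15, correct = FALSE)  # (30 - 15) / sqrt(12.75)
z_score_sketch(15, 100, p = 0.15)                   # d = 0, so the score is 0
```

Because the clamp is applied to the absolute value, an observed frequency within 1/2 of the expected frequency np yields a score of exactly 0 rather than changing sign.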
Value
The z-score corresponding to the specified data (or a vector of z-scores).
Author(s)
Stephanie Evert (https://purl.org/stephanie.evert)
See Also
Examples
# z-test for H0: pi = 0.15 with observed counts 10..30 in a sample of n=100 tokens
k <- c(10:30)
z <- z.score(k, 100, p=.15)
names(z) <- k
round(z, 3)
abs(z) >= 1.96 # significant results at p < .05