ks.test {dgof}  R Documentation 
Performs one- or two-sample Kolmogorov-Smirnov tests.
ks.test(x, y, ...,
alternative = c("two.sided", "less", "greater"),
exact = NULL, tol=1e-8, simulate.p.value=FALSE, B=2000)
x 
a numeric vector of data values. 
y 
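a numeric vector of data values, or a character string
naming a cumulative distribution function or an actual cumulative
distribution function such as pnorm. Alternatively, y can be an ecdf
function (or an object of class stepfun) specifying a discrete distribution.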
... 
parameters of the distribution specified (as a character
string) by y. 
alternative 
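indicates the alternative hypothesis and must be
one of "two.sided" (default), "less", or "greater". The value may be abbreviated (for example "g" or "l"). 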
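exact 
NULL or a logical indicating whether an exact p-value should be computed. See the details below for the meaning of NULL and for the cases in which exact p-values are not available. 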
tol 
used as an upper bound for possible rounding error in values
(say, 
simulate.p.value 
a logical indicating whether to compute p-values by Monte Carlo simulation (for discrete goodness-of-fit tests only). 
B 
an integer specifying the number of replicates used in the Monte Carlo test (for discrete goodness-of-fit tests only). 
If y is numeric, a two-sample test of the null hypothesis that x and y were drawn from the same continuous distribution is performed.

Alternatively, y can be a character string naming a continuous (cumulative) distribution function (or such a function), or an ecdf function (or object of class stepfun) giving a discrete distribution. In these cases, a one-sample test is carried out of the null that the distribution function which generated x is distribution y with parameters specified by the ... argument.
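To make the one-sample statistic concrete, the following base-R sketch computes the two-sided statistic D = sup_u |F_n(u) - F(u)| directly for a continuous hypothesized CDF and checks it against stats::ks.test. The helper name ks_one_sample_D is hypothetical and is not part of dgof. stats::ks.test is used only so that the sketch runs without dgof installed. The computation relies on the standard fact that, for continuous F, the supremum is attained just at or just before an order statistic.

```r
# Hypothetical helper (not part of dgof): two-sided one-sample KS statistic
# D = sup_u |F_n(u) - F(u)| for a continuous hypothesized CDF `cdf`.
ks_one_sample_D <- function(x, cdf, ...) {
  n  <- length(x)
  Fx <- cdf(sort(x), ...)   # hypothesized CDF at the order statistics
  i  <- seq_len(n)
  # The ECDF jumps from (i-1)/n to i/n at the i-th order statistic,
  # so the largest gap lies on one side of one of these jumps.
  max(i / n - Fx, Fx - (i - 1) / n)
}

set.seed(1)
x <- rnorm(50)
D_manual <- ks_one_sample_D(x, pnorm)
D_stats  <- unname(stats::ks.test(x, "pnorm")$statistic)
all.equal(D_manual, D_stats)   # TRUE
```

Extra distribution parameters pass through `...` in the same way as in ks.test. For example, ks_one_sample_D(x + 2, pgamma, 3, 2) evaluates pgamma(sort(x + 2), 3, 2).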
The presence of ties generates a warning unless y describes a discrete distribution (see above), since continuous distributions do not generate ties.
The possible values "two.sided", "less" and "greater" of alternative specify the null hypothesis that the true distribution function of x is equal to, not less than, or not greater than the hypothesized distribution function (one-sample case) or the distribution function of y (two-sample case), respectively. This is a comparison of cumulative distribution functions, and the test statistic is the maximum difference in value, with the statistic in the "greater" alternative being

D^+ = \max_u [ F_x(u) - F_y(u) ].

Thus in the two-sample case alternative = "greater" includes distributions for which x is stochastically smaller than y (the CDF of x lies above and hence to the left of that for y), in contrast to t.test or wilcox.test.
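The two-sample D^+ can be checked by evaluating both empirical CDFs at the pooled sample points, where either of them can change value. The sketch below does this in base R and compares the result with stats::ks.test, which is used only so that the code runs without dgof. The simulated data are illustrative, and continuous draws are used so that ties are not expected.

```r
# Two-sample D^+ = max_u [F_x(u) - F_y(u)], evaluated at the pooled sample
# points (the only places where either empirical CDF jumps).
set.seed(2)
x <- rnorm(40)            # continuous data, so ties are not expected
y <- rnorm(30, mean = 1)  # y tends to be larger, so F_x tends to lie above F_y
u <- sort(c(x, y))
D_plus <- max(ecdf(x)(u) - ecdf(y)(u))

res <- stats::ks.test(x, y, alternative = "greater")
all.equal(D_plus, unname(res$statistic))   # TRUE
```

Because x is shifted to the left of y here, F_x lies above F_y and D^+ is large. This is the situation that alternative = "greater" is designed to detect.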
Exact p-values are not available for the one-sided two-sample case, or in the case of ties if y is continuous. If exact = NULL (the default), an exact p-value is computed in three cases: if the sample size is less than 100 in the one-sample case with y continuous; if the sample size is less than or equal to 30 with y discrete; or if the product of the sample sizes is less than 10000 in the two-sample case for continuous y. Otherwise, asymptotic distributions are used, and these approximations may be inaccurate in small samples. With y continuous, exact p-values for the one-sample two-sided case are obtained as described in Marsaglia, Tsang & Wang (2003). The formula of Birnbaum & Tingey (1951) is used for the one-sample one-sided case.
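The default exact = NULL rule above can be summarized as a small predicate. The function uses_exact_default below is hypothetical. It restates the documented rule in code and is not dgof's internal logic. The argument names are assumptions chosen for readability.

```r
# Hypothetical helper restating the documented exact = NULL rule;
# it is NOT dgof's internal code, only a readable summary of the text.
uses_exact_default <- function(n_x, n_y = NULL, y_discrete = FALSE,
                               ties = FALSE, alternative = "two.sided") {
  if (!is.null(n_y)) {
    # Two-sample case (y numeric, treated as continuous): no exact
    # p-values for one-sided alternatives or in the presence of ties.
    return(alternative == "two.sided" && !ties && n_x * n_y < 10000)
  }
  if (y_discrete) {
    return(n_x <= 30)       # one-sample case, discrete y
  }
  !ties && n_x < 100        # one-sample case, continuous y
}

uses_exact_default(50)                             # TRUE
uses_exact_default(50, y_discrete = TRUE)          # FALSE
uses_exact_default(50, 30)                         # TRUE (50 * 30 = 1500)
uses_exact_default(50, 30, alternative = "less")   # FALSE
```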
In the one-sample case with y discrete, the methods presented in Conover (1972) and Gleser (1985) are used when exact = TRUE (or when exact = NULL) and length(x) <= 30, as described above.
When exact = FALSE, or exact = NULL with length(x) > 30, the test is not exact and the resulting p-values are known to be conservative. Using exact = TRUE with sample sizes greater than 30 is not advised because of numerical instabilities. In such cases, simulated p-values (simulate.p.value = TRUE) may be preferable.
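The idea behind a simulated p-value for a discrete null can be sketched in base R: draw B samples of the same size from the hypothesized discrete distribution, recompute D for each sample, and report the proportion of simulated values at least as large as the observed one. The function mc_discrete_ks and its arguments support and probs are hypothetical. This is not dgof's implementation, and dgof may differ in details, for example by using (1 + count) / (B + 1) instead of a plain proportion. The sketch assumes that x takes values only in support.

```r
# Hypothetical helper (not dgof's implementation): Monte Carlo p-value for the
# two-sided one-sample KS test against a discrete distribution given by
# sorted `support` values and their probabilities `probs`.
mc_discrete_ks <- function(x, support, probs, B = 2000, tol = 1e-8) {
  F0 <- cumsum(probs)   # hypothesized CDF at the support points
  # Both CDFs are step functions jumping only at support points (x is assumed
  # to lie in `support`), so the supremum is attained at those points.
  stat <- function(s) max(abs(ecdf(s)(support) - F0))
  d_obs <- stat(x)
  # Draw each null sample by index, which avoids sample()'s special case
  # for a length-one numeric first argument.
  d_sim <- replicate(B, stat(support[sample.int(length(support), length(x),
                                                replace = TRUE, prob = probs)]))
  # Proportion of simulated statistics at least as extreme as the observed one;
  # `tol` guards against floating-point rounding in the comparison.
  list(statistic = d_obs, p.value = mean(d_sim >= d_obs - tol))
}

set.seed(3)
mc <- mc_discrete_ks(rep(1, 3), support = 1:3, probs = rep(1/3, 3))
mc$statistic   # 0.6666667 (= 2/3)
mc$p.value
```

In this small case the exact null probability can be checked by hand. D reaches 2/3 only when all three draws equal 1 or all three equal 3, so the exact p-value is 2/27 (about 0.074). The simulated value should be close to that.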
If a single-sample test is used with y continuous, the parameters specified in ... must be pre-specified and not estimated from the data. There is some more refined distribution theory for the KS test with estimated parameters (see Durbin, 1973), but that is not implemented in ks.test.
A list with class "htest" containing the following components:
statistic 
the value of the test statistic. 
p.value 
the p-value of the test. 
alternative 
a character string describing the alternative hypothesis. 
method 
a character string indicating what type of test was performed. 
data.name 
a character string giving the name(s) of the data. 
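The components listed above are accessed like those of any "htest" object. The sketch below uses stats::ks.test so that it runs without dgof. It assumes, as this page documents, that dgof::ks.test returns the same components. The input data are illustrative.

```r
# Inspect the components of the "htest" object returned by a KS test.
res <- stats::ks.test(c(0.1, 0.4, 0.7), "punif")
class(res)       # "htest"
res$statistic    # named numeric; D = 0.3 for these data
res$p.value      # the p-value of the test
res$alternative  # description of the alternative hypothesis
res$method       # which test was performed
res$data.name    # name of the data
```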
Modified by Taylor B. Arnold and John W. Emerson to include one-sample testing with a discrete distribution (as presented in Conover's 1972 paper; see the references).
Z. W. Birnbaum and Fred H. Tingey (1951), One-sided confidence contours for probability distribution functions. The Annals of Mathematical Statistics, 22/4, 592–596.
William J. Conover (1971), Practical Nonparametric Statistics. New York: John Wiley & Sons. Pages 295–301 (one-sample Kolmogorov test), 309–314 (two-sample Smirnov test).
William J. Conover (1972), A Kolmogorov Goodness-of-Fit Test for Discontinuous Distributions. Journal of the American Statistical Association, Vol. 67, No. 339, 591–596.
J. Durbin (1973), Distribution theory for tests based on the sample distribution function. SIAM.
Leon Jay Gleser (1985), Exact Power of Goodness-of-Fit Tests of Kolmogorov Type for Discontinuous Distributions. Journal of the American Statistical Association, Vol. 80, No. 392, 954–958.
George Marsaglia, Wai Wan Tsang and Jingbo Wang (2003), Evaluating Kolmogorov's distribution. Journal of Statistical Software, 8/18. http://www.jstatsoft.org/v08/i18/.
shapiro.test, which performs the Shapiro-Wilk test for normality; cvm.test for Cramer-von Mises type tests.
require(graphics)
require(dgof)
set.seed(1)
x <- rnorm(50)
y <- runif(30)
# Do x and y come from the same distribution?
ks.test(x, y)
# Does x come from a shifted gamma distribution with shape 3 and rate 2?
ks.test(x+2, "pgamma", 3, 2) # two-sided, exact
ks.test(x+2, "pgamma", 3, 2, exact = FALSE)
ks.test(x+2, "pgamma", 3, 2, alternative = "gr")
# test if x is stochastically larger than x2
x2 <- rnorm(50, -1)
plot(ecdf(x), xlim=range(c(x, x2)))
plot(ecdf(x2), add=TRUE, lty="dashed")
t.test(x, x2, alternative="g")
wilcox.test(x, x2, alternative="g")
ks.test(x, x2, alternative="l")
#########################################################
# TBA, JWE new examples added for discrete distributions:
x3 <- sample(1:10, 25, replace=TRUE)
# Using ecdf() to specify a discrete distribution:
ks.test(x3, ecdf(1:10))
# Using stepfun() to specify the same discrete distribution:
myfun <- stepfun(1:10, cumsum(c(0, rep(0.1, 10))))
ks.test(x3, myfun)
# The previous R ks.test() does not correctly calculate the
# test statistic for discrete distributions (gives warning):
# stats::ks.test(c(0, 1), ecdf(c(0, 1)))
# ks.test(c(0, 1), ecdf(c(0, 1)))
# Even when the correct test statistic is given, the
# previous R ks.test() gives conservative p-values:
stats::ks.test(rep(1, 3), ecdf(1:3))
ks.test(rep(1, 3), ecdf(1:3))
# Arguments after ... must be named in full, so use simulate.p.value, not simulate:
ks.test(rep(1, 3), ecdf(1:3), simulate.p.value=TRUE, B=10000)