mixed_ks_test {KSgeneral}

R Documentation

Computes the p-value for a one-sample two-sided Kolmogorov-Smirnov test when the cdf under the null hypothesis is mixed

Description

Computes the p-value P(D_{n} \ge d_{n}), where d_{n} is the value of the KS test statistic computed from a data sample \{x_{1}, ..., x_{n}\}, when F(x) is mixed. The function uses the Exact-KS-FFT method, which expresses the p-value as a double-boundary non-crossing probability for a homogeneous Poisson process and then computes that probability efficiently using FFT (see Dimitrova, Kaishev, Tan (2020)).

Usage

mixed_ks_test(x, jump_points, Mixed_dist, ..., tol = 1e-10)

Arguments

`x`	a numeric vector of data sample values `\{x_{1}, ..., x_{n}\}`.
`jump_points`	a numeric vector containing the points of (jump) discontinuity, i.e. the points where the underlying cdf `F(x)` has jump(s).
`Mixed_dist`	a pre-specified (user-defined) mixed cdf, `F(x)`, under the null hypothesis.
`...`	values of the parameters of the cdf, `F(x)`, specified (as a character string) by `Mixed_dist`.
`tol`	the value of `\epsilon` that is used to compute the values of `A_{i}` and `B_{i}`, `i = 1, ..., n`, as detailed in Step 1 of Section 2.1 in Dimitrova, Kaishev and Tan (2020) (see also (ii) in the Procedure Exact-KS-FFT therein). By default, `tol = 1e-10`. Note that a value of `NA` or `0` will lead to an error!
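
As a rough illustration of the `jump_points` argument, the sketch below estimates the jump size of a cdf at candidate points by comparing F(x) with F(x - eps). The helper `jump_size` and the step `eps` are hypothetical names introduced here and are not part of KSgeneral; the cdf is the one defined in the Examples section.

```r
# Mixed cdf with jumps at 0 and log(2.5), as in the Examples section
Mixed_cdf_example <- function(x) {
  if (x < 0) {
    0
  } else if (x == 0) {
    0.5
  } else if (x < log(2.5)) {
    1 - 0.5 * exp(-x)
  } else {
    1
  }
}

# Hypothetical helper (not a KSgeneral function): approximate jump size
# F(point) - F(point - eps) for a small eps
jump_size <- function(cdf, point, eps = 1e-8) {
  cdf(point) - cdf(point - eps)
}

candidate_points <- c(0, log(2.5))
sizes <- vapply(candidate_points,
                function(p) jump_size(Mixed_cdf_example, p),
                numeric(1))
print(sizes)
```

A value close to zero at a candidate point would suggest that the cdf is continuous there and that the point does not belong in `jump_points`.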

Details

Given a random sample \{X_{1}, ..., X_{n}\} of size n with an empirical cdf F_{n}(x), the Kolmogorov-Smirnov goodness-of-fit statistic is defined as D_{n} = \sup_{x} | F_{n}(x) - F(x) |, where F(x) is the cdf of a prespecified theoretical distribution under the null hypothesis H_{0} that \{X_{1}, ..., X_{n}\} comes from F(x).
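
When F(x) has jumps, the supremum in D_{n} is attained either at a sample point or just to its left, so both F(x) and the left limit F(x-) need to be checked. The following is a minimal sketch of that calculation, assuming the user can supply the left limit as a separate function. The names `ks_stat_mixed`, `F_right` and `F_left` are hypothetical, and this is not the package's internal code.

```r
# Mixed cdf F(x) with jumps at 0 and log(2.5), as in the Examples section
F_right <- function(x) {
  if (x < 0) 0 else if (x == 0) 0.5 else if (x < log(2.5)) 1 - 0.5 * exp(-x) else 1
}

# Left limit F(x-) of the same cdf (assumption: written out by hand)
F_left <- function(x) {
  if (x <= 0) 0 else if (x <= log(2.5)) 1 - 0.5 * exp(-x) else 1
}

# Hypothetical helper: D_n = sup_x |F_n(x) - F(x)| for a cdf with jumps
ks_stat_mixed <- function(x, cdf, cdf_left) {
  v <- sort(unique(x))
  Fn_right <- vapply(v, function(t) mean(x <= t), numeric(1))  # F_n(v)
  Fn_left  <- vapply(v, function(t) mean(x <  t), numeric(1))  # F_n(v-)
  F_r <- vapply(v, cdf, numeric(1))                            # F(v)
  F_l <- vapply(v, cdf_left, numeric(1))                       # F(v-)
  max(abs(Fn_right - F_r), abs(Fn_left - F_l))
}

test_data <- c(0, 0, 0, 0, 0, 0, 0.1, 0.2, 0.3, 0.4,
               0.5, 0.6, 0.7, 0.8, rep(log(2.5), 6))
d_n <- ks_stat_mixed(test_data, F_right, F_left)
print(d_n)
```

Checking only the sample points and their left limits is sufficient because F_{n}(x) is constant between consecutive sample points while F(x) is non-decreasing.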

The function mixed_ks_test implements the Exact-KS-FFT method, which expresses the p-value as a double-boundary non-crossing probability for a homogeneous Poisson process that is then efficiently computed using FFT (see Dimitrova, Kaishev, Tan (2020)). This algorithm ensures a total worst-case run-time of order O(n^{2} \log(n)).

The function mixed_ks_test computes the p-value P(D_{n} \ge d_{n}), where d_{n} is the value of the KS test statistic computed from a user-provided data sample \{x_{1}, ..., x_{n}\}, when F(x) is mixed.

We have not been able to identify an alternative fast and accurate method (or software implementation) for the case when the hypothesized F(x) is mixed.

Value

A list with class "htest" containing the following components:

`statistic`	the value of the statistic.
`p.value`	the p-value of the test.
`alternative`	"two-sided".
`data.name`	a character string giving the name of the data.
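
The components of an "htest" object are ordinary list elements. The sketch below shows how they are typically accessed. It uses base R's `stats::ks.test` on continuous data purely as a stand-in, because it returns an object of the same class without requiring KSgeneral to be installed. The assumption is that the result of `mixed_ks_test` can be inspected in the same way for the four components listed above.

```r
# Stand-in example: a base R test that also returns an "htest" object.
# This is NOT mixed_ks_test; it only illustrates the structure of the result.
x <- c(0.12, 0.41, 0.73, 0.25, 0.90, 0.58)
res <- stats::ks.test(x, "punif")

class(res)                 # the object has class "htest"
stat <- unname(res$statistic)  # value of the test statistic
pval <- res$p.value            # p-value of the test
alt  <- res$alternative        # alternative hypothesis label
dnam <- res$data.name          # name of the data

print(c(statistic = stat, p.value = pval))
```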

References

Dimitrina S. Dimitrova, Vladimir K. Kaishev, Senren Tan. (2020) "Computing the Kolmogorov-Smirnov Distribution When the Underlying CDF is Purely Discrete, Mixed or Continuous". Journal of Statistical Software, 95(10): 1-42. doi:10.18637/jss.v095.i10.

Examples

# Example to compute the p-value of the one-sample two-sided KS test,
# when the underlying distribution is a mixed distribution
# with two jumps at 0 and log(2.5),
# as in Example 3.1 of Dimitrova, Kaishev, Tan (2020)

# Defining the mixed distribution

Mixed_cdf_example <- function(x)
{
     result <- 0
     if (x < 0){
         result <- 0
     }
     else if (x == 0){
         result <- 0.5
     }
     else if (x < log(2.5)){
         result <- 1 - 0.5 * exp(-x)
     }
     else{
         result <- 1
     }

     return (result)
}
test_data <- c(0,0,0,0,0,0,0.1,0.2,0.3,0.4,
            0.5,0.6,0.7,0.8,log(2.5),log(2.5),
            log(2.5),log(2.5),log(2.5),log(2.5))
KSgeneral::mixed_ks_test(test_data, c(0, log(2.5)),
                         Mixed_cdf_example)


## Compute the p-value of a two-sided K-S test
## when F(x) follows a zero-and-one-inflated
## beta distribution, as in Example 3.3
## of Dimitrova, Kaishev, Tan (2020)

## The data set is the proportion of inhabitants
## living within a 200-kilometer-wide coastal strip
## in 232 countries in the year 2010

data("Population_Data")
mu <- 0.6189
phi <- 0.6615
a <- mu * phi
b <- (1 - mu) * phi

Mixed_cdf_example <- function(x)
{
     result <- 0
     if (x < 0){
         result <- 0
     }
     else if (x == 0){
         result <- 0.1141
     }
     else if (x < 1){
         result <- 0.1141 + 0.4795 * pbeta(x, a, b)
     }
     else{
         result <- 1
     }

     return (result)
}
KSgeneral::mixed_ks_test(Population_Data, c(0, 1), Mixed_cdf_example)

[Package KSgeneral version 2.0.2 Index]