dis.kstest {iZID}R Documentation

The Monte Carlo estimate for the p-value of a discrete KS Test

Description

Computes the Monte Carlo estimate for the p-value of a discrete one-sample Kolmogorov-Smirnov (KS) Test for Poisson, negative binomial, beta binomial, beta negative binomial distributions and their zero-inflated as well as hurdle versions.

Usage

dis.kstest(x, nsim = 100, bootstrap = TRUE, distri = "Poisson",
  r = NULL, p = NULL, alpha1 = NULL, alpha2 = NULL, n = NULL,
  lowerbound = 0.01, upperbound = 10000, parallel = FALSE)

Arguments

x

A vector of count data. Should be non-negative integers. If elements of x are not integers, they will be automatically rounded up to the smallest integers that are no less than themselves.

nsim

The number of bootstrapped samples or simulated samples generated to compute p-value. If it is not an integer, nsim will be automatically rounded up to the smallest integer that is no less than nsim. Should be greater than 30. Default is 100.

bootstrap

Whether to generate bootstrapped samples or not. See Details. 'TRUE' or any numeric non-zero value indicates the generation of bootstrapped samples. The default is 'TRUE'.

distri

The distribution used as the null hypothesis. Can be one of {'poisson','nb','bb', 'bnb','zip','zinb','zibb', zibnb','ph','nbh','bbh','bnbh'}, which corresponds to Poisson, negative binomial, beta binomial and beta negative binomial distributions and their zero-inflated as well as hurdle versions, respectively. Default is 'Poisson'.

r

An initial value of the number of success before which m failures are observed, where m is the element of x. Must be a positive number, but not required to be an integer.

p

An initial value of the probability of success, should be a positive value within (0,1).

alpha1

An initial value for the first shape parameter of beta distribution. Should be a positive number.

alpha2

An initial value for the second shape parameter of beta distribution. Should be a positive number.

n

An initial value of the number of trials. Must be a positive number, but not required to be an integer.

lowerbound

A lower searching bound used in the optimization of likelihood function. Should be a small positive number. The default is 1e-2.

upperbound

An upper searching bound used in the optimization of likelihood function. Should be a large positive number. The default is 1e4.

parallel

whether to use multiple threads to parallelize computation. Default is FALSE. Please aware that it may take longer time to execute the program with parallel=FALSE.

Details

For arguments nsim, bootstrap, distri, if the length is larger than 1, only the first element will be used. For other arguments except for x, the first valid value will be used if the input is not NULL, otherwise some naive sample estimates will be fed into the algorithm. Note that only the initial values that are occurred in the null distribution distri are needed. For example, with distri=poisson, user may provide a value for lambda but not for r or p, though it won't disturb the algorithm.

With an output p-value less than some user-specified significance level, x is very likely from a distribution other than the distri, given the current data. If p-values of more than one distributions are greater than the pre-specified significance level, user may consider a following likelihood ratio test to select a 'better' distribution.

The methodology of computing Monte Carlo p-value is taken from Aldirawi et al. (2019). With bootstrap=TRUE, nsim bootstrapped samples will be generated by resampling x without replacement. Otherwise, nsim samples are simulated from the null distribution with the maximum likelihood estimate of original data x. Then compute the maximum likelihood estimates of nsim bootstrapped or simulated samples, based on which nsim new samples are generated under the null distribution. nsim KS statistics are calculated for the nsim new samples, then the Monte Carlo p-value is resulted from comparing the nsim KS statistics and the statistic of original data x.

During the process of computing maximum likelihood estimates, the negative log likelihood function is minimized via basic R function optim with the searching interval decided by lowerbound and upperbound, except that the optimization of p takes 1-lowerbound as the upper searching bound.

To accelerate the whole process, the algorithm uses the parallel strategy via the packages foreach and doParallel.

Value

An object of class 'dis.kstest' including the following elements:

Reference

See Also

model.lrt

Examples


set.seed(2001)
temp1=sample.zi(N=300,phi=0.3,distri='poisson',lambda=5)
dis.kstest(temp1,nsim=100,bootstrap=TRUE,distri='Poisson')$pvalue
dis.kstest(temp1,nsim=100,bootstrap=TRUE,distri='nb')$pvalue
dis.kstest(temp1,nsim=100,bootstrap=TRUE,distri='zip')$pvalue
dis.kstest(temp1,nsim=100,bootstrap=TRUE,distri='zinb')$pvalue


[Package iZID version 0.0.1 Index]