rpdTest {RPDTest}    R Documentation

Randomized phi-divergence test

Description

The main function of the package: it performs a hypothesis test analogous to the chi-square goodness-of-fit test. It accepts a vector, matrix or data.frame as observed data and computes a randomized phi-divergence statistic, which is based on a random vector uniformly distributed on the n-sphere. This random vector is generated anew at runtime.

A Monte Carlo p-value is available as an option. The simulation runs in parallel and compares empirical distribution functions. Specifically, it simulates data under the null hypothesis and compares them with the observed data. It generates B datasets based on the expected null distribution (p) and the observed control data (v0). For each simulated dataset, for the observed data and for v0, rs statistics are computed using different random seeds. The Kolmogorov-Smirnov (K-S) statistic is then used to compare the distribution of the simulated data with that of the observed data, and the distribution of the simulated data with that of the control data. This gives B K-S statistics for the observed data group and B for the control data group. The observed data group is divided into z groups, and the p-value is based on how often the within-group mean of the K-S statistic is more extreme than the mean of the K-S statistic in the control group.

In the current version (0.0.2), this feature is still being debugged and improved, so it is not enabled by default.
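The Monte Carlo procedure described above can be illustrated with a simplified base R sketch. It is not the package's implementation: `rand_stat` is a hypothetical stand-in for the randomized phi-divergence statistic, `ks_stat` is a hand-written two-sample K-S statistic, the control vector `v0` is assumed to be drawn under the null, the computation is sequential rather than parallel, and the final comparison is only one possible reading of "more extreme".

```r
set.seed(1)  # for a reproducible illustration

p  <- c(0.25, 0.75)  # null probabilities
n  <- 120            # sample size
B  <- 20             # simulated datasets (kept small for speed)
z  <- 5              # number of groups; B must be divisible by z in this sketch
rs <- 50             # statistics computed per dataset

# Hypothetical stand-in statistic: projection of the standardized residuals
# onto a random unit vector (NOT the statistic used by rpdTest).
rand_stat <- function(x, p) {
  n <- sum(x)
  resid <- (x - n * p) / sqrt(n * p)
  u <- rnorm(length(x))
  u <- u / sqrt(sum(u^2))
  sum(u * resid)
}

# rs statistics for one dataset, each with a fresh random vector
stats_for <- function(x) replicate(rs, rand_stat(x, p))

# Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs
ks_stat <- function(a, b) {
  grid <- sort(c(a, b))
  max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
}

obs <- c(45, 75)                      # illustrative observed counts
v0  <- as.vector(rmultinom(1, n, p))  # control vector (assumed drawn under the null)

s_obs <- stats_for(obs)
s_v0  <- stats_for(v0)

ks_obs <- numeric(B)
ks_v0  <- numeric(B)
for (b in seq_len(B)) {
  s_sim <- stats_for(as.vector(rmultinom(1, n, p)))
  ks_obs[b] <- ks_stat(s_sim, s_obs)  # simulated vs. observed
  ks_v0[b]  <- ks_stat(s_sim, s_v0)   # simulated vs. control
}

# Split the observed-group K-S statistics into z groups and compare the
# within-group means with the mean of the control group.
group_means <- tapply(ks_obs, rep(seq_len(z), each = B / z), mean)
p_value <- mean(group_means <= mean(ks_v0))
p_value
```

With these settings the loop computes B K-S statistics per group, matching the counts in the description; the values of B, z and rs are far smaller than the defaults of rpdTest to keep the sketch fast.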

Usage

rpdTest(
  data,
  p = rep(1/length(data), length(data)),
  lambda = 1,
  sim.pValue = FALSE,
  B = 200,
  z = 40,
  rs = 1250,
  n.cores = NULL,
  random.state = NULL
)

Arguments

data

a one-dimensional vector, or a matrix or data.frame of the same shape, holding the observed data from a multinomial distribution.

p

the probability vector under the null hypothesis. The function checks that this vector is valid. Defaults to equal probabilities for all categories.

lambda

a control parameter of the statistic calculation. Changing it can substantially change the resulting statistic.

sim.pValue

a logical value indicating whether to compute a p-value by Monte Carlo simulation.

B

an integer specifying the number of datasets simulated from the expected null distribution (p) in the Monte Carlo simulation.

z

an integer specifying the number of groups into which the observed data group is divided in the Monte Carlo simulation.

rs

an integer setting the number of statistics computed for each dataset in the simulation.

n.cores

an integer specifying the number of cores used for parallel computation. By default, the number of cores available on the computer minus one is used.

random.state

a numeric value that controls the randomness of the samples used to generate the uniformly distributed random vector on the n-sphere.
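For intuition about the random vector that random.state controls, here is a minimal base R sketch of one standard way to draw a uniformly distributed unit vector: normalize a vector of independent standard normal draws. The helper name `runif_sphere` and its `seed` argument are hypothetical, and the source does not say that rpdTest uses this construction internally.

```r
# Draw one vector uniformly distributed on the unit sphere in k dimensions.
# `runif_sphere` and `seed` are illustrative names, not part of RPDTest.
runif_sphere <- function(k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # a fixed seed makes the draw reproducible
  g <- rnorm(k)                       # independent standard normal coordinates
  g / sqrt(sum(g^2))                  # rescale to unit Euclidean length
}

u1 <- runif_sphere(5, seed = 42)
u2 <- runif_sphere(5, seed = 42)

sum(u1^2)          # squared length, equal to 1 up to rounding
identical(u1, u2)  # same seed, same vector
```

Fixing the seed plays the role a random state usually plays: repeated calls give the same vector, while leaving it unset gives a fresh vector on each call.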

Value

A standard list object of class "htest".
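Objects of class "htest" are ordinary lists with a dedicated print method. As a sketch of how such a result is usually inspected, the example below uses the base R chi-square goodness-of-fit test, which returns the same class. Which components rpdTest actually fills (for example, whether p.value is set when sim.pValue = FALSE) is not stated here, so check names() on the returned object.

```r
# Base R chi-square goodness-of-fit test, used only to show the "htest" layout
d   <- c(30, 90)
res <- chisq.test(d, p = c(1/4, 3/4))

class(res)     # "htest"
names(res)     # components stored in the list
res$statistic  # the test statistic
res$p.value    # the p-value
res$method     # description of the test
```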

Examples

d <- rmultinom(1, 120, c(1/4, 3/4))
# The following call only computes the statistic.
rpdTest(d)
# The following call also computes the simulated p-value. You can also specify
# the number of cores to use, for example two.
# This calculation usually takes 1-2 minutes.

rpdTest(d, sim.pValue = TRUE, n.cores = 2)


[Package RPDTest version 0.0.2 Index]