ds_test {dslice}    R Documentation

Hypothesis testing via dynamic slicing

Description

Performs a one-sample or K-sample (K > 1) hypothesis test via dynamic slicing.

Usage

  ds_test(y, x, ..., type = c("ds", "eqp"), lambda = 1, alpha = 1, rounds = 0)

Arguments

y

A numeric vector of data values.

x

Either an integer vector of group labels taking values from 0 to K-1 (K-sample test), or a character string naming a cumulative distribution function, or the cumulative distribution function itself, such as pnorm (one-sample test). Only continuous CDFs are valid.

...

Parameters of the distribution specified (as a character string) by x.

type

Method applied for dynamic slicing. "ds" (default) stands for the original dynamic slicing scheme. "eqp" stands for the dynamic slicing scheme with n^{1/2}-resolution (for the K-sample test, K > 1) or n-resolution (for the one-sample test).

lambda

Penalty for introducing an additional slice, used to avoid creating too many slices. It corresponds to the type I error when the two variables are independent. lambda should be greater than 0.

alpha

Penalty required for the "ds" type in the one-sample test. It penalizes both the width and the number of slices, to avoid too many slices and degenerate slices (intervals). alpha should be greater than 1.

rounds

Number of permutations used to estimate the empirical p-value.

Details

If x is an integer vector, ds_test performs a K-sample test (K > 1).

In this scenario, the observations y are drawn from K continuous populations, and x stores the population label of each observation, i.e., x takes the values 0, 1, ..., K-1. The null hypothesis is that these populations have the same distribution.
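
As a minimal sketch of the label encoding described above, arbitrary group labels can be mapped to the integers 0, ..., K-1 in base R. The vector g and its labels are hypothetical and only serve as an illustration; the Examples section uses the package's relabel function for this step.

```r
# Hypothetical group labels for K = 3 populations
g <- c("ctrl", "low", "high", "low", "ctrl", "high", "high")

# factor() assigns codes 1..K in sorted label order; subtract 1 to get 0..K-1
x <- as.integer(factor(g)) - 1L

# Number of distinct populations
K <- length(unique(x))
```

The resulting x is an integer vector, which is the form the K-sample test expects.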

If x is a character string naming a continuous (cumulative) distribution function, ds_test performs a one-sample test with the null hypothesis that y was generated by the distribution x with the parameters given in the ... arguments. These parameters must be pre-specified and not estimated from the data.
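
The following base R sketch illustrates what a fully specified null distribution means. It is not a description of the package's internal computation: it only shows that when the CDF parameters are fixed in advance, the transformed values would be uniform on (0, 1) under the null hypothesis. The seed and sample size are arbitrary choices.

```r
set.seed(1)
y <- rnorm(100, mean = 0.5, sd = 1)

# Null CDF with parameters fixed in advance (not estimated from y)
u <- pnorm(y, mean = 0, sd = 1)

# Under the null, u would be Uniform(0, 1) with mean 0.5;
# here the mean shift in y pushes u above 0.5 on average
mean_u <- mean(u)
```

Estimating the mean and standard deviation from y before calling pnorm would change the null distribution of the transformed values, which is why the parameters must be supplied up front.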

Only empirical p-values are available; they are obtained by setting the parameter rounds, the number of permutations. lambda and alpha (for the one-sample test with type "ds") both affect the p-value.
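
The general idea of a permutation-based empirical p-value can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the statistic used here is a simple absolute difference in group means standing in for the dynamic slicing statistic, and the exact p-value formula used by ds_test may differ.

```r
set.seed(42)
n <- 50
y <- c(rnorm(n, -0.5, 1), rnorm(n, 0.5, 1))
x <- rep(0:1, each = n)

# Stand-in statistic (NOT the dynamic slicing statistic):
# absolute difference between the two group means
stat <- function(y, x) abs(mean(y[x == 1]) - mean(y[x == 0]))

rounds <- 200
observed <- stat(y, x)

# Recompute the statistic after shuffling the group labels
permuted <- replicate(rounds, stat(y, sample(x)))

# Fraction of permutations at least as extreme as the observed value
p_emp <- mean(permuted >= observed)
```

A larger value of rounds gives a finer-grained estimate, since the smallest nonzero value this estimate can take is 1 / rounds.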

The procedure for choosing lambda is described in Jiang, Ye & Liu (2015). See the dataset ds_type_one_error in this package for the empirical relationship between lambda, sample size, and type I error.

Value

A list with class "htest" containing the following components:

statistic

The value of the dynamic slicing statistic.

p.value

The p-value of the test.

alternative

A character string describing the alternative hypothesis.

method

A character string indicating what type of test was performed.

data.name

A character string giving the name(s) of the data.

slices

Slicing strategy that maximizes the dynamic slicing statistic in the K-sample test. Each row represents a slice. Each column except the last gives the number of observations in that slice taking the corresponding value of x. The last column gives the total number of observations in each slice, i.e., the sum of the first through Kth columns.
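
To make the layout of such a table concrete, the sketch below builds a table of the same shape in base R from a hypothetical slicing. The labels, the slice assignment, and the column name total are all assumptions for illustration; the actual row and column names returned by ds_test may differ.

```r
# Hypothetical labels for K = 2 groups, ordered by increasing y
x <- c(0L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 1L, 1L)

# Hypothetical assignment of the ordered observations to 3 slices
slice_id <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3)

# One row per slice, one column per value of x
counts <- table(slice_id, x)

# Append the per-slice total as the last column
slices <- cbind(counts, total = rowSums(counts))
```

Here the first slice holds three observations from group 0 and one from group 1, so its row reads 3, 1, 4.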

References

Jiang, B., Ye, C. and Liu, J.S. Non-parametric K-sample tests via dynamic slicing. Journal of the American Statistical Association, 110(510): 642-653, 2015.

Examples

##  One-sample test
n <- 100
mu <- 0.5
y <- rnorm(n, mu, 1)
lambda <- 1.0
alpha <- 1.0
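##  "ds" type (default), with and without permutation rounds
dsres <- ds_test(y, "pnorm", 0, 1, lambda = lambda, alpha = alpha, rounds = 100)
dsres <- ds_test(y, "pnorm", 0, 1, type = "ds", lambda = lambda, alpha = alpha)
##  "eqp" type, with and without permutation rounds
dsres <- ds_test(y, "pnorm", 0, 1, type = "eqp", lambda = lambda, rounds = 100)
dsres <- ds_test(y, "pnorm", 0, 1, type = "eqp", lambda = lambda)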

##  K-sample test
n <- 100
mu <- 0.5
y <- c(rnorm(n, -mu, 1), rnorm(n, mu, 1))

##  generate x in this way:
x <- c(rep(0, n), rep(1, n))
x <- as.integer(x)

##  or in this way:
x <- c(rep("G1", n), rep("G2", n))
x <- relabel(x)

lambda <- 1.0
dsres <- ds_test(y, x, lambda = lambda, rounds = 100)
dsres <- ds_test(y, x, type = "eqp", lambda = lambda, rounds = 100)

[Package dslice version 1.2.2 Index]