GET.distrequal {GET}

R Documentation

Graphical n sample test of correspondence of distribution functions

Description

Compare the distributions of two (or more) samples.

Usage

GET.distrequal(
  x,
  r = seq(min(unlist((lapply(x, min)))), max(unlist((lapply(x, max)))), length = 100),
  contrasts = FALSE,
  nsim,
  ...
)

Arguments

`x`	A list of numeric vectors, one for each sample.
`r`	The sequence of argument values at which the test functions are to be compared. The default is 100 equally spaced values between the minimum and maximum over all groups.
`contrasts`	Logical. FALSE and TRUE select between the two test functions described in the Details section of this help file.
`nsim`	The number of random permutations.
`...`	Additional parameters passed to `global_envelope_test`. For example, the type of multiple testing control (FWER or FDR) is set by `typeone`, and, if `typeone = "fwer"`, the type of the global envelope can be chosen with the argument `type`. See `global_envelope_test` for the defaults and available options. The test here always uses `alternative = "two.sided"` and, when relevant, `nstep = 1`; all other specifications are to be given in `...`.

Details

A global envelope test can be performed to investigate whether the n distribution functions differ from each other and, if so, how they differ. This test is a generalization of the two-sample Kolmogorov-Smirnov test with a graphical interpretation. We assume that the observations in sample i are an i.i.d. sample from the distribution F_i(r), i=1, \dots, n, and we want to test the hypothesis

F_1(r)= \dots = F_n(r).

If contrasts = FALSE (default), then the test statistic is taken to be

\mathbf{T} = (\hat{F}_1(r), \dots, \hat{F}_n(r))

where \hat{F}_i(r) = (\hat{F}_i(r_1), \dots, \hat{F}_i(r_k)) is the ecdf of the ith sample evaluated at argument values r = (r_1,\dots,r_k). This is our recommended test function for the test. Another possibility is given by contrasts = TRUE, in which case the test statistic is constructed from all pairwise differences,

\mathbf{T} = (\hat{F}_1(r)-\hat{F}_2(r), \hat{F}_1(r)-\hat{F}_3(r), \dots, \hat{F}_{n-1}(r)-\hat{F}_n(r))
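As an illustration of the two test functions above, the following base R sketch evaluates them for two small made-up samples. It is not the package's internal code: the data, the short grid (k = 5) and the names `ecdf_on_grid`, `T_means` and `T_contrasts` are assumptions introduced only for this example.

```r
# Two small illustrative samples (made-up numbers, not package data)
x <- list(A = c(1.2, 2.5, 3.1, 4.8), B = c(2.0, 3.3, 5.1, 6.4, 7.0))

# Common grid of argument values between the overall minimum and maximum,
# mirroring the default for `r` (k = 5 points here for brevity)
r <- seq(min(unlist(lapply(x, min))), max(unlist(lapply(x, max))),
         length.out = 5)

# ECDF of each sample evaluated on the common grid
ecdf_on_grid <- lapply(x, function(s) ecdf(s)(r))

# contrasts = FALSE: the n ECDF vectors concatenated into one long vector
T_means <- unlist(ecdf_on_grid)

# contrasts = TRUE: all pairwise differences F_i - F_j with i < j
idx <- combn(length(x), 2)
T_contrasts <- unlist(lapply(seq_len(ncol(idx)), function(j) {
  ecdf_on_grid[[idx[1, j]]] - ecdf_on_grid[[idx[2, j]]]
}))
```

With n samples and k grid points, `T_means` has length n*k and `T_contrasts` has length (n*(n-1)/2)*k, which matches the different default numbers of permutations for the two cases.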

The simulations under the null hypothesis that the distributions are the same are obtained by permuting the individuals of the groups. The default number of permutations, if nsim is not specified, is n*1000 - 1 for the case contrasts = FALSE and (n*(n-1)/2)*1000 - 1 for the case contrasts = TRUE, where n is the length of x.
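The permutation step and the default number of permutations can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: `permute_groups` and `default_nsim` are hypothetical helper names, and the data are made up.

```r
set.seed(1)
x <- list(A = c(1.2, 2.5, 3.1, 4.8), B = c(2.0, 3.3, 5.1, 6.4, 7.0))

# One permutation under the null hypothesis: pool all observations,
# shuffle them, and re-split them into groups of the original sizes
permute_groups <- function(x) {
  pooled <- sample(unlist(x, use.names = FALSE))
  groups <- rep(seq_along(x), times = lengths(x))
  out <- split(pooled, groups)
  names(out) <- names(x)
  out
}
x_perm <- permute_groups(x)

# Default number of permutations as stated above, with n = length(x)
default_nsim <- function(n, contrasts = FALSE) {
  if (contrasts) (n * (n - 1) / 2) * 1000 - 1 else n * 1000 - 1
}
default_nsim(2)                    # 1999
default_nsim(3, contrasts = TRUE)  # 2999
```

Each permuted data set keeps the original group sizes, so the test function can be recomputed from it in exactly the same way as from the observed data.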

Examples

if(require(fda, quietly=TRUE)) {
  # Heights of boys and girls at age 10
  f.a <- growth$hgtf["10",] # girls at age 10
  m.a <- growth$hgtm["10",] # boys at age 10
  # Empirical cumulative distribution functions
  plot(ecdf(f.a))
  plot(ecdf(m.a), col='grey70', add=TRUE)
  # Create a list of the data
  fm.list <- list(Girls=f.a, Boys=m.a)
  
  res_m <- GET.distrequal(fm.list)
  plot(res_m)
  res_c <- GET.distrequal(fm.list, contrasts=TRUE)
  plot(res_c)
  
  

  # Heights of boys and girls at age 14
  f.a <- growth$hgtf["14",] # girls at age 14
  m.a <- growth$hgtm["14",] # boys at age 14
  # Empirical cumulative distribution functions
  plot(ecdf(f.a))
  plot(ecdf(m.a), col='grey70', add=TRUE)
  # Create a list of the data
  fm.list <- list(Girls=f.a, Boys=m.a)
  
  res_m <- GET.distrequal(fm.list)
  plot(res_m)
  res_c <- GET.distrequal(fm.list, contrasts=TRUE)
  plot(res_c)
  
  
}

[Package GET version 1.0-2 Index]