gofGroupTest {EnvStats} R Documentation

## Goodness-of-Fit Test for a Specified Probability Distribution for Groups

### Description

Perform a goodness-of-fit test to determine whether data in a set of groups appear to all come from the same probability distribution (with possibly different parameters for each group).

### Usage

gofGroupTest(object, ...)

## S3 method for class 'formula'
gofGroupTest(object, data = NULL, subset,
na.action = na.pass, ...)

## Default S3 method:
gofGroupTest(object, group, test = "sw",
distribution = "norm", est.arg.list = NULL, n.classes = NULL,
cut.points = NULL, param.list = NULL,
estimate.params = ifelse(is.null(param.list), TRUE, FALSE),

### Details

The function gofGroupTest performs a goodness-of-fit test for each group of data by calling the function gofTest. Using the p-values from these goodness-of-fit tests, it then calls the function gofTest with the argument test="ws" to test whether the p-values appear to come from a Uniform [0,1] distribution.
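The second step can be sketched in base R. This is a minimal sketch, assuming the normal-scores form of the Wilk-Shapiro statistic, G = sum(qnorm(p)) / sqrt(k), with a lower-tail p-value. The "Normal Scores" label comes from the printed output in the Examples section; the formula itself is an assumption here, not taken from this help file. The names p.values, k, G, and group.p.value are illustrative and are not part of EnvStats.

```r
# Individual goodness-of-fit p-values for the four wells, copied from the
# printed output of the normal-distribution example in the Examples section.
p.values <- c(Well.1 = 0.03510747, Well.2 = 0.02385344,
              Well.3 = 0.01120775, Well.4 = 0.10681461)

# Number of groups
k <- length(p.values)

# Assumed normal-scores statistic: if every p-value is Uniform [0, 1],
# each qnorm(p) is standard normal, so the scaled sum is standard normal too.
G <- sum(qnorm(p.values)) / sqrt(k)

# Small p-values push G far below zero, so the lower tail is used.
group.p.value <- pnorm(G)

G
group.p.value
```

Under these assumptions, G and group.p.value agree with the z (G) statistic and the group-test p-value printed for that example (about -3.6587 and 0.000127).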

### Value

A list of class "gofGroup" containing the results of the group goodness-of-fit test. Objects of class "gofGroup" have special printing and plotting methods. See the help file for gofGroup.object for details.

### Note

The Wilk-Shapiro (1968) tests for a Uniform [0, 1] distribution were introduced in the context of testing whether several independent samples all come from normal distributions, with possibly different means and variances. The function gofGroupTest extends this idea to allow you to test whether several independent samples come from the same distribution (e.g., gamma, extreme value, etc.), with possibly different parameters.

Examples of simultaneously assessing whether several groups come from the same distribution are given in USEPA (2009) and Gibbons et al. (2009).

In practice, almost any goodness-of-fit test will fail to reject the null hypothesis if the number of observations is relatively small. Conversely, almost any goodness-of-fit test will reject the null hypothesis if the number of observations is very large, since “real” data are never distributed according to any theoretical distribution (Conover, 1980, p.367). For most cases, however, the distribution of “real” data is close enough to some theoretical distribution that fairly accurate results may be provided by assuming that particular theoretical distribution. One way to assess the goodness of the fit is to use goodness-of-fit tests. Another way is to look at quantile-quantile (Q-Q) plots (see qqPlot).
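As a rough numerical companion to a Q-Q plot, the correlation between the ordered data and the normal quantiles can be computed per group. The sketch below uses base R only (qqnorm with plot.it = FALSE rather than the EnvStats function qqPlot) and the nickel data listed in the Examples section. The names nickel, qq.cor, raw.cor, and log.cor are illustrative and are not part of EnvStats.

```r
# Nickel concentrations (ppb) for the four wells, copied from the data
# listing in the Examples section (five observations per well).
nickel <- data.frame(
  Well = rep(paste0("Well.", 1:4), each = 5),
  Nickel.ppb = c(58.8,   1.0, 262.0,  56.0,   8.7,
                 19.0,  81.5, 331.0,  14.0,  64.4,
                 39.0, 151.0,  27.0,  21.4, 578.0,
                  3.1, 942.0,  85.6,  10.0, 637.0)
)

# Hypothetical helper: correlation between the data and the theoretical
# normal quantiles that qqnorm() would plot them against.
qq.cor <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  cor(qq$x, qq$y)
}

# Q-Q correlation per well on the original scale and on the log scale
raw.cor <- tapply(nickel$Nickel.ppb, nickel$Well, qq.cor)
log.cor <- tapply(log(nickel$Nickel.ppb), nickel$Well, qq.cor)

round(cbind(raw = raw.cor, log = log.cor), 3)
```

A correlation closer to 1 indicates a straighter Q-Q plot. For these data the log-scale correlations are higher for every well, which is in line with the lognormal test in the Examples section not rejecting the null hypothesis. With only five observations per well, such a comparison is informal at best.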

### Author(s)

Steven P. Millard (EnvStats@ProbStatInfo.com)

### References

Gibbons, R.D., D.K. Bhaumik, and S. Aryal. (2009). Statistical Methods for Groundwater Monitoring, Second Edition. John Wiley & Sons, Hoboken.

USEPA. (2009). Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities, Unified Guidance. EPA 530/R-09-007, March 2009. Office of Resource Conservation and Recovery, Program Implementation and Information Division. U.S. Environmental Protection Agency, Washington, D.C. p.17-17.

USEPA. (2010). Errata Sheet - March 2009 Unified Guidance. EPA 530/R-09-007a, August 9, 2010. Office of Resource Conservation and Recovery, Program Information and Implementation Division. U.S. Environmental Protection Agency, Washington, D.C.

Wilk, M.B., and S.S. Shapiro. (1968). The Joint Assessment of Normality of Several Independent Samples. Technometrics, 10(4), 825-839.

### See Also

gofTest, gofGroup.object, print.gofGroup, plot.gofGroup, qqPlot.

### Examples

# Example 10-4 of USEPA (2009, page 10-20) gives an example of
# simultaneously testing the assumption of normality for nickel
# concentrations (ppb) in groundwater collected at 4 monitoring
# wells over 5 months.  The data for this example are stored in
# EPA.09.Ex.10.1.nickel.df.

EPA.09.Ex.10.1.nickel.df
#   Month   Well Nickel.ppb
#1      1 Well.1       58.8
#2      3 Well.1        1.0
#3      6 Well.1      262.0
#4      8 Well.1       56.0
#5     10 Well.1        8.7
#6      1 Well.2       19.0
#7      3 Well.2       81.5
#8      6 Well.2      331.0
#9      8 Well.2       14.0
#10    10 Well.2       64.4
#11     1 Well.3       39.0
#12     3 Well.3      151.0
#13     6 Well.3       27.0
#14     8 Well.3       21.4
#15    10 Well.3      578.0
#16     1 Well.4        3.1
#17     3 Well.4      942.0
#18     6 Well.4       85.6
#19     8 Well.4       10.0
#20    10 Well.4      637.0

# Test for a normal distribution at each well:
#--------------------------------------------

gofGroup.list <- gofGroupTest(Nickel.ppb ~ Well,
  data = EPA.09.Ex.10.1.nickel.df)

gofGroup.list

#Results of Group Goodness-of-Fit Test
#-------------------------------------
#
#Test Method:                     Wilk-Shapiro GOF (Normal Scores)
#
#Hypothesized Distribution:       Normal
#
#Data:                            Nickel.ppb
#
#Grouping Variable:               Well
#
#Data Source:                     EPA.09.Ex.10.1.nickel.df
#
#Number of Groups:                4
#
#Sample Sizes:                    Well.1 = 5
#                                 Well.2 = 5
#                                 Well.3 = 5
#                                 Well.4 = 5
#
#Test Statistic:                  z (G) = -3.658696
#
#P-values for
#Individual Tests:                Well.1 = 0.03510747
#                                 Well.2 = 0.02385344
#                                 Well.3 = 0.01120775
#                                 Well.4 = 0.10681461
#
#P-value for
#Group Test:                      0.0001267509
#
#Alternative Hypothesis:          At least one group
#                                 does not come from a
#                                 Normal Distribution.

dev.new()
plot(gofGroup.list)

#----------

# Test for a lognormal distribution at each well:
#-----------------------------------------------

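# The argument name is spelled out in full rather than relying on
# partial matching of "dist" to "distribution".
gofGroupTest(Nickel.ppb ~ Well, data = EPA.09.Ex.10.1.nickel.df,
  distribution = "lnorm")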

#Results of Group Goodness-of-Fit Test
#-------------------------------------
#
#Test Method:                     Wilk-Shapiro GOF (Normal Scores)
#
#Hypothesized Distribution:       Lognormal
#
#Data:                            Nickel.ppb
#
#Grouping Variable:               Well
#
#Data Source:                     EPA.09.Ex.10.1.nickel.df
#
#Number of Groups:                4
#
#Sample Sizes:                    Well.1 = 5
#                                 Well.2 = 5
#                                 Well.3 = 5
#                                 Well.4 = 5
#
#Test Statistic:                  z (G) = 0.2401720
#
#P-values for
#Individual Tests:                Well.1 = 0.6898164
#                                 Well.2 = 0.6700394
#                                 Well.3 = 0.3208299
#                                 Well.4 = 0.5041375
#
#P-value for
#Group Test:                      0.5949015
#
#Alternative Hypothesis:          At least one group
#                                 does not come from a
#                                 Lognormal Distribution.

#----------
# Clean up
rm(gofGroup.list)
graphics.off()


[Package EnvStats version 2.8.1 Index]