SydColDat {eglhmm} R Documentation

Sydney coliform bacteria data

Description

Transformed counts of faecal coliform bacteria in sea water at seven locations: four control sites (Longreef, Bondi East, Port Hacking “50” and Port Hacking “100”) and three outfall sites (Bondi Offshore, Malabar Offshore and North Head Offshore). At each location measurements were made at four depths: 0, 20, 40 and 60 metres.

The data sets are named SydColCount and SydColDisc.

Format

Data frames with 5432 observations on the following 6 variables.

y

Transformed measures of the number of faecal coliform bacteria in a sea-water sample of some specified volume. The original measures were obtained by a repeated dilution process.

For SydColCount the transformation used was essentially a square root transformation, with resulting values greater than 150 being set to NA. The results are putatively compatible with a Poisson model for the emission probabilities.

For SydColDisc the data were discretised using the cut() function with breaks given by c(0,1,5,25,200,Inf) and labels equal to c("lo","mlo","m","mhi","hi").

Note that in the SydColDisc data there are 180 fewer missing values (NAs) in the y column than in the SydColCount data. This is because in forming the SydColCount data (transforming the original data to a putative Poisson distribution) values that were greater than 150 were set equal to NA, and there were 180 such values.
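The discretisation described above for SydColDisc can be sketched on a few made-up values. This is only an illustration: the vector raw is hypothetical, and it is an assumption that cut() was called with its default arguments (right = TRUE, include.lowest = FALSE); the call actually used to build the data may have differed.

```r
# Hypothetical input values (not taken from the data set).
raw <- c(0.5, 1, 3, 5, 20, 25, 150, 200, 1000, NA)

# Breaks and labels as given in the description of SydColDisc.
brks <- c(0, 1, 5, 25, 200, Inf)
labs <- c("lo", "mlo", "m", "mhi", "hi")

# With the cut() defaults the intervals are closed on the right:
# (0,1], (1,5], (5,25], (25,200], (200,Inf].
disc <- cut(raw, breaks = brks, labels = labs)

# Show each input value next to its category.
data.frame(raw = raw, disc = disc)

# Under these defaults a value of exactly 0 falls outside (0,1]
# and becomes NA; include.lowest = TRUE would place it in "lo".
zero_default  <- cut(0, breaks = brks, labels = labs)
zero_included <- cut(0, breaks = brks, labels = labs, include.lowest = TRUE)
```

Large values are not lost under such a discretisation; they simply land in the upper categories. That fits the note above that SydColDisc has fewer missing values than SydColCount.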

locn

A factor with levels “LngRf” (Longreef), “BondiE” (Bondi East), “PH50” (Port Hacking 50), “PH100” (Port Hacking 100), “BondiOff” (Bondi Offshore), “MlbrOff” (Malabar Offshore) and “NthHdOff” (North Head Offshore).

depth

A factor with levels “0” (0 metres), “20” (20 metres), “40” (40 metres) and “60” (60 metres).

ma.com

A factor with levels no and yes, indicating whether the Malabar sewage outfall had been commissioned.

nh.com

A factor with levels no and yes, indicating whether the North Head sewage outfall had been commissioned.

bo.com

A factor with levels no and yes, indicating whether the Bondi Offshore sewage outfall had been commissioned.

Details

The observations corresponding to each location-depth combination constitute a time series, so there are 28 series in all. The sampling interval is ostensibly 1 week; distinct time series are ostensibly synchronous. The measurements were made over a 194-week period. See Turner et al. (1998) for more detail.
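As a rough check on this structure, seven locations at four depths give 28 series, and 28 series of 194 weekly observations give 5432 rows, which matches the number of observations stated under Format. The sketch below uses a toy stand-in data frame rather than the real data: the week column, the row ordering and the simulated y values are all assumptions made for illustration, and only the locn and depth levels are taken from this page.

```r
set.seed(1)

# Factor levels as documented for locn and depth.
locs   <- c("LngRf", "BondiE", "PH50", "PH100", "BondiOff", "MlbrOff", "NthHdOff")
depths <- c("0", "20", "40", "60")

# Toy stand-in: one row per week for every location-depth combination.
# The "week" column is hypothetical; the real data frames do not have it.
toy <- expand.grid(week = 1:194, depth = depths, locn = locs)
toy$y <- rpois(nrow(toy), lambda = 5)  # simulated values, not real data

# One time series per location-depth combination.
series <- split(toy$y, list(toy$locn, toy$depth), drop = TRUE)

n_series <- length(series)          # number of distinct series
n_weeks  <- unique(lengths(series)) # length of each series
n_rows   <- nrow(toy)               # total number of observations
```

The same split() call could be tried on SydColCount or SydColDisc, on the assumption that the rows within each location-depth combination are stored in time order.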

Source

Geoff Coade, of the New South Wales Environment Protection Authority (Australia)

References

T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson. Hidden Markov chains in generalized linear models. Canadian J. Statist., vol. 26, pp. 107 – 125, 1998.

Rolf Turner. Direct maximization of the likelihood of a hidden Markov model. Computational Statistics and Data Analysis 52, pp. 4147 – 4160, 2008, doi:10.1016/j.csda.2008.01.029.

Examples

# Select a subset of four locations:
loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
SCC4 <- SydColCount[SydColCount$locn %in% loc4,]
SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
rownames(SCC4) <- seq_len(nrow(SCC4)) # Renumber the rows from 1.

[Package eglhmm version 0.1-3 Index]