SydColDat {eglhmm} | R Documentation |
Sydney coliform bacteria data
Description
Transformed counts of faecal coliform bacteria in sea water at seven locations: Longreef, Bondi East, Port Hacking “50” and Port Hacking “100” (controls), and Bondi Offshore, Malabar Offshore and North Head Offshore (outfalls). At each location measurements were made at four depths: 0, 20, 40 and 60 metres.
The data sets are named SydColCount and SydColDisc.
Format
Data frames with 5432 observations on the following 6 variables.
y
Transformed measures of the number of faecal coliform bacteria in a sea-water sample of some specified volume. The original measures were obtained by a repeated dilution process.
For SydColCount the transformation used was essentially a square root transformation, with resulting values greater than 150 being set to NA. The results are putatively compatible with a Poisson model for the emission probabilities.
For SydColDisc the data were discretised using the cut() function with breaks given by c(0,1,5,25,200,Inf) and labels equal to c("lo","mlo","m","mhi","hi").
Note that in the SydColDisc data there are 180 fewer missing values (NAs) in the y column than in the SydColCount data. This is because in forming the SydColCount data (transforming the original data to a putative Poisson distribution) values that were greater than 150 were set equal to NA, and there were 180 such values.
locn
a factor with levels “LngRf” (Longreef), “BondiE” (Bondi East), “PH50” (Port Hacking 50), “PH100” (Port Hacking 100), “BondiOff” (Bondi Offshore), “MlbrOff” (Malabar Offshore) and “NthHdOff” (North Head Offshore)
depth
a factor with levels “0” (0 metres), “20” (20 metres), “40” (40 metres) and “60” (60 metres).
ma.com
A factor with levels no and yes, indicating whether the Malabar sewage outfall had been commissioned.
nh.com
A factor with levels no and yes, indicating whether the North Head sewage outfall had been commissioned.
bo.com
A factor with levels no and yes, indicating whether the Bondi Offshore sewage outfall had been commissioned.
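The discretisation described for the y column of SydColDisc can be illustrated with a minimal sketch. The vector raw below is hypothetical and is not taken from the Sydney data. The use of include.lowest = TRUE is also an assumption, made so that a value of exactly 0 is assigned to the lowest class; the exact call used to build SydColDisc is not given here.

```r
# Hypothetical measurements; these are NOT values from the Sydney data.
raw <- c(0, 0.5, 3, 12, 80, 450, NA)

# Breaks and labels as quoted in the description of SydColDisc.
brks <- c(0, 1, 5, 25, 200, Inf)
labs <- c("lo", "mlo", "m", "mhi", "hi")

# Assumption: include.lowest = TRUE, so that a value of exactly 0 falls
# in the "lo" class rather than becoming NA.
disc <- cut(raw, breaks = brks, labels = labs, include.lowest = TRUE)

# Tabulate the classes, keeping any missing values visible.
print(table(disc, useNA = "ifany"))
```

With the default right = TRUE the classes are closed on the right, so in this sketch a value of exactly 5 would be classed as "mlo" rather than "m".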
Details
The observations corresponding to each location-depth combination constitute a time series. The sampling interval is ostensibly 1 week; distinct time series are ostensibly synchronous. The measurements were made over a 194 week period. See Turner et al. (1998) for more detail.
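The layout described above can be sketched with a toy stand-in for the real data. The data frame toy, its week column, its simulated y values and its row order are all assumptions made for illustration; only the factor levels and the counts (7 locations, 4 depths, 194 weeks) come from this page.

```r
# Toy stand-in with the documented shape: 7 locations, 4 depths and
# 194 weekly time points. The y values are simulated placeholders.
locs   <- c("LngRf", "BondiE", "PH50", "PH100",
            "BondiOff", "MlbrOff", "NthHdOff")
depths <- c("0", "20", "40", "60")

# week varies fastest, so each location-depth block is in time order.
toy <- expand.grid(week = 1:194, depth = depths, locn = locs,
                   stringsAsFactors = TRUE)
set.seed(1)
toy$y <- rpois(nrow(toy), lambda = 3)

# One time series per location-depth combination.
series <- split(toy$y, interaction(toy$locn, toy$depth, drop = TRUE))

length(series)           # number of distinct time series
unique(lengths(series))  # common length of the series
nrow(toy)                # total number of observations
```

The toy frame has 7 x 4 = 28 series of 194 observations each, that is 5432 rows, which agrees with the number of observations stated under Format. The same split() call should apply to SydColCount or SydColDisc, provided their rows are in time order within each location-depth combination.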
Source
Geoff Coade, of the New South Wales Environment Protection Authority (Australia)
References
T. Rolf Turner, Murray A. Cameron, and Peter J. Thomson. Hidden Markov chains in generalized linear models. Canadian J. Statist., vol. 26, pp. 107 – 125, 1998.
Rolf Turner. Direct maximization of the likelihood of a hidden Markov model. Computational Statistics and Data Analysis 52, pp. 4147 – 4160, 2008, doi:10.1016/j.csda.2008.01.029.
Examples
# Select out a subset of four locations:
loc4 <- c("LngRf","BondiE","BondiOff","MlbrOff")
SCC4 <- SydColCount[SydColCount$locn %in% loc4,]
SCC4$locn <- factor(SCC4$locn) # Get rid of unused levels.
rownames(SCC4) <- 1:nrow(SCC4)