selectData {baytrends} | R Documentation
Select data for analysis from a larger data frame
Description
Select data for analysis from a larger data frame based on dependent variable, station, and layer. Depending on the settings, records with missing values are removed, log transformations are applied, and a centering date is added.
Usage
selectData(
df,
dep,
stat,
layer = NA,
transform = TRUE,
remMiss = TRUE,
analySpec
)
Arguments
df - data frame
dep - dependent variable
stat - station
layer - layer (optional)
transform - logical; if TRUE (default), return log-transformed values
remMiss - logical; if TRUE (default), remove records where the dependent variable, dep, is missing
analySpec - analytical specifications
Details
The returned data frame will include dyear and cyear. dyear is the decimal year computed using smwrBase::baseDay2decimal and smwrBase::baseDay. The minimum and maximum values of dyear are averaged to give centerYear, which is then used to compute the centering date, cyear, as cyear = dyear - centerYear.
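The centering step can be sketched in base R as follows. This is a minimal illustration that assumes dyear is already available as a numeric vector of decimal years; the values are made up, and the snippet does not call selectData or baseDay2decimal.

```r
# Hypothetical decimal-year values (illustrative only, not package data)
dyear <- c(1985.25, 1990.5, 2001.75, 2010.0)

# centerYear is the average of the minimum and maximum dyear
centerYear <- (min(dyear) + max(dyear)) / 2

# cyear is dyear shifted so that the period of record is centered on zero
cyear <- dyear - centerYear
```

Because centerYear is the midpoint of the record, the smallest and largest cyear values are equal in magnitude and opposite in sign.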
The variable identified by dep is copied to a variable named dep+".orig" (e.g., chla.orig), allowing the user to track the original concentrations. A new column, recensor, is added. The value of recensor is FALSE unless the value of dep.orig is <= 0. Where dep.orig is <= 0, recensor is set to TRUE and the value of dep is set to "less than" a small positive value, which is stored as iSpec$recensor. If transform=TRUE, the returned data frame will also include a variable "ln"+dep (e.g., "lnchla" for log-transformed chla).
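The copy, recensor, and transform steps might look roughly like the sketch below. It is a simplification under stated assumptions: the dependent variable is a plain numeric column, the "less than" value is represented by the small number itself, and recensorValue is a hypothetical stand-in for the value stored in iSpec$recensor.

```r
# Hypothetical input: a plain numeric column (the package's own handling
# of censored values is not reproduced here)
df <- data.frame(chla = c(2.5, 0, -0.1, 7.3))
dep <- "chla"
recensorValue <- 0.0001  # assumed small positive value, illustrative only

# Keep a copy of the original values under dep + ".orig"
df[[paste0(dep, ".orig")]] <- df[[dep]]

# Flag records whose original value is <= 0
df$recensor <- df[[paste0(dep, ".orig")]] <= 0

# Stand-in for "less than recensorValue": store the small value itself
df[[dep]][df$recensor] <- recensorValue

# With transform = TRUE, add the natural log under "ln" + dep
df[[paste0("ln", dep)]] <- log(df[[dep]])
```

After these steps the sketch data frame holds chla, chla.orig, recensor, and lnchla, and every value of chla is positive, so the log is defined.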
The data frame will include a column, intervention, which is a factor identifying different periods of record such as when different laboratory methods were used and is based on the data frame methodsList that is loaded into the global environment. This column is set to "A" with only 1 level if the data frame methodsList has not been loaded into the global environment.
The data frame will include a column, lowCensor, indicating whether the record falls in a year with a low level of censoring. The function gamTest uses this column to identify years of record (i.e., those where lowCensor==FALSE) that should not be used in analyses.
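The year-level nature of this flag can be illustrated with a small base R sketch. The cutoff maxCensorFrac and the column names year and censored are hypothetical; the criterion the package actually applies is not stated here, so this only shows how a per-year fraction can be mapped back onto individual records.

```r
# Hypothetical records: year and whether the value was reported as censored
obs <- data.frame(
  year     = c(2000, 2000, 2000, 2001, 2001, 2001),
  censored = c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE)
)
maxCensorFrac <- 0.5  # assumed cutoff, illustrative only

# Fraction of censored observations in each year
fracByYear <- tapply(obs$censored, obs$year, mean)

# A year has "low censoring" when its fraction does not exceed the cutoff;
# every record inherits the flag of its year
obs$lowCensor <- as.vector(fracByYear[as.character(obs$year)] <= maxCensorFrac)
```

Records with lowCensor equal to FALSE belong to years that an analysis such as gamTest would set aside.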
If remMiss=TRUE, the returned data frame is down-selected by removing records where the variable identified by dep is missing; otherwise, no down-selection is performed.
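The effect of remMiss can be sketched in base R as shown below. The data frame and the variable name selected are made up for illustration, and the snippet is not the package's implementation.

```r
# Hypothetical input with one missing value in the dependent variable
df <- data.frame(station = "CB5.4", secchi = c(1.2, NA, 0.9))
dep <- "secchi"
remMiss <- TRUE

# Drop rows where dep is NA only when remMiss is TRUE
selected <- if (remMiss) df[!is.na(df[[dep]]), , drop = FALSE] else df
```

With remMiss set to FALSE, selected would be identical to df, including the row with the missing value.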
iSpec is a list containing the following elements:
dep - name of column where dependent variable is stored, could be "ln"+dep for variables that will be analyzed after natural log transformation
depOrig - name of original dependent variable, could be same as dep if no transformation is used
stat - name of station
stationMethodGroup - name of station group that the station belongs to, derived from station list (stationMasterList) and used to identify interventions specified in methodsList table
intervenNum - number of interventions found for this station and dependent variable as derived from methodsList table, a value of 1 is assigned if no methodsList entry is found
intervenList - data frame of interventions identified by beginning and ending date and labeled consecutively starting with "A"
layer - layer
layerName - layer name derived from layerLukup
transform - TRUE/FALSE indicating whether log transformations were taken
trendIncrease - an indicator for interpretation of an increasing concentration
logConst - not currently used
recensor - small value that observations <=0 are recensored to as "less than" the small value
censorFrac - data frame indicating the yearly number of observations and fraction of observations reported as less than, uncensored, interval censored, less than zero, and recensored; also includes a 'lowCensor' field indicating which years will be dropped by gamTest due to high yearly censoring
yearRangeDropped - year range of data that will be dropped due to censoring
censorFracSum - censoring overall summary
centerYear - centering year
parmName - parameter name
parmNamelc - parameter name in lower case
parmUnits - parameter units
statLayer - station/layer label, e.g., "LE3.1 (S)"
usgsGageID - ID of the USGS gage used for flow adjustments
usgsGageName - name of the USGS gage used for flow adjustments
numObservations - number of observations
dyearBegin - begin date in decimal form
dyearEnd - end date in decimal form
dyearLength - period of record length
yearBegin - period of record begin year
yearend - period of record end year
dateBegin - begin date
dateEnd - end date
The baseDay and baseDay2decimal functions have been added to this package from the smwrBase package.
Value
A nested list is returned. The first element of the nested list is the down-selected data frame. The second element is the list iSpec, which contains specifications for data extraction. See Examples for usage and Details for further discussion of the data processing and the components of each element.
Examples
## Not run:
dfr <- analysisOrganizeData(dataCensored)
# retrieve Secchi depth for Station CB5.4, no transformations are applied
dfr1 <- selectData(dfr[["df"]], 'secchi', 'CB5.4', 'S', transform=FALSE,
                   remMiss=FALSE, analySpec=dfr[["analySpec"]])
df1 <- dfr1[[1]] # data frame of selected data
iSpec1 <- dfr1[[2]] # metadata about selected data
# retrieve surface corrected chlorophyll-a concentrations for Station CB5.4;
# missing values are removed and the log transformation is applied (defaults)
dfr2 <- selectData(dfr[["df"]], 'chla', 'CB5.4', 'S', analySpec=dfr[["analySpec"]])
df2 <- dfr2[[1]] # data frame of selected data
iSpec2 <- dfr2[[2]] # metadata about selected data
## End(Not run)