R: Calculate monthly summary statistics

calc_monthly_stats {fasstr}

R Documentation

Calculate monthly summary statistics

Description

Calculates the mean, median, maximum, minimum, and percentiles of daily flow values for each month of each year in a daily streamflow data set. Statistics are calculated from all values unless specified otherwise. Returns a tibble of statistics.

Usage

calc_monthly_stats(
  data,
  dates = Date,
  values = Value,
  groups = STATION_NUMBER,
  station_number,
  percentiles = c(10, 90),
  roll_days = 1,
  roll_align = "right",
  water_year_start = 1,
  start_year,
  end_year,
  exclude_years,
  months = 1:12,
  transpose = FALSE,
  spread = FALSE,
  complete_years = FALSE,
  ignore_missing = FALSE,
  allowed_missing = ifelse(ignore_missing, 100, 0)
)

Arguments

`data`	Data frame of daily data that contains columns of dates, flow values, and (optional) groups (e.g. station numbers). Leave blank or set to `NULL` if using `station_number` argument.
`dates`	Name of column in `data` that contains dates formatted YYYY-MM-DD. Only required if dates column name is not 'Date' (default). Leave blank or set to `NULL` if using `station_number` argument.
`values`	Name of column in `data` that contains numeric flow values, in units of cubic metres per second. Only required if values column name is not 'Value' (default). Leave blank if using `station_number` argument.
`groups`	Name of column in `data` that contains unique identifiers for different data sets, if applicable. Only required if groups column name is not 'STATION_NUMBER'. Function will automatically group by a column named 'STATION_NUMBER' if present. Remove the 'STATION_NUMBER' column beforehand to remove this grouping. Leave blank if using `station_number` argument.
`station_number`	Character string vector of seven-character Water Survey of Canada station numbers (e.g. `"08NM116"`) for which to extract daily streamflow data from a HYDAT database. Requires `tidyhydat` package and a HYDAT database. Leave blank if using `data` argument.
`percentiles`	Numeric vector of percentiles to calculate. Set to `NA` if none required. Default `c(10,90)`.
`roll_days`	Numeric value of the number of days to apply a rolling mean. Default `1`.
`roll_align`	Character string identifying the direction of the rolling mean from the specified date, either by the first (`'left'`), last (`'right'`), or middle (`'center'`) day of the rolling n-day group of observations. Default `'right'`.
`water_year_start`	Numeric value indicating the month (`1` through `12`) of the start of water year for analysis. Default `1`.
`start_year`	Numeric value of the first year to consider for analysis. Leave blank or set well before the start date (e.g. `1800`) to use data from the first year of the source data.
`end_year`	Numeric value of the last year to consider for analysis. Leave blank or set well after the end date (e.g. `2100`) to use data up to the last year of the source data.
`exclude_years`	Numeric vector of years to exclude from analysis. Leave blank or set to `NULL` to include all years.
`months`	Numeric vector of months to include in analysis. For example, `3` for March, `6:8` for Jun-Aug, or `c(10:12,1)` for the first four months (Oct-Jan) when `water_year_start = 10` (Oct). Default summarizes all months (`1:12`).
`transpose`	Logical value indicating whether each month-statistic should be an individual row. Default `FALSE`.
`spread`	Logical value indicating whether each month-statistic should be its own column. Default `FALSE`.
`complete_years`	Logical value indicating whether to include only years with complete data in analysis. Default `FALSE`.
`ignore_missing`	Logical value indicating whether dates with missing values should be included in the calculation. If `TRUE` then a statistic will be calculated regardless of missing dates. If `FALSE` then only those statistics from time periods with no missing dates will be returned. Default `FALSE`.
`allowed_missing`	Numeric value between 0 and 100 indicating the percentage of missing dates allowed when calculating a statistic. If `ignore_missing = FALSE` it defaults to `0` (no missing dates allowed); if `ignore_missing = TRUE` it defaults to `100` (any number of missing dates allowed), consistent with `ignore_missing` usage. Supersedes `ignore_missing` when used.
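
The interaction between `ignore_missing` and `allowed_missing` can be sketched in base R. This is only an illustration under stated assumptions: `month_mean_if_allowed()` is a hypothetical helper, not a fasstr function, and treating a month that sits exactly at the threshold as acceptable is an assumption rather than confirmed package behaviour.

```r
# Default from the Usage section: 0 percent unless ignore_missing = TRUE
ignore_missing <- FALSE
allowed_missing <- ifelse(ignore_missing, 100, 0)

# Hypothetical helper (not part of fasstr): mean of one month's daily flows,
# returned only if the share of missing days is within the allowed percentage.
# Assumption: a month exactly at the threshold is still accepted.
month_mean_if_allowed <- function(flows, allowed_missing = 0) {
  pct_missing <- 100 * sum(is.na(flows)) / length(flows)
  if (pct_missing > allowed_missing) {
    return(NA_real_)
  }
  mean(flows, na.rm = TRUE)
}

# 30 synthetic daily flows with 3 missing days (10 percent missing)
flows <- c(rep(2, 27), NA, NA, NA)

month_mean_if_allowed(flows, allowed_missing = allowed_missing)  # no gaps allowed
month_mean_if_allowed(flows, allowed_missing = 10)               # at the limit
month_mean_if_allowed(flows, allowed_missing = 100)              # any gaps allowed
```

With these synthetic values, only the first call yields `NA`, because 10 percent of the days are missing and the default allows none.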

Value

A tibble data frame with the following columns:

`Year`	calendar or water year selected
`Month`	month of the year
`Mean`	mean of all daily flows for a given month and year
`Median`	median of all daily flows for a given month and year
`Maximum`	maximum of all daily flows for a given month and year
`Minimum`	minimum of all daily flows for a given month and year
`P'n'`	each n-th percentile selected for a given month and year

Default percentile columns:

`P10`	10th percentile of all daily flows for a given month and year
`P90`	90th percentile of all daily flows for a given month and year
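
As a rough illustration of what these columns summarise, the same statistics can be computed by hand in base R for each year and month of a synthetic series. The data values and the helper `summarise_month()` are made up for this sketch, and the use of the default `stats::quantile()` method is an assumption; the package's internal method is not stated here.

```r
# Synthetic daily flows for two months of one year (values are made up)
dates <- seq(as.Date("2001-01-01"), as.Date("2001-02-28"), by = "day")
flow <- data.frame(Date = dates, Value = seq_along(dates))

flow$Year <- as.integer(format(flow$Date, "%Y"))
flow$Month <- as.integer(format(flow$Date, "%m"))

# Hypothetical helper: the six default statistics for one month of values.
# Percentiles given as 10 and 90 correspond to probabilities 0.10 and 0.90.
summarise_month <- function(v) {
  c(Mean = mean(v),
    Median = median(v),
    Maximum = max(v),
    Minimum = min(v),
    P10 = unname(quantile(v, 0.10)),
    P90 = unname(quantile(v, 0.90)))
}

# One group of values per Year.Month combination, one row of statistics each
groups <- split(flow$Value, list(flow$Year, flow$Month))
monthly <- t(sapply(groups, summarise_month))
monthly
```

Each row of `monthly` corresponds to one year-month combination, mirroring the one-row-per-month layout of the default output.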

Transposing data creates a column of 'Statistics' for each month, labeled as 'Month-Statistic' (e.g. 'Jan-Mean'), and subsequent columns for each year selected. Spreading data creates a column of Year and subsequent columns of Month-Statistics (e.g. 'Jan-Mean').
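
The two alternative layouts can be pictured with a small base R sketch. The numbers are invented, the object names are hypothetical, and the result is a plain matrix rather than the tibble the function returns; only the arrangement of rows and columns is meant to match the description above.

```r
# Toy results in the default layout (values are made up for illustration)
stats_long <- data.frame(
  Year = c(1980, 1980, 1981, 1981),
  Month = c("Jan", "Feb", "Jan", "Feb"),
  Mean = c(1.5, 2.0, 1.8, 2.4),
  Maximum = c(3.1, 4.0, 3.6, 4.4)
)

# Stack the statistic columns and build 'Month-Statistic' labels
stacked <- data.frame(
  Year = rep(stats_long$Year, times = 2),
  Statistic = paste(rep(stats_long$Month, times = 2),
                    rep(c("Mean", "Maximum"), each = nrow(stats_long)),
                    sep = "-"),
  Value = c(stats_long$Mean, stats_long$Maximum)
)

# Spread-style layout: one row per year, one column per Month-Statistic
spread_like <- tapply(stacked$Value,
                      list(Year = stacked$Year, Statistic = stacked$Statistic),
                      function(v) v[1])

# Transpose-style layout: one row per Month-Statistic, one column per year
transposed_like <- t(spread_like)

spread_like
transposed_like
```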

Examples

# Run if HYDAT database has been downloaded (using tidyhydat::download_hydat())
if (file.exists(tidyhydat::hy_downloaded_db())) {

# Calculate statistics using a data frame and data argument with defaults
flow_data <- tidyhydat::hy_daily_flows(station_number = "08NM116")
calc_monthly_stats(data = flow_data,
                   start_year = 1980)

# Calculate statistics using station_number argument with defaults
calc_monthly_stats(station_number = "08NM116",
                   start_year = 1980)

# Calculate statistics regardless of whether there are missing data
calc_monthly_stats(station_number = "08NM116",
                   ignore_missing = TRUE)
                  
# Calculate statistics for water years starting in October
calc_monthly_stats(station_number = "08NM116",
                   start_year = 1980,
                   water_year_start = 10)
                  
# Calculate statistics with custom years and percentiles
calc_monthly_stats(station_number = "08NM116",
                   start_year = 1981,
                   end_year = 2010,
                   exclude_years = c(1991,1993:1995),
                   percentiles = c(25,75))
                   
}

[Package fasstr version 0.5.2 Index]