rmw_prepare_data {rmweather} | R Documentation
Function to prepare a data frame for modelling with rmweather.
Description
rmw_prepare_data will test and prepare a data frame for further use with rmweather.
Usage
rmw_prepare_data(
  df,
  value = "value",
  na.rm = FALSE,
  replace = FALSE,
  fraction = 0.8
)
Arguments
df |
Input data frame. Generally a time series of air quality data with pollutant concentrations and meteorological variables. |
value |
Name of the dependent variable. Usually a pollutant, for example,
|
na.rm |
Should missing values ( |
replace |
When adding the date variables to the set, should they replace the versions already contained in the data frame if they exist? |
fraction |
Fraction of the observations to make up the training set. Default is 0.8, 80 %. |
Details
rmw_prepare_data will check if a date variable is present and is of the correct data type, impute missing numeric and categorical values, randomly split the input into training and testing sets, and rename the dependent variable to "value". The date variable will also be used to calculate new variables such as date_unix, day_julian, weekday, and hour, which can be used as independent variables. These attributes are needed for other rmweather functions to operate.
Use set.seed in an R session to keep results reproducible.
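The steps described in Details can be illustrated with a minimal base R sketch. This is not the rmweather source code: the input data frame, the no2 column, the set column name, and the encodings used for weekday and day_julian are assumptions made for illustration, and the imputation step is omitted.

```r
# Illustrative base R sketch only; this is not the rmweather implementation
set.seed(123)

# Hypothetical hourly input with a POSIXct date and a dependent variable
df <- data.frame(
  date = seq(
    as.POSIXct("2020-01-01 00:00:00", tz = "UTC"),
    by = "hour",
    length.out = 100
  ),
  no2 = runif(100, min = 5, max = 60)
)

# Check that the date variable is present and of a date-time type
stopifnot("date" %in% names(df), inherits(df$date, "POSIXct"))

# Rename the dependent variable to "value"
names(df)[names(df) == "no2"] <- "value"

# Derive date-based variables (encodings here are assumptions)
date_lt <- as.POSIXlt(df$date)
df$date_unix <- as.numeric(df$date)   # seconds since 1970-01-01 UTC
df$day_julian <- date_lt$yday + 1L    # day of year, starting at 1
df$weekday <- date_lt$wday            # 0 = Sunday in base R
df$hour <- date_lt$hour               # hour of day, 0 to 23

# Randomly assign a fraction of the rows to the training set
fraction <- 0.8
index_training <- sample(nrow(df), size = floor(nrow(df) * fraction))
df$set <- "testing"                   # hypothetical column name
df$set[index_training] <- "training"

# Count observations in each set
table(df$set)
```

Because the split relies on sample, calling set.seed beforehand is what makes the assignment of rows to the training and testing sets repeatable between sessions.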
Value
Tibble, the input data transformed ready for modelling with rmweather.
Author(s)
Stuart K. Grange
See Also
set.seed, rmw_train_model, rmw_normalise
Examples
# Load packages
library(dplyr)
library(rmweather)

# Keep things reproducible
set.seed(123)

# Prepare example data for modelling, only use no2 data here
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()