rmw_do_all {rmweather} | R Documentation |
Function to train a random forest model to predict (usually) pollutant concentrations using meteorological and time variables and then immediately normalise a variable for "average" meteorological conditions.
Description
rmw_do_all is a user-level function to conduct the meteorological normalisation process in one step.
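The general idea behind the normalisation step can be sketched with a toy example in base R. This is an illustration only and not rmweather's implementation: a linear model (`lm`) stands in for the random forest, the data are synthetic, and the names `df_toy`, `model_toy`, `trend` and `value_normalised` are hypothetical. It also assumes that the trend term is held fixed while the meteorological variable is resampled.

```r
set.seed(1)

# Synthetic data: a time trend and one meteorological variable (wind speed)
n <- 200
df_toy <- data.frame(
  trend = seq_len(n),
  ws = runif(n, 0, 10)
)

# Synthetic concentration: slow decline over time, lower with stronger wind
df_toy$value <- 50 - 0.05 * df_toy$trend - 2 * df_toy$ws + rnorm(n)

# Stand-in model (a linear model here, not a random forest)
model_toy <- lm(value ~ trend + ws, data = df_toy)

# Repeatedly resample the meteorological variable and predict each time
n_samples <- 100
predictions <- replicate(n_samples, {
  df_sampled <- df_toy
  # Assumption: only ws is resampled, the trend term stays in place
  df_sampled$ws <- sample(df_toy$ws, size = n, replace = TRUE)
  predict(model_toy, newdata = df_sampled)
})

# Aggregate the predictions: one mean per time step
value_normalised <- rowMeans(predictions)
```

Averaging over many resampled predictions removes most of the wind-driven variability in this toy series, leaving mainly the trend.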
Usage
rmw_do_all(
  df,
  variables,
  variables_sample = NA,
  n_trees = 300,
  min_node_size = 5,
  mtry = NULL,
  keep_inbag = TRUE,
  n_samples = 300,
  replace = TRUE,
  se = FALSE,
  aggregate = TRUE,
  n_cores = NA,
  verbose = FALSE
)
Arguments
df |
Input data frame after preparation with
|
variables |
Independent/explanatory variables used to predict
|
variables_sample |
Variables to use for the normalisation step. If not
used, the default of all variables used for training the model with the
exception of |
n_trees |
Number of trees to grow to make up the forest. |
min_node_size |
Minimal node size. |
mtry |
Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables. |
keep_inbag |
Should in-bag data be kept in the ranger model
object? This needs to be |
n_samples |
Number of times to sample |
replace |
Should |
se |
Should the standard error of the predictions be calculated too? The standard error method is the "infinitesimal jackknife for bagging" and will slow down the predictions significantly. |
aggregate |
Should all the |
n_cores |
Number of CPU cores to use for the model calculation. Default is the system's total minus one. |
verbose |
Should the function give messages? |
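As a rough illustration of the mtry and n_cores defaults described above, the values can be worked out by hand in base R. This is a sketch of the documented rules, not the package's internal code. The variable list is the one used in the Examples section, and the use of `parallel::detectCores()` to count cores is an assumption.

```r
# Explanatory variables taken from the Examples section of this page
variables <- c(
  "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
)

# mtry default as described: rounded-down square root of the variable count
mtry_default <- floor(sqrt(length(variables)))
mtry_default

# n_cores default as described: the system's total minus one
# Assumption: parallel::detectCores() supplies the core count; it can return
# NA on some platforms, so fall back to a single core in that case
n_cores_total <- parallel::detectCores()
n_cores_default <- if (is.na(n_cores_total)) 1L else max(1L, n_cores_total - 1L)
n_cores_default
```

With eight explanatory variables, the documented mtry rule gives floor(sqrt(8)) = 2.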
Value
Named list.
Author(s)
Stuart K. Grange
See Also
rmw_prepare_data, rmw_train_model, rmw_normalise
Examples
# Load packages
library(dplyr)
library(rmweather)

# Keep things reproducible
set.seed(123)

# Prepare example data: keep only the NO2 observations
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Use the example data to conduct the steps needed for meteorological
# normalisation
list_normalised <- rmw_do_all(
  df = data_london_prepared,
  variables = c(
    "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
  ),
  n_trees = 300,
  n_samples = 300
)
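For comparison, the same workflow can be sketched as separate steps using the functions listed under See Also. This is a hedged sketch rather than a statement of what rmw_do_all does internally: the argument names passed to rmw_train_model and rmw_normalise are assumed to mirror those of rmw_do_all, and the tree and sample counts are reduced here only to shorten the run time.

```r
# Load packages
library(dplyr)
library(rmweather)

# Keep things reproducible
set.seed(123)

# Prepare example data, as in the example above
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Step 1: train the random forest model
# Assumption: df, variables and n_trees are accepted as in rmw_do_all
model <- rmw_train_model(
  df = data_london_prepared,
  variables = c(
    "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
  ),
  n_trees = 50
)

# Step 2: normalise with the trained model
# Assumption: model, df and n_samples are accepted as shown
data_normalised <- rmw_normalise(
  model,
  df = data_london_prepared,
  n_samples = 50
)

# Look at the first few rows of the normalised time series
head(data_normalised)
```

Running the steps separately may be convenient when one trained model is to be reused for several normalisation runs, for example with different n_samples values.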