analyze.p1 {gainML}    R Documentation
Apply Period 1 Analysis
Description
Conducts period 1 analysis: selects the optimal set of predictor variables, namely the set that minimizes a k-fold cross-validation (CV) error measure, and builds machine learning models that predict the power output of the REF and CTR-b turbines using period 1 data.
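The selection criterion can be sketched in R as follows. This page does not state which model or search procedure analyze.p1 uses internally, so the sketch substitutes lm() and an exhaustive search over candidate subsets purely for illustration. The function names cv.rmse and select.vars are hypothetical, and the response column name power (taken from the example's input data) is an assumption about the fold datasets. The train and test arguments are assumed to be lists of k data frames, as described under Arguments.

```r
# Hypothetical sketch: pick the predictor subset with the lowest k-fold CV RMSE.
# lm() stands in for the machine learning model, and exhaustive search stands in
# for whatever search procedure analyze.p1 actually uses.
cv.rmse <- function(vars, train, test, response = "power") {
  fml <- reformulate(vars, response = response)
  fold.rmse <- vapply(seq_along(train), function(i) {
    fit <- lm(fml, data = train[[i]])               # fit on fold i's training data
    pred <- predict(fit, newdata = test[[i]])       # predict fold i's test data
    sqrt(mean((test[[i]][[response]] - pred)^2))    # fold i's RMSE
  }, numeric(1))
  mean(fold.rmse)                                   # average over the k folds
}

select.vars <- function(candidates, train, test, response = "power") {
  best <- list(vars = character(0), err = Inf)
  # Evaluate every non-empty subset (2^p - 1 subsets for p candidates)
  for (size in seq_along(candidates)) {
    for (vars in combn(candidates, size, simplify = FALSE)) {
      err <- cv.rmse(vars, train, test, response)
      if (err < best$err) best <- list(vars = vars, err = err)
    }
  }
  best
}
```

Exhaustive search is the simplest correct instance of "minimize the CV error over variable sets", but its cost doubles with each added candidate, which is one reason a real implementation may use a cheaper search.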
Usage
analyze.p1(train, test, ratedPW)
Arguments
train
A list of k datasets used to train the machine learning models.
test
A list of k datasets used to test the machine learning models and calculate the CV error measures.
ratedPW
The rated power, in kW, shared by the selected turbines (REF and CTR-b).
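In the example below, arrange.data() produces the train and test lists. The exact splitting rule it uses is not described on this page, so the hypothetical helper make.folds only illustrates the assumed shape: element i of test holds fold i, and element i of train holds the remaining data. Random fold assignment is an assumption made for this sketch.

```r
# Hypothetical helper: split one data frame into k-fold train/test lists.
# Random assignment is an assumption; arrange.data() may split differently.
make.folds <- function(df, k, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # note: resets the global RNG state
  fold.id <- sample(rep(seq_len(k), length.out = nrow(df)))
  test  <- lapply(seq_len(k), function(i) df[fold.id == i, , drop = FALSE])
  train <- lapply(seq_len(k), function(i) df[fold.id != i, , drop = FALSE])
  list(train = train, test = test)
}
```

Each train/test pair is disjoint and together covers the full dataset, so every observation is used for testing exactly once across the k folds.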
Value
The function returns a list containing period 1 analysis results as follows.
opt.cov
A character vector presenting the names of predictor variables chosen for the optimal set.
pred.REF
A list of k datasets, one per fold, each holding that fold's period 1 predictions for the REF turbine.
pred.CTR
A list of k datasets, one per fold, each holding that fold's period 1 predictions for the CTR-b turbine.
err.REF
A data frame of k-fold CV based RMSE and BIAS values for the REF turbine model (k values of each). The first column contains the RMSE values and the second column contains the BIAS values.
err.CTR
A data frame of k-fold CV based RMSE and BIAS values for the CTR-b turbine model, structured like err.REF.
biasCurve.REF
A k by m matrix describing the binned BIAS curve for the REF turbine model (technically speaking, the curve of 'residuals', which are the negative of BIAS), where m is the number of power bins.
biasCurve.CTR
A k by m matrix describing the binned BIAS curve for the CTR-b turbine model.
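The fold-level error measures above can be illustrated with the following R sketch. This page does not give the exact formulas, so the definitions here are assumptions: BIAS is taken as mean(predicted - observed), a residual as observed - predicted (matching the statement that residuals are the negative BIAS), and the m power bins as equal-width bins on observed power from 0 to ratedPW. Whether analyze.p1 normalizes errors by ratedPW is not stated. The names fold.errors, binned.residuals, and bias.curve are hypothetical.

```r
# Hypothetical sketch of the fold-level error measures.
# Assumptions: BIAS = mean(predicted - observed); residual = observed - predicted
# (so a residual is the negative of BIAS); bins are equal-width on observed
# power from 0 to ratedPW.
fold.errors <- function(obs, pred) {
  c(RMSE = sqrt(mean((obs - pred)^2)), BIAS = mean(pred - obs))
}

# Mean residual in each of m power bins; empty bins give NA and
# observations outside [0, ratedPW] are dropped.
binned.residuals <- function(obs, pred, ratedPW, m) {
  bin <- cut(obs, breaks = seq(0, ratedPW, length.out = m + 1),
             include.lowest = TRUE)
  as.numeric(tapply(obs - pred, bin, mean))
}

# Stack one row per fold into a k by m matrix.
bias.curve <- function(obs.list, pred.list, ratedPW, m) {
  rows <- mapply(binned.residuals, obs.list, pred.list,
                 MoreArgs = list(ratedPW = ratedPW, m = m), SIMPLIFY = FALSE)
  do.call(rbind, rows)
}
```

For example, with observed power c(100, 300, 600, 900), predictions c(90, 290, 620, 880), ratedPW = 1000, and m = 2, fold.errors gives RMSE = sqrt(250) and BIAS = -5, and binned.residuals gives c(10, 0).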
Note
VERY IMPORTANT!
Selecting the optimal set of variables can take a long time. For example, with an annual dataset of typical size, evaluating one set of variables on a single test fold may take about 20-40 minutes (in the authors' experience).
To help track the progress of the selection, the function prints informative messages while it runs.
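A back-of-the-envelope runtime estimate follows from the figure above: total time is roughly (number of variable sets evaluated) x k x (20 to 40 minutes). This page does not say how many sets analyze.p1 evaluates, so the sketch shows two common search procedures only for comparison; the function names are hypothetical.

```r
# Hypothetical runtime estimate based on about 20-40 minutes
# per variable set per fold.
n.sets.exhaustive <- function(p) 2^p - 1       # all non-empty subsets of p candidates
n.sets.forward <- function(p) p * (p + 1) / 2  # forward stepwise run to completion

estimate.hours <- function(n.sets, k, minutes = c(low = 20, high = 40)) {
  n.sets * k * minutes / 60
}

estimate.hours(n.sets.forward(5), k = 5)  # low = 25, high = 50 hours
```

Under an exhaustive search, the same 5 candidates would require 31 sets instead of 15, roughly doubling the estimate.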
References
H. Hwangbo, Y. Ding, and D. Cabezon, 'Machine Learning Based Analysis and Quantification of Potential Power Gain from Passive Device Installation,' arXiv:1906.05776 [stat.AP], Jun. 2019. https://arxiv.org/abs/1906.05776.
Examples
# Build REF and CTR-b inputs from the bundled wtg data
df.ref <- with(wtg, data.frame(time = time, turb.id = 1, wind.dir = D,
                               power = y, air.dens = rho))
df.ctrb <- with(wtg, data.frame(time = time, turb.id = 2, wind.spd = V,
                                power = y))
# CTR-n reuses the CTR-b data with a different turbine id
df.ctrn <- df.ctrb
df.ctrn$turb.id <- 3

# Arrange period 1 and period 2 data and split them into 2 folds
data <- arrange.data(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24',
                     p1.end = '2014-10-25', p2.beg = '2014-10-25',
                     p2.end = '2014-10-26', k.fold = 2)

# Run the period 1 analysis for turbines rated at 1000 kW
p1.res <- analyze.p1(data$train, data$test, ratedPW = 1000)
p1.res$opt.cov  # names of the variables in the optimal set
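To condense the fold-level errors into a single CV figure per turbine, the rows of err.REF and err.CTR can be averaged. The helper summarize.p1 below is hypothetical (not part of gainML) and assumes the column order described under Value (RMSE first, BIAS second); whether analyze.p1 aggregates folds this way internally is not stated on this page.

```r
# Hypothetical helper: average the fold-level errors returned by analyze.p1.
# Assumes column 1 of err.REF/err.CTR is RMSE and column 2 is BIAS.
summarize.p1 <- function(res) {
  out <- rbind(REF = colMeans(res$err.REF[, 1:2]),
               CTR = colMeans(res$err.CTR[, 1:2]))
  colnames(out) <- c("RMSE", "BIAS")
  out
}
```

With the example's result, summarize.p1(p1.res) returns a 2 by 2 matrix with one row per turbine model.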