| bootstrap.gain {gainML} | R Documentation | 
Construct a Confidence Interval of the Gain Estimate
Description
Estimates the gain and its confidence interval at a given confidence level using the bootstrap.
Usage
bootstrap.gain(df1, df2, df3, opt.cov, n.rep, p1.beg, p1.end, p2.beg,
  p2.end, ratedPW, AEP, pw.freq, freq.id = 3,
  time.format = "%Y-%m-%d %H:%M:%S", k.fold = 5, col.time = 1,
  col.turb = 2, free.sec = NULL, neg.power = FALSE,
  pred.return = FALSE)
Arguments
df1 | 
 A dataframe for reference turbine data. This dataframe must include five columns: timestamp, turbine id, wind direction, power output, and air density.  | 
df2 | 
 A dataframe for baseline control turbine data. This dataframe must include four columns: timestamp, turbine id, wind speed, and power output.  | 
df3 | 
 A dataframe for neutral control turbine data. This dataframe must
include four columns and have the same structure with   | 
opt.cov | 
 A character vector indicating the optimal set of variables (obtained from the period 1 analysis).  | 
n.rep | 
 An integer describing the total number of replications when
applying bootstrap. This number determines the confidence level; for
example, if   | 
p1.beg | 
 A string specifying the beginning date of period 1. By default,
the value needs to be specified in ‘%Y-%m-%d’ format, for example,
  | 
p1.end | 
 A string specifying the end date of period 1. For example, if
the value is   | 
p2.beg | 
 A string specifying the beginning date of period 2.  | 
p2.end | 
 A string specifying the end date of period 2. Defined similarly
as   | 
ratedPW | 
 A kW value that describes the (common) rated power of the selected turbines (REF and CTR-b).  | 
AEP | 
 A kWh value describing the annual energy production from a single turbine.  | 
pw.freq | 
 A matrix or a dataframe that includes power output bins and corresponding frequency in terms of the accumulated hours during an annual period.  | 
freq.id | 
 An integer indicating the column number of   | 
time.format | 
 A string describing the format of time stamps used in the
data to be analyzed. The default value is   | 
k.fold | 
 An integer defining the number of data folds for the period 1
analysis and prediction. In the period 1 analysis,   | 
col.time | 
 An integer specifying the column number of time stamps in wind turbine datasets. The default value is 1.  | 
col.turb | 
 An integer specifying the column number of turbine IDs in wind turbine datasets. The default value is 2.  | 
free.sec | 
 A list of vectors defining free sectors. Each vector in the
list has two scalars: one for starting direction and another for ending
direction, ordered clockwise. For example, a vector of   | 
neg.power | 
 Either   | 
pred.return | 
 A logical value indicating whether to return the full prediction
results; see Details below. The default value is   | 
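The free.sec argument described above takes a list of two-element vectors. The following is a minimal sketch of how such a list might be written and how a clockwise sector that passes through north could be interpreted. The sector values, the use of degrees, the inclusive boundaries, and the helper functions in.sector and in.free are all assumptions made for illustration; they are not part of gainML and may not match how the package applies free sectors internally.

```r
# Hypothetical free sectors, assumed to be in degrees,
# each given as c(start, end) in clockwise order
free.sec <- list(c(310, 50), c(150, 260))

# Illustrative helper (not part of gainML):
# TRUE if direction d lies within the sector sec
in.sector <- function(d, sec) {
  d <- d %% 360
  if (sec[1] <= sec[2]) {
    d >= sec[1] & d <= sec[2]
  } else {
    # The sector wraps through north (e.g., 310 -> 0 -> 50)
    d >= sec[1] | d <= sec[2]
  }
}

# TRUE if direction d lies within any of the free sectors
in.free <- function(d) {
  Reduce(`|`, lapply(free.sec, function(s) in.sector(d, s)))
}

in.free(c(0, 100, 200, 355))
```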
Details
For each replication, this function makes k period 1 predictions
(one per fold) for each of the REF and CTR-b turbine models, plus one
period 2 prediction for each model. This results in 2 × (k + 1)
predictions per replication. With n.rep replications, there
are n.rep × 2 × (k + 1) predictions in total.
To avoid storing that many datasets in memory, set
pred.return to FALSE, which is the default setting.
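As a quick check of the count described above, the arithmetic can be sketched as follows. The values of n.rep and k.fold are hypothetical and chosen only for illustration.

```r
# Hypothetical settings chosen only for illustration
n.rep  <- 10  # number of bootstrap replications
k.fold <- 5   # number of folds used in the period 1 analysis

# Two turbine models (REF and CTR-b), each with k period 1 predictions
# plus one period 2 prediction
per.rep <- 2 * (k.fold + 1)
total   <- n.rep * per.rep

per.rep  # 12
total    # 120
```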
Value
The function returns a list of n.rep replication objects
(lists) each of which includes the following. 
gain.res | 
 A list containing gain quantification results; see quantify.gain for the details.  | 
p1.pred | 
 A list containing period 1 prediction results.
 pred.REF: A list of k datasets, each representing the kth fold's period 1 prediction for the REF turbine.
 pred.CTR: A list of k datasets, each representing the kth fold's period 1 prediction for the CTR-b turbine.  | 
p2.pred | 
 A list containing period 2 prediction results; see analyze.p2 for the details.  | 
References
H. Hwangbo, Y. Ding, and D. Cabezon, 'Machine Learning Based Analysis and Quantification of Potential Power Gain from Passive Device Installation,' arXiv:1906.05776 [stat.AP], Jun. 2019. https://arxiv.org/abs/1906.05776.
Examples
df.ref <- with(wtg, data.frame(time = time, turb.id = 1, wind.dir = D,
 power = y, air.dens = rho))
df.ctrb <- with(wtg, data.frame(time = time, turb.id = 2, wind.spd = V,
 power = y))
df.ctrn <- df.ctrb
df.ctrn$turb.id <- 3
opt.cov = c('D','density','Vn','hour')
n.rep = 2 # just for illustration; a user may use at least 10 for this.
res <- bootstrap.gain(df.ref, df.ctrb, df.ctrn, opt.cov = opt.cov, n.rep = n.rep,
 p1.beg = '2014-10-24', p1.end = '2014-10-25', p2.beg = '2014-10-25',
 p2.end = '2014-10-26', ratedPW = 1000, AEP = 300000, pw.freq = pw.freq,
 k.fold = 2)
length(res) #2
sapply(res, function(ls) ls$gain.res$gainCurve) #This provides 2 gain curves.
sapply(res, function(ls) ls$gain.res$gain) #This provides 2 gain values.
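Once the gain values of all replications are collected, one possible way to summarize them as an interval is the percentile method, sketched below. This is an assumption made for illustration and not necessarily how gainML defines its interval. The gain values are simulated stand-ins rather than output from bootstrap.gain, and conf.level is a hypothetical name.

```r
# Simulated stand-in for the gain values of 10 replications
# (NOT output from bootstrap.gain)
set.seed(1)
gain.vals <- rnorm(10, mean = 0.02, sd = 0.005)

# Percentile interval at a hypothetical 90% confidence level
conf.level <- 0.90
alpha <- 1 - conf.level
ci <- quantile(gain.vals, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
ci
```

With real results, gain.vals could be obtained as in the example above, with sapply(res, function(ls) ls$gain.res$gain).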