| arrange.data {gainML} | R Documentation | 
Split, Merge, and Filter Given Datasets for the Subsequent Analysis
Description
Generates datasets that consist of the measurements from REF, CTR-b, and
CTR-n turbines only. Filters the datasets by eliminating data points with a
missing measurement and, optionally, those with negative power output.
Generates training and test datasets for k-fold cross-validation (CV) and
splits the entire data into period 1 data and period 2 data.
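
The filtering and k-fold steps described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data frame, its column names, and the random fold assignment are assumptions made for illustration only.

```r
# Hypothetical toy data; column names are illustrative, not required by gainML.
set.seed(1)
toy <- data.frame(
  time  = seq(as.POSIXct("2014-10-24 00:00:00", tz = "UTC"),
              by = "10 min", length.out = 12),
  power = c(5, NA, 12, -1, 30, 22, 18, -3, 40, 35, NA, 27)
)

# Drop rows with any missing measurement.
toy.complete <- toy[complete.cases(toy), ]

# Optionally drop rows with negative power output.
toy.clean <- toy.complete[toy.complete$power >= 0, ]

# Assign each remaining row to one of k folds at random (an assumption;
# the package may partition the data differently).
k <- 4
fold.id <- sample(rep(seq_len(k), length.out = nrow(toy.clean)))

# Fold i is held out as the test set; the other folds form the training set.
test  <- lapply(seq_len(k), function(i) toy.clean[fold.id == i, ])
train <- lapply(seq_len(k), function(i) toy.clean[fold.id != i, ])
```

Each element of `train` pairs with the element of `test` at the same index, so a model fitted on `train[[i]]` is evaluated on `test[[i]]`.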
Usage
arrange.data(df1, df2, df3, p1.beg, p1.end, p2.beg, p2.end,
  time.format = "%Y-%m-%d %H:%M:%S", k.fold = 5, col.time = 1,
  col.turb = 2, bootstrap = NULL, free.sec = NULL,
  neg.power = FALSE)
Arguments
df1 | 
 A dataframe for reference turbine data. This dataframe must include five columns: timestamp, turbine id, wind direction, power output, and air density.  | 
df2 | 
 A dataframe for baseline control turbine data. This dataframe must include four columns: timestamp, turbine id, wind speed, and power output.  | 
df3 | 
 A dataframe for neutral control turbine data. This dataframe must
include four columns and have the same structure as   | 
p1.beg | 
 A string specifying the beginning date of period 1. By default,
the value needs to be specified in ‘%Y-%m-%d’ format, for example,
  | 
p1.end | 
 A string specifying the end date of period 1. For example, if
the value is   | 
p2.beg | 
 A string specifying the beginning date of period 2.  | 
p2.end | 
 A string specifying the end date of period 2. Defined similarly
as   | 
time.format | 
 A string describing the format of time stamps used in the
data to be analyzed. The default value is "%Y-%m-%d %H:%M:%S".  | 
k.fold | 
 An integer defining the number of data folds for the period 1
analysis and prediction. In the period 1 analysis,   | 
col.time | 
 An integer specifying the column number of time stamps in wind turbine datasets. The default value is 1.  | 
col.turb | 
 An integer specifying the column number of turbine IDs in wind turbine datasets. The default value is 2.  | 
bootstrap | 
 An integer indicating the current replication (run) number
of bootstrap. If set to   | 
free.sec | 
 A list of vectors defining free sectors. Each vector in the
list has two scalars: one for starting direction and another for ending
direction, ordered clockwise. For example, a vector of   | 
neg.power | 
 Either   | 
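
To show how the date and time.format arguments relate, here is a small base R sketch that parses time stamps and assigns them to two periods. It is an illustration only: the sample time stamps are made up, and treating each period as a half-open interval (beginning included, end excluded) is an assumption, not a documented behavior of arrange.data.

```r
# Format string matching the default of the time.format argument.
time.format <- "%Y-%m-%d %H:%M:%S"

# Hypothetical time stamps, as they might appear in a turbine dataset.
stamps <- c("2014-10-24 00:00:00", "2014-10-26 23:50:00",
            "2014-10-27 00:00:00", "2014-10-29 12:00:00")
stamp.time <- as.POSIXct(stamps, format = time.format, tz = "UTC")

# Period boundaries given as dates in %Y-%m-%d format.
to.date <- function(x) as.POSIXct(x, format = "%Y-%m-%d", tz = "UTC")
p1.beg <- to.date("2014-10-24")
p1.end <- to.date("2014-10-27")
p2.beg <- to.date("2014-10-27")
p2.end <- to.date("2014-10-30")

# Assumption: half-open intervals, so a boundary time stamp falls in
# exactly one period.
in.per1 <- stamp.time >= p1.beg & stamp.time < p1.end
in.per2 <- stamp.time >= p2.beg & stamp.time < p2.end
```

With these made-up values, the first two time stamps fall in period 1 and the last two in period 2.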
Value
The function returns a list of several datasets including the following.
train | 
 A list containing k datasets that will be used to train the machine learning model.  | 
test | 
 A list containing k datasets that will be used to test the machine learning model.  | 
per1 | 
 A dataframe containing the period 1 data.  | 
per2 | 
 A dataframe containing the period 2 data.  | 
Examples
df.ref <- with(wtg, data.frame(time = time, turb.id = 1, wind.dir = D, power = y,
 air.dens = rho))
df.ctrb <- with(wtg, data.frame(time = time, turb.id = 2, wind.spd = V, power = y))
df.ctrn <- df.ctrb
df.ctrn$turb.id <- 3
# For Full Sector Analysis
data <- arrange.data(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24', p1.end = '2014-10-27',
 p2.beg = '2014-10-27', p2.end = '2014-10-30')
# For Free Sector Analysis
free.sec <- list(c(310, 50), c(150, 260))
data <- arrange.data(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24', p1.end = '2014-10-27',
 p2.beg = '2014-10-27', p2.end = '2014-10-30', free.sec = free.sec)
length(data$train) # This equals k.fold.
length(data$test)  # This equals k.fold.
head(data$per1)    # This shows the beginning of the period 1 dataset.
head(data$per2)    # This shows the beginning of the period 2 dataset.
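
The free.sec list above includes a sector, c(310, 50), that crosses north when read clockwise. The following sketch shows one way to test whether a wind direction lies inside such sectors. The helper names `in.sector` and `in.free.sector` are hypothetical, and including both boundary directions is an assumption; the package may handle sector membership differently.

```r
# Hypothetical helper: is each direction (degrees) inside one sector that
# runs clockwise from sector[1] to sector[2]? Boundaries are included
# here by assumption.
in.sector <- function(dir, sector) {
  start <- sector[1]
  end   <- sector[2]
  if (start <= end) {
    dir >= start & dir <= end
  } else {
    # The sector wraps past 360 degrees, e.g. c(310, 50).
    dir >= start | dir <= end
  }
}

# Hypothetical helper: is each direction inside any of the listed sectors?
in.free.sector <- function(dir, free.sec) {
  Reduce(`|`, lapply(free.sec, function(s) in.sector(dir, s)))
}

free.sec <- list(c(310, 50), c(150, 260))
wind.dir <- c(0, 45, 100, 200, 300, 320)
inside <- in.free.sector(wind.dir, free.sec)
```

Here `inside` flags 0, 45, 200, and 320 degrees as within a free sector, while 100 and 300 degrees fall outside both.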