arrange.data {gainML} | R Documentation |
Split, Merge, and Filter Given Datasets for the Subsequent Analysis
Description
Generates datasets that consist of the measurements from REF, CTR-b, and
CTR-n turbines only. Filters the datasets by eliminating data points with a
missing measurement and those with negative power output (optional).
Generates training and test datasets for k-fold cross-validation (CV) and
splits the entire data into period 1 data and period 2 data.
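The filtering and period split described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's actual implementation: the helper names filter_points and split_periods, the toy column names, the reading of neg.power = FALSE as "drop negative power", and the treatment of each period as a half-open interval [beg, end) are all hypothetical.

```r
# Hypothetical helper: drop rows with any missing value and, unless
# neg.power is TRUE, rows with a negative power output.
# Assumption: neg.power = FALSE means negative power rows are removed.
filter_points <- function(df, col.power = "power", neg.power = FALSE) {
  keep <- stats::complete.cases(df)
  if (!neg.power) {
    keep <- keep & df[[col.power]] >= 0
  }
  df[which(keep), , drop = FALSE]
}

# Hypothetical helper: split rows into two periods by timestamp.
# Assumption: each period is the half-open interval [beg, end).
split_periods <- function(df, p1.beg, p1.end, p2.beg, p2.end,
                          col.time = 1,
                          time.format = "%Y-%m-%d %H:%M:%S") {
  ts <- as.POSIXct(as.character(df[[col.time]]),
                   format = time.format, tz = "UTC")
  lim <- as.POSIXct(c(p1.beg, p1.end, p2.beg, p2.end),
                    format = "%Y-%m-%d", tz = "UTC")
  list(
    per1 = df[which(ts >= lim[1] & ts < lim[2]), , drop = FALSE],
    per2 = df[which(ts >= lim[3] & ts < lim[4]), , drop = FALSE]
  )
}

# Toy data: one row has a missing wind speed, one has negative power.
toy <- data.frame(
  time = c("2014-10-24 00:00:00", "2014-10-25 12:00:00",
           "2014-10-27 00:00:00", "2014-10-28 06:00:00",
           "2014-10-29 18:00:00"),
  turb.id = 1,
  wind.spd = c(5.1, NA, 7.3, 6.0, 8.2),
  power = c(300, 420, 510, -5, 640),
  stringsAsFactors = FALSE
)

clean <- filter_points(toy)
parts <- split_periods(clean, "2014-10-24", "2014-10-27",
                       "2014-10-27", "2014-10-30")
nrow(clean)       # 3 rows survive the filter
nrow(parts$per1)  # 1 row falls in period 1
nrow(parts$per2)  # 2 rows fall in period 2
```

Filtering before splitting keeps the two period datasets free of incomplete rows, so later steps do not need their own missing-value handling.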
Usage
arrange.data(df1, df2, df3, p1.beg, p1.end, p2.beg, p2.end,
time.format = "%Y-%m-%d %H:%M:%S", k.fold = 5, col.time = 1,
col.turb = 2, bootstrap = NULL, free.sec = NULL,
neg.power = FALSE)
Arguments
df1 |
A dataframe for reference turbine data. This dataframe must include five columns: timestamp, turbine id, wind direction, power output, and air density. |
df2 |
A dataframe for baseline control turbine data. This dataframe must include four columns: timestamp, turbine id, wind speed, and power output. |
df3 |
A dataframe for neutral control turbine data. This dataframe must
include four columns and have the same structure as df2. |
p1.beg |
A string specifying the beginning date of period 1. By default,
the value needs to be specified in ‘%Y-%m-%d’ format, for example,
'2014-10-24'. |
p1.end |
A string specifying the end date of period 1. For example, if
the value is |
p2.beg |
A string specifying the beginning date of period 2. |
p2.end |
A string specifying the end date of period 2. Defined similarly
to p1.end. |
time.format |
A string describing the format of time stamps used in the
data to be analyzed. The default value is "%Y-%m-%d %H:%M:%S". |
k.fold |
An integer defining the number of data folds for the period 1
analysis and prediction. In the period 1 analysis, |
col.time |
An integer specifying the column number of time stamps in wind turbine datasets. The default value is 1. |
col.turb |
An integer specifying the column number of turbine IDs in wind turbine datasets. The default value is 2. |
bootstrap |
An integer indicating the current replication (run) number
of bootstrap. If set to |
free.sec |
A list of vectors defining free sectors. Each vector in the
list has two scalars: one for starting direction and another for ending
direction, ordered clockwise. For example, a vector of |
neg.power |
Either |
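The free.sec format described above (a start and an end direction per vector, ordered clockwise) can be illustrated with a small membership check. This is a sketch only: the helper names in_sector and in_free_sector are hypothetical, and treating both sector ends as inclusive is an assumption, not documented package behavior.

```r
# Hypothetical helper: TRUE where a direction (in degrees) lies in the
# sector running clockwise from `start` to `end`.
# Assumption: both ends are inclusive.
in_sector <- function(dir, start, end) {
  dir <- dir %% 360
  if (start <= end) {
    dir >= start & dir <= end
  } else {
    # The sector wraps through north (0/360), e.g. from 310 to 50.
    dir >= start | dir <= end
  }
}

# Hypothetical helper: TRUE where a direction lies in any listed sector.
in_free_sector <- function(dir, free.sec) {
  Reduce(`|`, lapply(free.sec, function(s) in_sector(dir, s[1], s[2])))
}

free.sec <- list(c(310, 50), c(150, 260))
dirs <- c(0, 45, 100, 200, 300, 350)
flags <- in_free_sector(dirs, free.sec)
flags  # TRUE TRUE FALSE TRUE FALSE TRUE
```

The wrap-around branch matters for a sector such as c(310, 50): a plain range test (310 <= dir and dir <= 50) would never be true.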
Value
The function returns a list of several datasets including the following.
train
A list containing k datasets that will be used to train the machine learning model.
test
A list containing k datasets that will be used to test the machine learning model.
per1
A dataframe containing the period 1 data.
per2
A dataframe containing the period 2 data.
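To show the shape of the train and test elements, the sketch below builds two lists of k data frames from a toy data frame. It is an illustration only: the helper name make_folds and the random assignment of rows to folds are assumptions, and the package may partition the data differently.

```r
# Hypothetical helper: assign rows to k folds at random and build
# matching lists of training and test sets.
# Assumption: random fold assignment; the package may differ.
make_folds <- function(df, k.fold = 5, seed = 1) {
  set.seed(seed)
  fold <- sample(rep_len(seq_len(k.fold), nrow(df)))
  list(
    train = lapply(seq_len(k.fold),
                   function(i) df[fold != i, , drop = FALSE]),
    test  = lapply(seq_len(k.fold),
                   function(i) df[fold == i, , drop = FALSE])
  )
}

toy <- data.frame(id = 1:20, power = seq(100, 290, by = 10))
folds <- make_folds(toy, k.fold = 5)

length(folds$train)        # one training set per fold
sapply(folds$test, nrow)   # rows held out in each fold

# In fold i, the training and test sets together cover every row once.
for (i in seq_along(folds$train)) {
  stopifnot(nrow(folds$train[[i]]) + nrow(folds$test[[i]]) == nrow(toy))
}
```

A list returned in this shape can be consumed one fold at a time: fit a model on train[[i]] and evaluate it on test[[i]].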
Examples
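library(gainML)  # provides arrange.data() and the wtg example dataset
# Build the three input data frames from the wtg dataset.
df.ref <- with(wtg, data.frame(time = time, turb.id = 1, wind.dir = D, power = y,
                               air.dens = rho))
df.ctrb <- with(wtg, data.frame(time = time, turb.id = 2, wind.spd = V, power = y))
# The neutral control data reuses the baseline control structure with a new id.
df.ctrn <- df.ctrb
df.ctrn$turb.id <- 3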
# For Full Sector Analysis
data <- arrange.data(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24', p1.end = '2014-10-27',
p2.beg = '2014-10-27', p2.end = '2014-10-30')
# For Free Sector Analysis
free.sec <- list(c(310, 50), c(150, 260))
data <- arrange.data(df.ref, df.ctrb, df.ctrn, p1.beg = '2014-10-24', p1.end = '2014-10-27',
p2.beg = '2014-10-27', p2.end = '2014-10-30', free.sec = free.sec)
length(data$train) # Equals k.fold (5 by default).
length(data$test)  # Equals k.fold (5 by default).
head(data$per1)    # Shows the first rows of the period 1 dataset.
head(data$per2)    # Shows the first rows of the period 2 dataset.