PrInDTMulev {PrInDT} | R Documentation |
PrInDT analysis for a classification problem with multiple classes.
Description
PrInDT analysis for a classification problem with more than 2 classes. For each combination of one class vs. the other classes, a 2-class PrInDT analysis is carried out.
The percentages for undersampling of the larger class ('percl' in PrInDT) are chosen so that the resulting sizes are comparable with the size of the smaller classes, for which all observations are used in undersampling ('percs' = 1 in PrInDT).
The class with the highest probability in the K (= number of classes) analyses is chosen for prediction.
Interpretability is checked (see 'ctestv').
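The combination rule described above (choose the class with the highest probability across the K one-vs-rest analyses) can be sketched in base R. This is only an illustration, not the package's internal code: the probability matrix 'probs' and its values are made up, and the tie-breaking rule is an assumption.

```r
# Hypothetical probabilities from K = 3 one-vs-rest analyses (rows: observations).
# Column k holds the probability of class k estimated by its own 2-class model.
probs <- cbind(
  real  = c(0.80, 0.30, 0.20),
  zero1 = c(0.10, 0.60, 0.35),
  zero2 = c(0.25, 0.20, 0.70)
)

# For each observation, pick the class whose analysis gives the highest
# probability. Ties go to the first class (an assumption for this sketch).
pred <- factor(colnames(probs)[max.col(probs, ties.method = "first")],
               levels = colnames(probs))
print(pred)
```

Note that the rows of 'probs' need not sum to 1, because each column comes from a separate 2-class analysis.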
Usage
PrInDTMulev(datain, classname, ctestv=NA, N, conf.level=0.95)
Arguments
datain |
Input data frame with class factor variable 'classname' and the |
classname |
Name of class variable (character) |
ctestv |
Vector of character strings of forbidden split results; |
N |
Number of repetitions (integer > 0) |
conf.level |
(1 - significance level) in function |
Details
Standard output can be produced by means of print(name) or just name, as well as plot(name), where 'name' is the output data frame of the function.
The plot function produces a series of plots. If you use R on Windows (the windows() device is only available there), you might want to specify windows(record=TRUE) before plot(name) to save the whole series of plots. In RStudio this functionality is provided automatically.
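On platforms without windows(), one option for keeping the whole series is to write it to a multi-page PDF. The sketch below uses placeholder plots in place of plot(name), and the file name is hypothetical; it only shows the device handling, not PrInDT output.

```r
# Write a series of plots to one multi-page PDF.
outfile <- file.path(tempdir(), "prindt_plots.pdf")  # hypothetical file name
pdf(outfile)   # onefile = TRUE by default: each new plot starts a new page
for (k in 1:3) {
  # Stand-in for plot(name), which draws a series of plots
  plot(seq_len(10), seq_len(10)^k, main = paste("Placeholder plot", k))
}
dev.off()      # close the device so that the file is written
```

With a real result, the loop would be replaced by a single call to plot(name) between pdf() and dev.off().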
Value
- class
levels of class variable
- trees
trees for the levels of the class variable; refer to an individual tree as trees[[k]], k = 1, ..., no. of levels
- ba
balanced accuracy of combined predictions
- conf
confusion matrix of combined predictions
- ninterp
no. of non-interpretable trees
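To illustrate how 'ba' and 'conf' relate, the following sketch computes a balanced accuracy from a confusion matrix. It assumes the common multi-class definition (mean of the per-class recalls) and that rows hold the true classes; the orientation and exact definition used by the package may differ, and the counts are invented.

```r
# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
conf <- matrix(c(40,  5,  5,
                  4, 14,  2,
                  3,  2, 15),
               nrow = 3, byrow = TRUE,
               dimnames = list(true      = c("real", "zero1", "zero2"),
                               predicted = c("real", "zero1", "zero2")))

# Per-class recall: correctly predicted cases divided by the true class size.
recall <- diag(conf) / rowSums(conf)

# Balanced accuracy as the unweighted mean of the per-class recalls.
ba <- mean(recall)
print(round(recall, 2))
print(ba)
```

Because each class contributes equally to the mean, a small class that is predicted poorly lowers 'ba' as much as a large one would.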
Examples
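datastrat <- PrInDT::data_zero # example data from the PrInDT package
data <- na.omit(datastrat) # drop observations with missing values
ctestv <- NA # default: no forbidden split results
# build a 3-level class variable: split "zero" by ETH group, rename "realized"
data$rel[data$ETH %in% c("C1a","C1b","C1c") & data$real == "zero"] <- "zero1"
data$rel[data$ETH %in% c("C2a","C2b","C2c") & data$real == "zero"] <- "zero2"
data$rel[data$real == "realized"] <- "real"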
data$rel <- as.factor(data$rel) # rel is new class variable
data$real <- NULL # remove old class variable
N <- 51
conf.level <- 0.99 # 1 - significance level (mincriterion) in ctree
out <- PrInDTMulev(data,"rel",ctestv,N,conf.level)
out # print best models based on subsamples
plot(out) # corresponding plots