PrInDTAllparts {PrInDT} | R Documentation |
Conditional inference trees (ctrees) based on consecutive parts of the full sample
Description
ctrees based on the full sample of the smaller class and consecutive parts of the larger class of the nesting variable 'nesvar'.
The variable 'nesvar' has to be part of the data frame 'datain'.
Interpretability is checked (see 'ctestv'); a probability threshold can be specified (see 'thres').
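As a rough illustration of what "consecutive parts" of a nesting variable can look like, the sketch below splits the ordered levels of a toy factor into 'divt' consecutive groups of near-equal size. The toy data and the names 'speakers', 'part_id' and 'parts' are hypothetical, and the grouping rule is an assumption made for illustration; it is not necessarily how PrInDTAllparts partitions the sample internally.

```r
# Hypothetical toy nesting variable with 10 ordered levels (not package data)
speakers <- factor(paste0("S", 1:10), levels = paste0("S", 1:10))
divt <- 3  # number of consecutive parts, chosen small for illustration

# Assign each level, in order, to one of 'divt' consecutive groups
lev <- levels(speakers)
part_id <- ceiling(seq_along(lev) * divt / length(lev))

# List of level names per consecutive part
parts <- split(lev, part_id)
parts
```

Each element of 'parts' holds the levels that would form one consecutive part under this assumed rule.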
Reference
Weihs, C., Buschfeld, S. 2021b. NesPrInDT: Nested undersampling in PrInDT.
arXiv:2103.14931
Usage
PrInDTAllparts(datain, classname, ctestv=NA, conf.level=0.95, thres=0.5,
nesvar, divt)
Arguments
datain |
Input data frame with class factor variable 'classname', the nesting variable 'nesvar', and the predictor variables |
classname |
Name of class variable (character) |
ctestv |
Vector of character strings of forbidden split results; default = NA |
conf.level |
(1 - significance level) in function 'ctree'; default = 0.95 |
thres |
Probability threshold for prediction of smaller class (numerical, >= 0 and < 1); default = 0.5 |
nesvar |
Name of nesting variable (character) |
divt |
Number of parts of the nesting variable 'nesvar' for which models should be determined individually |
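To make the role of 'thres' more concrete, the following sketch shows how a probability threshold can turn predicted probabilities for the smaller class into class labels. The probabilities and the labels "small" and "large" are made up, and the strict comparison '>' is an assumption; the package may treat the boundary case differently.

```r
# Hypothetical predicted probabilities for the smaller class (not package output)
prob_small <- c(0.10, 0.35, 0.45, 0.62, 0.90)

# Predict the smaller class whenever its probability exceeds the threshold
predict_with_threshold <- function(prob, thres = 0.5) {
  ifelse(prob > thres, "small", "large")
}

pred_default <- predict_with_threshold(prob_small)               # thres = 0.5
pred_low     <- predict_with_threshold(prob_small, thres = 0.3)  # lower threshold

table(pred_default)
table(pred_low)
```

Lowering the threshold assigns more observations to the smaller class, which typically trades accuracy on the larger class for accuracy on the smaller one.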
Details
Standard output can be produced by means of print(name) or just name, where 'name' is the output data frame of the function.
Value
- baAll
balanced accuracy of tree on full sample
- nesvar
name of nesting variable
- divt
number of consecutive parts of the sample
- badiv
balanced accuracy of trees on 'divt' consecutive parts of the sample
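For readers unfamiliar with the measure, balanced accuracy is commonly computed as the mean of the per-class accuracies (recalls). The sketch below assumes this standard definition; the helper 'balanced_accuracy' and the toy vectors are hypothetical and not part of the package.

```r
# Balanced accuracy as the mean of per-class recalls (standard definition, assumed here)
balanced_accuracy <- function(truth, pred) {
  truth <- factor(truth)
  pred <- factor(pred, levels = levels(truth))
  tab <- table(truth, pred)          # rows: true class, columns: predicted class
  mean(diag(tab) / rowSums(tab))     # average the recall of each class
}

# Hypothetical imbalanced toy example: 8 "large" and 2 "small" observations
truth <- c(rep("large", 8), rep("small", 2))
pred  <- c(rep("large", 7), "small", "small", "large")

ba  <- balanced_accuracy(truth, pred)
acc <- mean(truth == pred)           # plain accuracy, for comparison
c(balanced = ba, overall = acc)
```

In this toy case the plain accuracy is higher than the balanced accuracy because the larger class dominates the sample, which is the situation balanced accuracy is meant to correct for.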
Examples
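# Load the example data shipped with the package and drop incomplete rows
data <- PrInDT::data_speaker
data <- na.omit(data)
# Nesting variable; must be a column of 'data'
nesvar <- "SPEAKER"
# Fit ctrees on the full sample and on 8 consecutive parts of the nesting variable
outNesAll <- PrInDTAllparts(data, "class", ctestv=NA, conf.level=0.95, thres=0.5,
                            nesvar, divt=8)
outNesAll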