rashomon_quartet {quartets}    R Documentation
Rashomon Quartet Data
Description
This dataset contains 2,000 observations: 1,000 training observations and
1,000 testing observations. The data were generated so that four modeling
techniques (regression tree, linear model, neural network, random forest)
yield the same R² and RMSE while producing very different fitted models.
Usage
rashomon_quartet
rashomon_quartet_train
rashomon_quartet_test
Format
rashomon_quartet
: A data frame with 2000 rows and 5 variables:
- split: train or test
- x1
- x2
- x3
- y
rashomon_quartet_train
: A data frame with 1000 rows and 4 variables:
- x1
- x2
- x3
- y
rashomon_quartet_test
: A data frame with 1000 rows and 4 variables:
- x1
- x2
- x3
- y
Details
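There are three explanatory variables, x1, x2, and x3, and one outcome, y, generated as:
$$y = \sin\left(\frac{3x_1 + x_2}{5}\right) + \epsilon$$
where $\epsilon \sim N(0, \sigma^2)$ with $\sigma = 1/3$, and $(x_1, x_2, x_3)^\top \sim N(\mathbf{0}, \Sigma_{3 \times 3})$, where $\Sigma_{3 \times 3}$ has 1 on the diagonal and 0.9 elsewhere.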
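A minimal base-R sketch of this generating process follows. The function name `simulate_rashomon`, the seed, and the use of a Cholesky factor to draw correlated normals (instead of, say, `MASS::mvrnorm`) are illustrative choices, not the procedure used to build the packaged data, so the simulated rows will not match `rashomon_quartet` exactly.

```r
# Sketch of the data-generating process (base R only). The function name and
# seed are illustrative; this will not reproduce the packaged data exactly.
simulate_rashomon <- function(n, seed = 1) {
  set.seed(seed)  # note: resets the global RNG state
  # Covariance matrix: 1 on the diagonal, 0.9 elsewhere
  sigma <- matrix(0.9, nrow = 3, ncol = 3)
  diag(sigma) <- 1
  # Rows of z are iid N(0, I); multiplying by the upper Cholesky factor R
  # (sigma = t(R) %*% R) gives rows with covariance sigma
  z <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
  x <- z %*% chol(sigma)
  # The outcome depends on x1 and x2 only; the noise has standard deviation 1/3
  y <- sin((3 * x[, 1] + x[, 2]) / 5) + rnorm(n, mean = 0, sd = 1 / 3)
  data.frame(x1 = x[, 1], x2 = x[, 2], x3 = x[, 3], y = y)
}

sim <- simulate_rashomon(1000)
round(cor(sim[, c("x1", "x2", "x3")]), 2)  # off-diagonal entries should be near 0.9
```

Note that x3 enters the data only through its correlation with x1 and x2; it does not appear in the formula for y.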
When fit with the following hyperparameters, each model yields an R² of 0.73 and an RMSE of 0.354:
- Regression tree: max depth: 3, min split: 250
- Linear model: all main effects
- Random forest: mtry: 1, number of trees: 100
- Neural network: hidden neurons in each layer: 8, 4; threshold for the partial derivatives of the error function as stopping criterion: 0.05
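The hyperparameters above map naturally onto arguments of common R modeling packages. The sketch below assumes rpart (`maxdepth`, `minsplit`), randomForest (`mtry`, `ntree`), and neuralnet (`hidden`, `threshold`; `predict()` needs neuralnet 1.44.2 or later); this page does not name the software, so treat these choices as assumptions. It also assumes the metrics are computed on `rashomon_quartet_test` and defines R² as 1 - SSE/SST, whereas some tools report the squared correlation instead, which can differ slightly. Models whose package is not installed are skipped, and the seed is arbitrary, so the printed values are not guaranteed to match the figures above.

```r
# Test-set metrics. R^2 here is 1 - SSE/SST (an assumed definition).
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
r_squared <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

has <- function(pkg) requireNamespace(pkg, quietly = TRUE)

if (has("quartets")) {
  utils::data("rashomon_quartet_train", "rashomon_quartet_test", package = "quartets")
  train <- as.data.frame(rashomon_quartet_train)
  test  <- as.data.frame(rashomon_quartet_test)
  set.seed(1)  # arbitrary seed: the random forest and neural network are stochastic

  # Linear model: all main effects
  preds <- list(linear = predict(lm(y ~ x1 + x2 + x3, data = train), newdata = test))

  if (has("rpart")) {  # regression tree: max depth 3, min split 250
    tree <- rpart::rpart(y ~ x1 + x2 + x3, data = train,
                         control = rpart::rpart.control(maxdepth = 3, minsplit = 250))
    preds$tree <- predict(tree, newdata = test)
  }
  if (has("randomForest")) {  # random forest: mtry 1, 100 trees
    rf <- randomForest::randomForest(y ~ x1 + x2 + x3, data = train,
                                     mtry = 1, ntree = 100)
    preds$forest <- predict(rf, newdata = test)
  }
  if (has("neuralnet")) {  # neural network: hidden layers of 8 and 4, threshold 0.05
    nn_pred <- tryCatch({
      nn <- neuralnet::neuralnet(y ~ x1 + x2 + x3, data = train,
                                 hidden = c(8, 4), threshold = 0.05)
      as.vector(predict(nn, newdata = test))
    }, error = function(e) NULL)  # e.g. if training did not converge
    if (!is.null(nn_pred)) preds$neural <- nn_pred
  }

  # One column per model, rows R2 and RMSE
  print(sapply(preds, function(p) c(R2 = r_squared(test$y, p), RMSE = rmse(test$y, p))))
}
```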
rashomon_quartet_train contains only the training data, and rashomon_quartet_test contains only the test data.
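Because rashomon_quartet carries a split column, the training and test parts can also be recovered from it. The helper below, `split_quartet` (a hypothetical name), is a small sketch that splits the combined data frame by that column and drops it, so each part has the same variables as rashomon_quartet_train and rashomon_quartet_test. It assumes the split values are the strings train and test, and it does not check that row order matches the packaged _train and _test objects.

```r
# Hypothetical helper: split the combined data by its `split` column and drop
# that column, so each part has x1, x2, x3, y like the _train/_test objects.
split_quartet <- function(df) {
  parts <- split(df[setdiff(names(df), "split")], df$split)
  lapply(parts, function(p) {
    rownames(p) <- NULL  # renumber rows within each part
    p
  })
}

# Usage with the packaged data, only if the quartets package is installed
if (requireNamespace("quartets", quietly = TRUE)) {
  utils::data("rashomon_quartet", package = "quartets")
  parts <- split_quartet(as.data.frame(rashomon_quartet))
  print(sapply(parts, nrow))  # number of rows in each split
}
```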
References
P. Biecek, H. Baniecki, M. Krzyziński, D. Cook. Performance is not enough: the story of Rashomon’s quartet. Preprint arXiv:2302.13356v2, 2023.