readingSkills {party} | R Documentation
Reading Skills
Description
A toy data set illustrating the spurious correlation between reading skills and shoe size in schoolchildren.
Usage
data("readingSkills")
Format
A data frame with 200 observations on the following 4 variables.
nativeSpeaker
a factor with levels no and yes, where yes indicates that the child is a native speaker of the language of the reading test.
age
age of the child in years.
shoeSize
shoe size of the child in cm.
score
raw score on the reading test.
Details
In this artificial data set, which was generated by means of a linear model, age and nativeSpeaker are actual predictors of the score, while the spurious correlation between score and shoeSize is merely caused by the fact that both depend on age.
The true predictors can be identified, e.g., by means of partial correlations, standardized beta coefficients in linear models or the conditional random forest variable importance, but not by means of the standard random forest variable importance (see example).
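The first two approaches mentioned above can be sketched in base R as follows. This is only an illustrative sketch, not part of the original examples: the object names (rs, fit, partial_cor, marginal_cor) are made up here, and the partial correlation is computed by the residual method, i.e. as the correlation of the residuals after regressing each variable on age and nativeSpeaker.

```r
library("party")                       # provides the readingSkills data
data("readingSkills", package = "party")

## Standardized beta coefficients: z-scale the numeric variables,
## then fit an ordinary linear model on the scaled copy.
rs <- readingSkills
num <- c("age", "shoeSize", "score")
rs[num] <- lapply(rs[num], function(x) as.numeric(scale(x)))
fit <- lm(score ~ nativeSpeaker + age + shoeSize, data = rs)
coef(summary(fit))                     # compare the estimates for age and shoeSize

## Partial correlation of score and shoeSize given age and nativeSpeaker,
## obtained from the residuals of two auxiliary regressions.
res_score <- resid(lm(score ~ age + nativeSpeaker, data = readingSkills))
res_shoe  <- resid(lm(shoeSize ~ age + nativeSpeaker, data = readingSkills))
partial_cor <- cor(res_score, res_shoe)

## Marginal (unadjusted) correlation, for comparison.
marginal_cor <- cor(readingSkills$score, readingSkills$shoeSize)

c(marginal = marginal_cor, partial = partial_cor)
```

If the data behave as described above, the marginal correlation should be clearly non-zero, while the partial correlation and the standardized coefficient of shoeSize should be close to zero.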
Examples
library("party")
data("readingSkills", package = "party")

set.seed(290875)
readingSkills.cf <- cforest(score ~ ., data = readingSkills,
    control = cforest_unbiased(mtry = 2, ntree = 50))

# standard importance
varimp(readingSkills.cf)
# the same modulo random variation
varimp(readingSkills.cf, pre1.0_0 = TRUE)
# conditional importance, may take a while...
varimp(readingSkills.cf, conditional = TRUE)