| caravan.train {agtboost} | R Documentation |
The Insurance Company (TIC) Benchmark
Description
caravan.train and caravan.test each contain a design
matrix with 85 columns and a response vector. The training set holds
70% of the data (4075 rows); the test set holds the remaining 30%
(1747 rows). The following description is taken from the documentation
of the ISLR package:
The original data contains 5822 real customer records. Each record
consists of 86 variables, containing sociodemographic data (variables
1-43) and product ownership (variables 44-86). The sociodemographic
data is derived from zip codes. All customers living in areas with the
same zip code have the same sociodemographic attributes. Variable 86
(Purchase) indicates whether the customer purchased a caravan
insurance policy. Further information on the individual variables can
be obtained at http://www.liacs.nl/~putten/library/cc2000/data.html
Usage
caravan.train; caravan.test
Format
Each object is a list with a design matrix x and a response vector y.
Source
The data was originally supplied by Sentient Machine Research and was used in the CoIL Challenge 2000.
References
P. van der Putten and M. van Someren (eds.). CoIL Challenge
2000: The Insurance Company Case. Published by Sentient Machine
Research, Amsterdam. Also a Leiden Institute of Advanced Computer
Science Technical Report 2000-09. June 22, 2000. See
http://www.liacs.nl/~putten/library/cc2000/
P. van der Putten and M. van Someren. A Bias-Variance Analysis of a Real World Learning Problem: The CoIL Challenge 2000. Machine Learning, October 2004, vol. 57, iss. 1-2, pp. 177-195, Kluwer Academic Publishers.
James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013)
An Introduction to Statistical Learning with Applications in R,
https://trevorhastie.github.io/ISLR/,
Springer-Verlag, New York.
Examples
summary(caravan.train)
summary(caravan.test)
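
# A minimal sketch of checking the structure described above. It assumes
# the agtboost package is installed and that each list exposes the design
# matrix as x and the response as y, as stated under Format.
data(caravan.train, package = "agtboost")
data(caravan.test, package = "agtboost")

# Dimensions of the design matrices (rows are customers, columns are
# features); compare these with the row and column counts in the Description
dim(caravan.train$x)
dim(caravan.test$x)

# The response should have one entry per row of the design matrix
stopifnot(NROW(caravan.train$y) == nrow(caravan.train$x))
stopifnot(NROW(caravan.test$y) == nrow(caravan.test$x))

# Fraction of all records that fall in the training set
n_train <- nrow(caravan.train$x)
n_test <- nrow(caravan.test$x)
n_train / (n_train + n_test)

# Distribution of the response (caravan policy purchase) in each set
table(caravan.train$y)
table(caravan.test$y)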