data_reg {etree} | R Documentation |
Regression toy dataset
Description
A simple dataset containing simulated values for a numeric response variable and four covariates of mixed and partially structured types. The data generation process is based on Section 5 ("Example: synthetic data") of Serban and Wasserman (2005).
Usage
data_reg
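A minimal sketch of loading the dataset and looking at its top-level structure. It assumes the etree package is installed; the guard simply skips the checks otherwise. The element names `resp` and `covs` are the ones documented for this dataset.

```r
# Assumption: the etree package is installed; nothing runs otherwise.
if (requireNamespace("etree", quietly = TRUE)) {
  data("data_reg", package = "etree")

  # Top-level elements of the list
  print(names(data_reg))

  # Response vector (documented as numeric, length 200)
  print(length(data_reg$resp))

  # Number of covariates stored in covs (documented as four)
  print(length(data_reg$covs))
}
```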
Format
List with two elements: covs, which is a list containing the covariates, and resp, which is a numeric vector of length 200 representing the response variable. The response variable is specified as in Serban and Wasserman (2005). The four covariates in covs all have length 200 and are characterized as follows:
Nominal: level 0 for observations with a negative response value, level 1 otherwise;
Numeric: coefficients for one of the basis functions used in the B-spline expansion of the curves, which are in turn specified as in Serban and Wasserman (2005);
Functional: curves as specified in Serban and Wasserman (2005), with 50 observations coming from each of the four curve shapes;
Graphs: Erdős-Rényi graphs with connection probability given by a transformation of the response variable: normally distributed noise with mean 0 and standard deviation 7 is added to the response, and the result is rescaled to lie between 0.2 and 0.8.
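The construction of the nominal and graph covariates described above can be sketched in base R. This is an illustration under stated assumptions, not the code used to build the dataset: the response here is a hypothetical stand-in (the real one follows Serban and Wasserman, 2005), "standardizing between 0.2 and 0.8" is read as a linear min-max rescaling, the number of nodes `n_nodes` is invented since the description does not state it, and `er_adjacency` is a hypothetical helper that returns adjacency matrices rather than graph objects.

```r
set.seed(1)
n <- 200

# Hypothetical stand-in for the response variable
resp <- rnorm(n, mean = 0, sd = 10)

# Nominal covariate: level 0 for a negative response, level 1 otherwise
nominal <- factor(as.integer(resp >= 0), levels = c(0, 1))

# Add N(0, 7^2) noise, then rescale linearly to [0.2, 0.8]
# (assumption: min-max rescaling)
noisy <- resp + rnorm(n, mean = 0, sd = 7)
prob <- 0.2 + 0.6 * (noisy - min(noisy)) / (max(noisy) - min(noisy))

# Hypothetical helper: adjacency matrix of an undirected Erdos-Renyi graph,
# where each possible edge is present independently with probability p
er_adjacency <- function(n_nodes, p) {
  adj <- matrix(0L, n_nodes, n_nodes)
  upper <- upper.tri(adj)
  adj[upper] <- rbinom(sum(upper), size = 1, prob = p)
  adj + t(adj)  # mirror the upper triangle to make the matrix symmetric
}

n_nodes <- 20  # hypothetical graph size
graphs <- lapply(prob, er_adjacency, n_nodes = n_nodes)
```

Under this construction, observations with a larger noisy response get denser graphs, which is what ties the graph covariate to the response.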
References
Serban, N., and Wasserman, L. (2005). CATS: clustering after transformation and smoothing. Journal of the American Statistical Association, 100(471), 990-999.