DDSE {capushe}    R Documentation

Model selection by Data-Driven Slope Estimation

Description

DDSE is a model selection function based on the slope heuristics.

Usage

DDSE(data, pct = 0.15, point = 0, psi.rlm = psi.bisquare, scoef = 2)

Arguments

data

data is a matrix or a data.frame with four columns of the same length, in which each row corresponds to a model:

  1. The first column contains the model names.

  2. The second column contains the penalty shape values.

  3. The third column contains the model complexity values.

  4. The fourth column contains the minimum contrast value for each model.
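As an illustration of this layout, here is a minimal sketch that builds a data.frame with the four columns in the expected order. The column names and all values are made up for the example; only the column positions follow the description above.

```r
# Hypothetical model collection; every value below is invented for illustration.
dims <- 1:10
models <- data.frame(
  model      = paste0("m", dims),        # column 1: model names
  pen        = dims,                     # column 2: penalty shape values
  complexity = dims,                     # column 3: model complexity values
  contrast   = 100 / dims - 0.5 * dims,  # column 4: minimum contrast values
  stringsAsFactors = FALSE
)
str(models)
```

The contrast column is chosen to decrease with the complexity, as a minimum contrast typically does over nested models.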

pct

Minimum proportion of points that a plateau must contain to be selected (see step 4 in Details). It must be between 0 and 1. Default value is 0.15.

point

Minimum number of points that a plateau must contain to be selected. If point is different from 0, pct is ignored. Default value is 0.

psi.rlm

Weight function used by rlm (package MASS). Use psi.rlm="lm" for non-robust (ordinary least squares) linear regression. Default value is psi.bisquare.
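To show what the choice of weight function changes, the sketch below compares an ordinary least squares slope with a robust slope obtained from rlm and psi.bisquare (both from package MASS). The data are made up, and the calls are a standalone illustration; they are not meant to reproduce how DDSE calls rlm internally.

```r
library(MASS)  # provides rlm() and psi.bisquare

# Made-up data: nearly linear with slope 2, plus one gross outlier.
x <- 1:20
y <- 2 * x + 1 + 0.1 * sin(x)
y[20] <- 100

slope_ols    <- unname(coef(lm(y ~ x))[2])                        # non-robust fit
slope_robust <- unname(coef(rlm(y ~ x, psi = psi.bisquare))[2])   # robust fit
c(ols = slope_ols, robust = slope_robust)
```

The robust fit downweights the outlying point, so its slope stays closer to the slope of the bulk of the data.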

scoef

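Ratio parameter: the factor that multiplies the estimated slope \hat{\kappa}(p) in the penalized criterion (see step 3 in Details). Default value is 2.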

Details

Let M be the model collection and P=\{pen_{shape}(m),m\in M\}. The DDSE algorithm proceeds in four steps:

  1. If several models in the collection have the same penalty shape value (column 2), only the model having the smallest contrast value \gamma_n(\hat{s}_m) (column 4) is considered.

  2. For any p\in P, the slope \hat{\kappa}(p) (output @kappa) of the linear regression (argument psi.rlm) on the pairs of points \{(pen_{shape}(m),-\gamma_n (\hat{s}_m)); pen_{shape}(m)\geq p\} is computed.

  3. For any p\in P, the model fulfilling the following condition is selected:

    \hat{m}(p) = argmin_{m\in M} \{\gamma_n (\hat{s}_m)+scoef\times \hat{\kappa}(p)\times pen_{shape}(m)\}.

    This gives an increasing sequence of change-points (p_i)_{1\leq i\leq I+1} (output @ModelHat$point_breaking). Let (N_i)_{1\leq i\leq I} (output @ModelHat$number_plateau) be the lengths of each "plateau".

  4. If point is different from 0, let \hat{i}= max \{1\leq i\leq I; N_i\geq point\}; otherwise let \hat{i}= max \{1\leq i\leq I; N_i\geq pct\sum_{l=1}^IN_l\} (output @ModelHat$imax). The model \hat{m}(p_{\hat{i}}) (output @model) is finally returned.
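The four steps above can be sketched in base R as follows. This is not the capushe implementation: it uses lm (ordinary least squares) for every regression, it skips the largest penalty shape value so that each fit has at least two points, and the function name ddse_sketch, the returned list and the toy data are all hypothetical.

```r
# Sketch only, under the assumptions stated above; names are hypothetical.
ddse_sketch <- function(data, pct = 0.15, point = 0, scoef = 2) {
  # Step 1: for each penalty shape value keep the model with the smallest contrast.
  d <- data[order(data[[2]], data[[4]]), ]
  d <- d[!duplicated(d[[2]]), ]
  name <- as.character(d[[1]])
  pen  <- d[[2]]
  con  <- d[[4]]
  n <- length(pen)

  # Step 2: slope of -contrast against pen over the points with pen >= p.
  P <- pen[seq_len(n - 1)]
  kappa <- vapply(P, function(p) {
    keep <- pen >= p
    y <- -con[keep]
    x <- pen[keep]
    unname(coef(lm(y ~ x))[2])
  }, numeric(1))

  # Step 3: minimizer of the penalized contrast for each slope, then the
  # plateaus, i.e. runs of consecutive p values selecting the same model.
  m_hat <- vapply(kappa, function(k) which.min(con + scoef * k * pen), integer(1))
  runs <- rle(m_hat)
  N <- runs$lengths

  # Step 4: last plateau whose length reaches the threshold.
  threshold <- if (point != 0) point else pct * sum(N)
  ok <- which(N >= threshold)
  if (length(ok) == 0) stop("no plateau reaches the required length")
  i_hat <- max(ok)

  list(model = name[runs$values[i_hat]], kappa = kappa,
       model_hat = name[m_hat], number_plateau = N, imax = i_hat)
}

# Made-up collection: a bias term 50 * exp(-d) plus a linear part of slope -1.
dims <- 1:20
toy <- data.frame(model = paste0("m", dims), pen = dims, complexity = dims,
                  contrast = 50 * exp(-dims) - dims, stringsAsFactors = FALSE)
# A duplicated penalty shape value with a worse contrast; step 1 discards it.
toy <- rbind(toy, data.frame(model = "m4_bad", pen = 4, complexity = 4,
                             contrast = 10, stringsAsFactors = FALSE))
res <- ddse_sketch(toy)
res$model
res$number_plateau
```

In this toy collection the slopes settle near 1 once the bias term becomes negligible, so most values of p select the same model and form one long plateau.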

The "slope interval" is the interval [a,b] where a=inf\{\hat{\kappa}(p),p\in[p_{\hat{i}},p_{\hat{i}+1}[\cap P\} and b=sup\{\hat{\kappa}(p),p\in[p_{\hat{i}},p_{\hat{i}+1}[\cap P\}.
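Because P is finite, the infimum and supremum above are simply the smallest and largest slopes on the selected plateau. A minimal sketch with made-up values (the vectors kappa and plateau and the index i_hat are hypothetical inputs, not DDSE outputs):

```r
# Made-up slopes, one per candidate p, and the plateau to which each p belongs.
kappa   <- c(1.39, 1.16, 1.05, 1.02, 1.00, 0.99, 1.01)
plateau <- c(1, 2, 2, 2, 2, 2, 2)
i_hat   <- 2  # rank of the selected plateau

# Slope interval [a, b]: min and max of the slopes on the selected plateau.
slope_interval <- range(kappa[plateau == i_hat])
# Proportion of points on the selected plateau, N_i_hat / sum(N).
percent_of_points <- sum(plateau == i_hat) / length(plateau)
slope_interval
percent_of_points
```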

Value

@model

The model selected by the DDSE algorithm.

@kappa

The vector of the successive slope values \hat{\kappa}(p).

@ModelHat

A list describing the algorithm.

@ModelHat$model_hat

The vector of preselected models \hat{m}(p).

@ModelHat$point_breaking

The vector of the breaking points (p_i)_{1\leq i\leq I+1}.

@ModelHat$number_plateau

The vector of the lengths (N_i)_{1\leq i\leq I}.

@ModelHat$imax

The rank \hat{i} of the selected plateau.

@interval

A list describing the "slope interval".

@interval$interval

The slope interval.

@interval$percent_of_points

The proportion N_{\hat{i}}/\sum_{l=1}^IN_l.

@graph

A list computed for the plot method.

Author(s)

Vincent Brault

References

http://www.math.univ-toulouse.fr/~maugis/CAPUSHE.html

http://www.math.u-psud.fr/~brault/capushe.html

Article: Baudry, J.-P., Maugis, C. and Michel, B. (2011) Slope heuristics: overview and implementation. Statistics and Computing, to appear. doi: 10.1007/s11222-011-9236-1

See Also

capushe for a model selection function including AIC, BIC, the DDSE algorithm and the Djump algorithm.

plot for graphical displays of the DDSE algorithm and the Djump algorithm.

Examples

data(datacapushe)
DDSE(datacapushe)
plot(DDSE(datacapushe))
## DDSE with "lm" for the regression
DDSE(datacapushe,psi.rlm="lm")

[Package capushe version 1.1.2 Index]