enetLTS {enetLTS}  R Documentation 
Compute fully robust versions of the elastic net estimator, which allows for sparse model estimates, for linear regression and binary and multinomial logistic regression.
enetLTS(
xx,
yy,
family=c("gaussian","binomial","multinomial"),
alphas=seq(0,1,length=41),
lambdas=NULL,
lambdaw=NULL,
intercept=TRUE,
scal=TRUE,
hsize=0.75,
nsamp=c(500,10),
nCsteps=20,
nfold=5,
repl=1,
ncores=1,
tol=-1e6,
seed=NULL,
del=0.0125,
crit.plot=FALSE,
typegrouped=FALSE,
type.response=c("link","response","class")
)
xx 
a numeric matrix containing the predictor variables. 
yy 
response variable. Quantitative for 
family 
a description of the error distribution and link function to be used
in the model. 
alphas 
a user supplied alpha sequence for the elastic net penalty, which is
the mixing proportion of the ridge and lasso penalties and takes values in [0,1].

lambdas 
a user supplied lambda sequence for the strength of the elastic net penalty.
If no sequence is provided, the default is chosen with steps of size 0.025 lambda0 with

lambdaw 
a user supplied lambda sequence for the reweighting step. If not provided,
the default is computed by k-fold cross-validation via 
intercept 
a logical indicating whether a constant term should be
included in the model (the default is TRUE).
scal 
a logical value indicating whether to scale the predictors by their arithmetic means
and standard deviations. For 
hsize 
a user supplied numeric value giving the percentage of the residuals for which the elastic net penalized sum of squares for linear regression or for which the elastic net penalized sum of deviances for binary and multinomial logistic regression should be minimized. The default is 0.75. 
nsamp 
a numeric vector giving the number of initial subsamples to be used at
the beginning of the algorithm. The default is to first perform C-steps on 500
initial subsamples, and then to keep the 
nCsteps 
a positive integer giving the number of C-steps to perform on the selected s1 subsamples. The default is 20. 
nfold 
a user supplied numeric value giving the number of folds for the k-fold cross-validation that is used in various functions of the algorithm. The default is 5-fold cross-validation. 
repl 
a user supplied positive number; for more stable results, repeat the k-fold CV

ncores 
a positive integer giving the number of processor cores to be
used for parallel computing (the default is 1 for no parallelization). If
this is set to 
tol 
a small numeric value for convergence. The default is -1e6. 
seed 
optional initial seed for the random number generator (see 
del 
The default is 0.0125. 
crit.plot 
a logical value indicating whether to produce a plot of the k-fold cross-validation results over the alpha and lambda combinations. The default is FALSE. 
typegrouped 
This argument is available for only 
type.response 
type of prediction required. 
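The tuning grids described for alphas and lambdas can be sketched in base R as follows. This is only an illustration: the value of l0 is a hypothetical stand-in for a computed lambda0, and the descending sequence with steps of 0.025 times l0 is an assumption based on the step size quoted in the lambdas entry.

```r
# Default alpha grid as given in the usage: 41 values from 0 to 1
alphas <- seq(0, 1, length = 41)

# Hypothetical stand-in for a computed lambda0 (not taken from any data)
l0 <- 2

# Descending lambda sequence from l0 down to 0 in steps of 0.025 * l0
lambdas <- seq(l0, 0, by = -0.025 * l0)

# All (alpha, lambda) combinations that would be evaluated
grid <- expand.grid(alpha = alphas, lambda = lambdas)
```

With these assumed values both sequences have 41 entries, so the grid holds 41 * 41 combinations.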
The idea of repeatedly applying the non-robust classical elastic net estimator to data subsets
only is used for both linear and logistic regression. The algorithm starts with 500 elemental subsets
for only one combination of $\alpha$ and $\lambda$, and uses a warm start strategy
for subsequent combinations, which saves computation time.
To choose the elastic net penalties, k-fold cross-validation is used, and a replication option is
provided for more stable results.
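The subset refinement (C-step) idea can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: a closed-form ridge fit (the alpha = 0 case, without intercept) is swapped in for the elastic net fit, a random starting subset replaces the elemental subsets, and the helper names ridge_fit, c_step and objective are hypothetical.

```r
# Hypothetical helper: closed-form ridge fit (alpha = 0, no intercept),
# used only as a base-R stand-in for the elastic net fit.
ridge_fit <- function(x, y, lambda) {
  solve(crossprod(x) + lambda * diag(ncol(x)), crossprod(x, y))
}

# One C-step: fit on the current subset, then keep the h observations
# with the smallest squared residuals under that fit.
c_step <- function(x, y, subset, h, lambda) {
  beta <- ridge_fit(x[subset, , drop = FALSE], y[subset], lambda)
  res2 <- as.vector((y - x %*% beta)^2)
  sort(order(res2)[seq_len(h)])
}

# Penalized sum of squares on a subset (ridge version of the objective).
objective <- function(x, y, subset, lambda) {
  beta <- ridge_fit(x[subset, , drop = FALSE], y[subset], lambda)
  sum((y[subset] - x[subset, , drop = FALSE] %*% beta)^2) + lambda * sum(beta^2)
}

set.seed(1)
n <- 60; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- as.vector(x %*% c(1, 1, 0, 0, 0) + rnorm(n, sd = 0.5))
y[1:6] <- y[1:6] + 10                 # vertical outliers

h <- floor(0.75 * n)                  # subset size for hsize = 0.75 (assumed rounding)
subset <- sort(sample(n, h))          # random start (assumption of this sketch)
obj <- objective(x, y, subset, lambda = 1)
for (i in 1:20) {                     # at most nCsteps = 20 iterations
  new_subset <- c_step(x, y, subset, h, lambda = 1)
  if (identical(new_subset, subset)) break   # subset is stable: converged
  subset <- new_subset
  obj <- c(obj, objective(x, y, subset, lambda = 1))
}
```

Each C-step cannot increase the penalized objective on the retained subset, which is why iterating until the subset stops changing is a sensible stopping rule.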
Robustness is achieved by trimming; a reweighting step is therefore introduced
in order to improve the efficiency. The outliers are identified according to the current model.
For family="gaussian", standardized residuals are used. For family="binomial", the Pearson
residuals, which are approximately standard normally distributed, are used. The weights are then defined by
the binary weight function using del=0.0125, which flags 2.5% of the observations
as outliers under the normal model. For family="multinomial",
group-wise scaled robust distances are used, and the binary weights are defined using the constant $c_2=5$.
The binary weight function thus produces a clear distinction between the "good observations" and the "outliers".
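The binary weighting for the gaussian case can be illustrated with a short base-R sketch. It assumes a two-sided cutoff at the 1 - del quantile of the standard normal distribution, and it standardizes the residuals with the median and MAD; both choices are assumptions made for this illustration rather than a statement of the package's internals.

```r
del <- 0.0125
cutoff <- qnorm(1 - del)              # two-sided cutoff, roughly 2.24

# Share of observations flagged under an exact normal model: 2 * del
flag_rate <- 2 * (1 - pnorm(cutoff))

set.seed(2)
res <- rnorm(200)                     # stand-in for residuals of a fitted model
res[1:5] <- res[1:5] + 8              # a few gross outliers

# Robust standardization via median and MAD (assumption of this sketch)
std_res <- (res - median(res)) / mad(res)

# Binary weights: 1 = good observation, 0 = flagged as outlier
wt <- as.integer(abs(std_res) <= cutoff)
```

Because the weights are either 0 or 1, an observation is either fully used in the reweighted fit or excluded from it.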
objective 
a numeric vector giving the respective values of the
enetLTS objective function, i.e., the elastic net penalized sums of
the 
raw.rmse 
root mean squared error for raw fit, which is available for only

rmse 
root mean squared error for reweighted fit, which is available for only

raw.mae 
mean absolute error for raw fit. 
mae 
mean absolute error for reweighted fit. 
best 
an integer vector containing the respective best
subsets of 
raw.wt 
an integer vector containing binary weights that indicate outliers from the respective raw fits, i.e., the weights used for the reweighted fits. 
wt 
an integer vector containing binary weights that
indicate outliers from the respective reweighted fits, i.e., the weights are

raw.coefficients 
a numeric vector containing the respective coefficient estimates from the raw fit. 
coefficients 
a numeric vector containing the respective coefficient estimates from the reweighted fit. 
raw.fitted.values 
a numeric vector containing the respective fitted values of the response from the raw fits. 
fitted.values 
a numeric vector containing the respective fitted values of the response from the reweighted fits. 
raw.residuals 
a numeric vector containing the
respective residuals for 
residuals 
a numeric vector containing the
respective residuals for 
alpha 
an optimal elastic net mixing parameter value obtained with k-fold cross-validation. 
lambda 
an optimal value for the strength of the elastic net penalty obtained with k-fold cross-validation. 
lambdaw 
an optimal value for the strength of the elastic net penalty, obtained again with k-fold cross-validation for the reweighted fit. 
num.nonzerocoef 
the number of the nonzero coefficients in the model. 
n 
the number of observations. 
p 
the number of variables. 
h 
the number of observations used to compute the raw estimates. 
classnames 
class names for logistic model, which is available for only

classize 
class sizes for logistic model, which is available for only

inputs 
all inputs used in the function 
call 
the matched function call. 
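For orientation, the error measures named above follow the usual definitions, sketched here in base R on simulated values. The vectors y and fitted are hypothetical, and whether the package restricts these measures to non-outlying observations is not stated in this page, so the sketch uses all observations.

```r
set.seed(3)
y      <- rnorm(50)                   # hypothetical response
fitted <- y + rnorm(50, sd = 0.3)     # hypothetical fitted values
res    <- y - fitted                  # residuals

rmse <- sqrt(mean(res^2))             # root mean squared error
mae  <- mean(abs(res))                # mean absolute error
```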
Fatma Sevinc KURNAZ, Irene HOFFMANN, Peter FILZMOSER
Kurnaz, F.S., Hoffmann, I. and Filzmoser, P. (2017) Robust and sparse estimation methods for high dimensional linear and logistic regression. Chemometrics and Intelligent Laboratory Systems.
print, predict, coef, nonzeroCoef.enetLTS, plot, plotCoef.enetLTS, plotResid.enetLTS, plotDiagnostic.enetLTS, residuals, fitted, weights
## for gaussian
set.seed(86)
n <- 100; p <- 25 # number of observations and variables
beta <- rep(0,p); beta[1:6] <- 1 # first 6 coefficients are nonzero
sigma <- 0.5 # controls signal-to-noise ratio
x <- matrix(rnorm(n*p, sigma),nrow=n)
e <- rnorm(n,0,1) # error terms
eps <- 0.1 # contamination level
m <- ceiling(eps*n) # observations to be contaminated
eout <- e; eout[1:m] <- eout[1:m] + 10 # vertical outliers
yout <- c(x %*% beta + sigma * eout) # response
xout <- x; xout[1:m,] <- xout[1:m,] + 10 # bad leverage points
# determine user supplied alpha and lambda sequences
# alphas=seq(0,1,length=11)
# l0 <- robustHD::lambda0(xout,yout) # use lambda0 function from robustHD package
# lambdas <- seq(l0,0,by=-0.1*l0)
# fit <- enetLTS(xout,yout,alphas=alphas,lambdas=lambdas)
## for binomial
eps <- 0.05 # 5% contamination, applied to class 0 only
m <- ceiling(eps*n)
y <- sample(0:1,n,replace=TRUE)
xout <- x
xout[y==0,][1:m,] <- xout[y==0,][1:m,] + 10 # shift the first m rows of class 0
yout <- y # wrong classification for vertical outliers
# determine user supplied alpha and lambda sequences
# alphas=seq(0,1,length=11)
# l00 <- lambda00(xout,yout,normalize=TRUE,intercept=TRUE)
# lambdas <- seq(l00,0,by=-0.01*l00)
# fit <- enetLTS(xout,yout,family="binomial",alphas=alphas,lambdas=lambdas)
## for multinomial
n <- 120; p <- 15
NC <- 3 # number of groups
X <- matrix(rnorm(n * p), n, p)
betas <- matrix(1:NC, ncol=NC, nrow=p, byrow=TRUE)
betas[(p-5):p,]=0; betas <- rbind(rep(0,NC),betas)
lv <- cbind(1,X) %*% betas
probs <- exp(lv)/apply(exp(lv),1,sum)
y <- apply(probs,1,function(prob){sample(1:NC, 1, TRUE, prob)})
xout <- X
eps <- 0.05 # 5% contamination
m <- ceiling(eps*n)
xout[1:m,] <- xout[1:m,] + 10 # bad leverage points
yout <- y
# determine user supplied alpha and lambda sequences
alphas=seq(0,1,length=11)
lambdas <- seq(from=0.95,to=0.05,by=-0.05)
fit <- enetLTS(xout,yout,family="multinomial",alphas=alphas,lambdas=lambdas)