slm {care} | R Documentation |

`slm` fits a linear model and computes (standardized) regression coefficients by plugging in shrinkage estimates of correlations and variances. Using the argument `predlist`, several models can be fitted on the same data set.

`make.predlist` constructs a `predlist` argument for use with `slm`.

```
slm(Xtrain, Ytrain, predlist, lambda, lambda.var, diagonal=FALSE, verbose=TRUE)
## S3 method for class 'slm'
predict(object, Xtest, verbose=TRUE, ...)
make.predlist(ordering, numpred, name="SIZE")
```

`Xtrain` |
Matrix of predictors (columns correspond to variables). |

`Ytrain` |
Univariate continuous response variable. |

`predlist` |
A list specifying the predictors to be included when fitting the linear regressions. Each entry in the list is a vector containing the indices of the variables used in the corresponding model. If left unspecified, a single full-sized model using all variables in `Xtrain` is assumed. For a given ordering of covariables a suitable `predlist` can be constructed with `make.predlist`. |

`lambda` |
The correlation shrinkage intensity (range 0-1).
If not specified (the default) it is estimated using an
analytic formula from Schäfer and Strimmer (2005). For `lambda=0` the empirical correlations are used. |

`lambda.var` |
The variance shrinkage intensity (range 0-1). If
not specified (the default) it is estimated
using an analytic formula from Opgen-Rhein and Strimmer
(2007). For `lambda.var=0` the empirical variances are used. |

`diagonal` |
If `diagonal=FALSE` (the default) the correlations among the predictors are taken into account; for `diagonal=TRUE` the predictors are treated as uncorrelated. |

`verbose` |
If `verbose=TRUE` status messages are printed while fitting. |

`object` |
An `slm` fit object obtained from the function `slm`. |

`Xtest` |
A matrix containing the test data set. Note that the rows correspond to observations and the columns to variables. |

`...` |
Additional arguments for generic predict. |

`ordering` |
The ordering of the predictors (most important predictors are first). |

`numpred` |
The number of included predictors (may be a scalar or a vector). The predictors
are included in the order specified by `ordering`. |

`name` |
The name assigned to each model, composed of `name` and the corresponding number of included predictors. |

The regression coefficients are obtained by estimating the joint covariance matrix of the response and the predictors, and subsequently computing the regression coefficients by inversion of this matrix (see Opgen-Rhein and Strimmer, 2007). As estimator for the covariance matrix either the standard empirical estimator or a Stein-type shrinkage estimator is employed. The use of the empirical covariance leads to the OLS estimates of the regression coefficients, whereas otherwise shrinkage estimates are obtained.
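As an illustrative sketch (not the package's internal code), the empirical-covariance case of this construction can be reproduced directly: invert the joint covariance matrix of response and predictors, and read off the regression coefficients, which then agree with OLS.

```r
# Sketch: OLS coefficients from the inverse of the joint covariance
# matrix of (response, predictors), as described above.
set.seed(1)
n <- 100
x <- matrix(rnorm(n * 3), n, 3)
y <- drop(x %*% c(1, -2, 0.5) + rnorm(n))

S     <- cov(cbind(y, x))                # joint empirical covariance
Omega <- solve(S)                        # its inverse (precision matrix)
b     <- -Omega[-1, 1] / Omega[1, 1]     # regression coefficients
a     <- mean(y) - sum(b * colMeans(x))  # intercept

# using the empirical covariance reproduces lm()
rbind(c(a, b), coef(lm(y ~ x)))
```

Replacing `cov` by a shrinkage covariance estimator in this scheme yields shrinkage estimates of the coefficients instead.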

`slm`

returns a list with the following components:

`regularization`

: The shrinkage intensities used for estimating correlations and variances.

`std.coefficients`

: The standardized regression coefficients, i.e. the regression coefficients
computed from centered and standardized input data. Thus, by construction the intercept is zero.
Furthermore, for `diagonal=TRUE`

the standardized regression coefficient for each predictor is
identical to the respective marginal correlation.

`coefficients`

: Regression coefficients.

`numpred`

: The number of predictors used in each investigated model.

`R2`

: For `diagonal=TRUE`

this is the multiple correlation coefficient
between the response and the predictor, or the proportion of explained variance, with range
from 0 to 1.
For `diagonal=TRUE`

this equals the sum of squared marginal
correlations. Note that this sum may be larger than 1!

`sd.resid`

: The residual standard deviation (unexplained error).

`predict.slm`

returns the means predicted for each sample and model as well as the corresponding
predictive standard deviations (attached as attribute "sd").
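The two `R2` variants described above can be checked directly from correlations (an illustrative sketch, not package code): the `diagonal=FALSE` value is the squared multiple correlation, while the `diagonal=TRUE` value is the plain sum of squared marginal correlations.

```r
# Sketch: the two R2 variants returned in the R2 component.
set.seed(42)
n <- 200
x <- matrix(rnorm(n * 2), n, 2)
x[, 2] <- x[, 2] + x[, 1]            # make the predictors correlated
y <- drop(x %*% c(1, 1) + rnorm(n))

r <- drop(cor(x, y))                 # marginal correlations
R2.full <- drop(crossprod(r, solve(cor(x), r)))  # squared multiple correlation
R2.diag <- sum(r^2)                  # diagonal case: may exceed 1

c(R2.full, summary(lm(y ~ x))$r.squared)  # identical
```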

Korbinian Strimmer (https://strimmerlab.github.io).

Opgen-Rhein, R., and K. Strimmer. 2007. From correlation to causation networks: a simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Syst. Biol. 1: 37. <DOI:10.1186/1752-0509-1-37>

Schäfer, J., and K. Strimmer. 2005. A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4: 32. <DOI:10.2202/1544-6115.1175>

```
# load care library
library("care")
## example with large number of samples and small dimension
## (using empirical estimates of regression coefficients)
# diabetes data
data(efron2004)
x = efron2004$x
y = efron2004$y
n = dim(x)[1]
d = dim(x)[2]
xnames = colnames(x)
# empirical regression coefficients
fit = slm(x, y, lambda=0, lambda.var=0)
fit
# note that in this example the regression coefficients
# and the standardized regression coefficients are identical
# as the input data have been standardized to mean zero and variance one
# compute corresponding t scores / partial correlations
df = n-d-1
pcor = pcor.shrink(cbind(y,x), lambda=0)[-1,1]
t = pcor * sqrt(df/(1-pcor^2))
t.pval = 2 - 2 * pt(abs(t), df)
b = fit$coefficients[1,-1]
cbind(b, pcor, t, t.pval)
# compare results with those from lm function
lm.out = lm(y ~ x)
summary(lm.out)
# prediction of fitted values at the position of the training data
lm.out$fitted.values
mu.hat = predict(fit, x) # predicted means
mu.hat
attr(mu.hat, "sd") # predictive error
sd(y-mu.hat)
# ordering of the variables using squared empirical CAR score
car = carscore(x, y, lambda=0)
ocar = order(car^2, decreasing=TRUE)
xnames[ocar]
# CAR regression models with 5, 7, 9 included predictors
car.predlist = make.predlist(ocar, numpred = c(5,7,9), name="CAR")
car.predlist
slm(x, y, car.predlist, lambda=0, lambda.var=0)
# plot regression coefficients for all possible CAR models
p=ncol(x)
car.predlist = make.predlist(ocar, numpred = 1:p, name="CAR")
cm = slm(x, y, car.predlist, lambda=0, lambda.var=0)
bmat = cm$coefficients[,-1]
bmat
par(mfrow=c(2,1))
plot(1:p, bmat[,1], type="l",
  ylab="estimated regression coefficients",
  xlab="number of included predictors",
  main="CAR Regression Models for Diabetes Data",
  xlim=c(1,p+1), ylim=c(min(bmat), max(bmat)))
for (i in 2:p) lines(1:p, bmat[,i], col=i, lty=i)
for (i in 1:p) points(1:p, bmat[,i], col=i)
for (i in 1:p) text(p+0.5, bmat[p,i], xnames[i])
plot(1:p, cm$R2, type="l",
  ylab="estimated R2",
  xlab="number of included predictors",
  main="Proportion of Explained Variance",
  ylim=c(0,0.6))
R2max = max(cm$R2)
lines(c(1,p), c(R2max, R2max), col=2)
par(mfrow=c(1,1))
## example with small number of samples and large dimension
## (using shrinkage estimates of regression coefficients)
data(lu2004)
dim(lu2004$x) # 30 403
fit = slm(lu2004$x, lu2004$y)
fit
```

[Package *care* version 1.1.11 Index]