Ohit {Ohit}    R Documentation

Fit a high-dimensional linear regression model via OGA+HDIC+Trim

Description

The first step is to sequentially select input variables via the orthogonal greedy algorithm (OGA). The second step is to determine the number of OGA iterations using a high-dimensional information criterion (HDIC). The third step is to trim, again using HDIC, the irrelevant variables that remain in the model selected in the second step.
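
The sketch below is a minimal conceptual illustration of the OGA step only; the helper oga_sketch is hypothetical and not part of the package. At each iteration the column whose normalized absolute inner product with the current residual is largest is added, and the residual is then recomputed by regressing y on all columns selected so far.

oga_sketch = function(X, y, Kn) {
  Xc = scale(X, center = TRUE, scale = FALSE)  # centered columns
  r = y - mean(y)                              # initial residual
  selected = integer(0)
  for (k in seq_len(Kn)) {
    # normalized absolute inner products with the current residual
    score = abs(as.vector(crossprod(Xc, r))) / sqrt(colSums(Xc^2))
    score[selected] = -Inf                     # never reselect a column
    selected = c(selected, which.max(score))
    # orthogonal update: residual of y regressed on all selected columns
    r = residuals(lm(y ~ X[, selected, drop = FALSE]))
  }
  selected
}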

Usage

Ohit(X, y, Kn = NULL, c1 = 5, HDIC_Type = "HDBIC", c2 = 2, c3 = 2.01,
  intercept = TRUE)

Arguments

X

Input matrix of n rows and p columns.

y

Response vector of length n.

Kn

The number of OGA iterations. Kn must be a positive integer between 1 and p. Default is Kn=max(1, min(floor(c1*sqrt(n/log(p))), p)), where c1 is a tuning parameter (a worked computation of this default is shown after the argument descriptions).

c1

The tuning parameter for the number of OGA iterations. Default is c1=5.

HDIC_Type

High-dimensional information criterion. The value must be "HDAIC", "HDBIC" or "HDHQ". The criterion takes the form n*log(rmse)+k_use*omega_n*log(p), where rmse is the residual mean squared error and k_use is the number of variables used to fit the model. HDIC_Type="HDAIC" uses omega_n=c2, HDIC_Type="HDBIC" uses omega_n=log(n), and HDIC_Type="HDHQ" uses omega_n=c3*log(log(n)); a worked computation is shown after the argument descriptions. Default is HDIC_Type="HDBIC".

c2

The tuning parameter for HDIC_Type="HDAIC". Default is c2=2.

c3

The tuning parameter for HDIC_Type="HDHQ". Default is c3=2.01.

intercept

Should an intercept be fitted? Default is intercept=TRUE.
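
For concreteness, the lines below evaluate the default Kn and the three HDIC variants by hand for the sample sizes used in the Examples section; the rmse and k_use values are placeholders chosen only to make the formula concrete.

n = 400; p = 4000; c1 = 5; c2 = 2; c3 = 2.01
Kn_default = max(1, min(floor(c1 * sqrt(n / log(p))), p))        # 34 for these n and p

rmse = 1.05   # residual mean squared error of a candidate model (placeholder)
k_use = 10    # number of variables used to fit that model (placeholder)
HDAIC = n * log(rmse) + k_use * c2 * log(p)                      # omega_n = c2
HDBIC = n * log(rmse) + k_use * log(n) * log(p)                  # omega_n = log(n)
HDHQ  = n * log(rmse) + k_use * c3 * log(log(n)) * log(p)        # omega_n = c3*log(log(n))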

Value

A list containing the following components.

n

The number of observations.

p

The number of input variables.

Kn

The number of OGA iterations.

J_OGA

The index set of the Kn variables sequentially selected by OGA.

HDIC

The HDIC values along the OGA path.

J_HDIC

The index set of variables determined by OGA+HDIC.

J_Trim

The index set of variables determined by OGA+HDIC+Trim.

betahat_HDIC

The estimated regression coefficients of the model determined by OGA+HDIC.

betahat_Trim

The estimated regression coefficients of the model determined by OGA+HDIC+Trim.

Author(s)

Hai-Tang Chiou, Ching-Kang Ing and Tze Leung Lai.

References

Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21, 1473–1513.

Examples

# Example setup (Example 3 in Section 5 of Ing and Lai (2011))
n = 400     # sample size
p = 4000    # number of candidate input variables
q = 10      # number of relevant variables
beta_1q = c(3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75)  # nonzero coefficients
b = sqrt(3/(4 * q))

# The q relevant variables are i.i.d. standard normal; each of the p - q
# irrelevant variables is correlated with them through the row sums
x_relevant = matrix(rnorm(n * q), n, q)
d = matrix(rnorm(n * (p - q), 0, 0.5), n, p - q)
x_relevant_sum = apply(x_relevant, 1, sum)
x_irrelevant = apply(d, 2, function(a) a + b * x_relevant_sum)
X = cbind(x_relevant, x_irrelevant)
epsilon = rnorm(n)
y = as.vector((x_relevant %*% beta_1q) + epsilon)

# Fit a high-dimensional linear regression model via OGA+HDIC+Trim
Ohit(X, y, intercept = FALSE)
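
# Assuming the return value is a list with the components documented under
# Value, the fit can be stored and inspected (refitted here only for illustration)
fit = Ohit(X, y, intercept = FALSE)
fit$Kn              # number of OGA iterations
fit$J_HDIC          # index set selected by OGA+HDIC
fit$J_Trim          # index set selected by OGA+HDIC+Trim
fit$betahat_HDIC    # coefficient estimates of the OGA+HDIC model
fit$betahat_Trim    # coefficient estimates of the OGA+HDIC+Trim model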
