Ohit {Ohit}    R Documentation

Fit a high-dimensional linear regression model via OGA+HDIC+Trim

Description

The first step is to sequentially select input variables via the orthogonal greedy algorithm (OGA). The second step is to determine the number of OGA iterations using a high-dimensional information criterion (HDIC). The third step is to trim, again using HDIC, the irrelevant variables that remain in the model selected in the second step.
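
The sketch below is a minimal conceptual illustration of the OGA step only; the helper oga_sketch is hypothetical and not part of the package. At each iteration the column whose normalized absolute inner product with the current residual is largest is added, and the residual is then recomputed by regressing y on all columns selected so far.

oga_sketch = function(X, y, Kn) {
  Xc = scale(X, center = TRUE, scale = FALSE)  # centered columns
  r = y - mean(y)                              # initial residual
  selected = integer(0)
  for (k in seq_len(Kn)) {
    # normalized absolute inner products with the current residual
    score = abs(as.vector(crossprod(Xc, r))) / sqrt(colSums(Xc^2))
    score[selected] = -Inf                     # never reselect a column
    selected = c(selected, which.max(score))
    # orthogonal update: residual of y regressed on all selected columns
    r = residuals(lm(y ~ X[, selected, drop = FALSE]))
  }
  selected
}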

Usage

Ohit(X, y, Kn = NULL, c1 = 5, HDIC_Type = "HDBIC", c2 = 2, c3 = 2.01,
  intercept = TRUE)

Arguments

X

Input matrix of n rows and p columns.

y

Response vector of length n.

Kn

The number of OGA iterations. Kn must be a positive integer between 1 and p. Default is Kn=max(1, min(floor(c1*sqrt(n/log(p))), p)), where c1 is a tuning parameter (a worked computation of this default is shown after the argument descriptions).

c1

The tuning parameter for the number of OGA iterations. Default is c1=5.

HDIC_Type

High-dimensional information criterion. The value must be "HDAIC", "HDBIC" or "HDHQ". The criterion takes the form n*log(rmse)+k_use*omega_n*log(p), where rmse is the residual mean squared error and k_use is the number of variables used to fit the model. HDIC_Type="HDAIC" uses omega_n=c2, HDIC_Type="HDBIC" uses omega_n=log(n), and HDIC_Type="HDHQ" uses omega_n=c3*log(log(n)); a worked computation is shown after the argument descriptions. Default is HDIC_Type="HDBIC".

c2

The tuning parameter for HDIC_Type="HDAIC". Default is c2=2.

c3

The tuning parameter for HDIC_Type="HDHQ". Default is c3=2.01.

intercept

Should an intercept be fitted? Default is intercept=TRUE.
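
For concreteness, the lines below evaluate the default Kn and the three HDIC variants by hand for the sample sizes used in the Examples section; the rmse and k_use values are placeholders chosen only to make the formula concrete.

n = 400; p = 4000; c1 = 5; c2 = 2; c3 = 2.01
Kn_default = max(1, min(floor(c1 * sqrt(n / log(p))), p))        # 34 for these n and p

rmse = 1.05   # residual mean squared error of a candidate model (placeholder)
k_use = 10    # number of variables used to fit that model (placeholder)
HDAIC = n * log(rmse) + k_use * c2 * log(p)                      # omega_n = c2
HDBIC = n * log(rmse) + k_use * log(n) * log(p)                  # omega_n = log(n)
HDHQ  = n * log(rmse) + k_use * c3 * log(log(n)) * log(p)        # omega_n = c3*log(log(n))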

Value

A list containing the following components.

n

The number of observations.

p

The number of input variables.

Kn

The number of OGA iterations.

J_OGA

The index set of the Kn variables sequentially selected by OGA.

HDIC

The HDIC values along the OGA path.

J_HDIC

The index set of variables determined by OGA+HDIC.

J_Trim

The index set of variables determined by OGA+HDIC+Trim.

betahat_HDIC

The estimated regression coefficients of the model determined by OGA+HDIC.

betahat_Trim

The estimated regression coefficients of the model determined by OGA+HDIC+Trim.

Author(s)

Hai-Tang Chiou, Ching-Kang Ing and Tze Leung Lai.

References

Ing, C.-K. and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 21, 1473–1513.

Examples

# Example setup (Example 3 in Section 5 of Ing and Lai (2011))
n = 400     # sample size
p = 4000    # number of candidate input variables
q = 10      # number of relevant variables
beta_1q = c(3, 3.75, 4.5, 5.25, 6, 6.75, 7.5, 8.25, 9, 9.75)  # nonzero coefficients
b = sqrt(3/(4 * q))

# The q relevant variables are i.i.d. standard normal; each of the p - q
# irrelevant variables is correlated with them through the row sums
x_relevant = matrix(rnorm(n * q), n, q)
d = matrix(rnorm(n * (p - q), 0, 0.5), n, p - q)
x_relevant_sum = apply(x_relevant, 1, sum)
x_irrelevant = apply(d, 2, function(a) a + b * x_relevant_sum)
X = cbind(x_relevant, x_irrelevant)
epsilon = rnorm(n)
y = as.vector((x_relevant %*% beta_1q) + epsilon)

# Fit a high-dimensional linear regression model via OGA+HDIC+Trim
Ohit(X, y, intercept = FALSE)
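
# Assuming the return value is a list with the components documented under
# Value, the fit can be stored and inspected (refitted here only for illustration)
fit = Ohit(X, y, intercept = FALSE)
fit$Kn              # number of OGA iterations
fit$J_HDIC          # index set selected by OGA+HDIC
fit$J_Trim          # index set selected by OGA+HDIC+Trim
fit$betahat_HDIC    # coefficient estimates of the OGA+HDIC model
fit$betahat_Trim    # coefficient estimates of the OGA+HDIC+Trim model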
