SMLE-package {SMLE}    R Documentation

Joint SMLE-screening for generalized linear models

Description

Feature screening is a powerful tool for processing ultrahigh-dimensional data. It attempts to screen out most of the irrelevant features in preparation for a more elaborate analysis. This package provides an efficient implementation of SMLE-screening for linear, logistic, and Poisson models, in which the joint effects among features are naturally incorporated into the screening process. The package also provides a function for accurate post-screening feature selection based on an iterative hard-thresholding procedure and a user-specified selection criterion.
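
The iterative hard-thresholding idea mentioned above can be sketched in base R for the Gaussian linear case. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names hard_threshold and iht_screen, the fixed step size, and the stopping rule are all hypothetical choices made for this sketch.

```r
# Keep the k largest-magnitude entries of a vector and set the rest to zero.
hard_threshold <- function(beta, k) {
  keep <- order(abs(beta), decreasing = TRUE)[seq_len(k)]
  out <- numeric(length(beta))
  out[keep] <- beta[keep]
  out
}

# Hypothetical helper: iterative hard thresholding for least squares.
# Returns the indices of the k features retained at the last iteration.
iht_screen <- function(X, Y, k, max_iter = 500, tol = 1e-8) {
  X <- scale(X)          # centre and standardise the columns
  Y <- Y - mean(Y)       # centre the response
  # Step size 1/u, with u the largest eigenvalue of X'X (assumed choice).
  u <- norm(X, type = "2")^2
  beta <- numeric(ncol(X))
  for (iter in seq_len(max_iter)) {
    # Gradient step on the residual sum of squares, then keep the top k.
    step <- beta + drop(crossprod(X, Y - X %*% beta)) / u
    beta_new <- hard_threshold(step, k)
    converged <- sum((beta_new - beta)^2) < tol
    beta <- beta_new
    if (converged) break
  }
  sort(which(beta != 0))
}

# Small synthetic check: three active features among p = 500.
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n * p), n, p)
true_id <- c(3, 50, 200)
Y <- drop(X[, true_id] %*% c(3, -3, 3)) + rnorm(n)
retained <- iht_screen(X, Y, k = 10)
setdiff(true_id, retained)   # features missed by the sketch, if any
```

Because every update uses the full coefficient vector, each feature is judged jointly with the others rather than by its marginal association with the response.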

Details

Package: SMLE
Type: Package
Version: 2.1-1
Date: 2024-02-12
License: GPL-3

The input is an n × 1 response vector Y and an n × p predictor (feature) matrix X. The package outputs a set of k < n features that appear most relevant for joint regression. The package also provides a data simulator that generates synthetic datasets from high-dimensional GLMs, accommodating both numerical and categorical features with commonly used correlation structures.

Key functions:
Gen_Data
SMLE
smle_select
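
Post-screening selection compares candidate sub-models using a selection criterion. As a rough illustration of one such criterion, the sketch below computes an extended BIC for a Gaussian linear model in base R. The helper ebic_gaussian is hypothetical, and whether smle_select evaluates the criterion in exactly this form is an assumption.

```r
# Hypothetical helper: extended BIC of a Gaussian linear model fitted on
# the columns of X listed in `subset` (plus an intercept).
ebic_gaussian <- function(Y, X, subset, gamma = 0.5) {
  n <- length(Y)
  p <- ncol(X)
  s <- length(subset)
  fit <- lm.fit(cbind(1, X[, subset, drop = FALSE]), Y)
  rss <- sum(fit$residuals^2)
  # n * log(rss / n) is -2 * the maximised log-likelihood up to a constant;
  # the last term penalises the number of possible models of size s.
  n * log(rss / n) + s * log(n) + 2 * gamma * lchoose(p, s)
}

# Compare an underfitted, a correct, and an overfitted candidate subset.
set.seed(2)
n <- 100; p <- 200
X <- matrix(rnorm(n * p), n, p)
Y <- drop(X[, 1:3] %*% c(2, -2, 2)) + rnorm(n)
candidates <- list(1:2, 1:3, 1:6)
scores <- sapply(candidates, function(s) ebic_gaussian(Y, X, s))
best <- candidates[[which.min(scores)]]
```

Setting gamma = 0 reduces the criterion to the ordinary BIC; larger values penalise model size more heavily when p is large.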

Author(s)

Qianxiang Zang, Chen Xu, Kelly Burkett
Maintainer: Qianxiang Zang <qzang023@uottawa.ca>

References

Xu, C. and Chen, J. (2014). The Sparse MLE for Ultrahigh-Dimensional Feature Screening. Journal of the American Statistical Association, 109(507), 1257–1269.

Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1–22.

Examples

set.seed(1)
# Generate correlated data
Data <- Gen_Data(n = 200, p = 5000, correlation = "MA", family = "gaussian")
print(Data)

# Joint feature screening via SMLE
fit <- SMLE(Y = Data$Y, X = Data$X, k = 10, family = "gaussian")
print(fit)
summary(fit)
plot(fit)

# Are there any features missed after screening?
setdiff(Data$subset_true, fit$ID_retained)

# Elaborate feature selection after screening
fit_s <- smle_select(fit, gamma_ebic = 0.5, vote = FALSE)

# Are there any features missed after selection?
setdiff(Data$subset_true, fit_s$ID_selected)
print(fit_s)
summary(fit_s)
plot(fit_s)
