sharp-package {sharp}    R Documentation

sharp: Stability-enHanced Approaches using Resampling Procedures

Description

In stability selection and consensus clustering, resampling techniques are used to make the results more reliable. In this package, hyper-parameters are calibrated by maximising model stability, which is measured under the null hypothesis that all selection (or co-membership) probabilities are identical. Ready-to-use functions are provided for stability selection with LASSO regression, sparse PCA, sparse (group) PLS or graphical LASSO, and for consensus clustering with hierarchical clustering, partitioning around medoids, k-means or Gaussian mixture models.
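
The resampling idea behind stability selection can be illustrated with a minimal base-R sketch. It is not the package's implementation: a toy selector (keeping the q predictors with the largest absolute marginal correlation with the outcome) stands in for the LASSO, and all object names (selected, selprop, pi_thr, ...) are hypothetical. The number of selected predictors q and the threshold pi_thr are fixed by hand here, whereas the package calibrates its hyper-parameters by maximising stability.

```r
# Minimal base-R sketch of selection proportions under subsampling.
# Assumption: a toy selector (top-q absolute marginal correlations) replaces
# the LASSO; this is an illustration, not the algorithm used by sharp.
set.seed(1)
n <- 100 # number of observations
p <- 20 # number of predictors
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("var", seq_len(p))

# Only the first three predictors contribute to the outcome
y <- as.vector(x[, 1:3] %*% c(2, -2, 2)) + rnorm(n)

K <- 100 # number of subsamples
tau <- 0.5 # proportion of observations drawn in each subsample
q <- 5 # number of predictors kept by the toy selector

# Each row records which predictors were selected in one subsample
selected <- matrix(0, nrow = K, ncol = p, dimnames = list(NULL, colnames(x)))
for (k in seq_len(K)) {
  ids <- sample(n, size = floor(tau * n))
  scores <- abs(cor(x[ids, ], y[ids]))[, 1]
  selected[k, order(scores, decreasing = TRUE)[seq_len(q)]] <- 1
}

# Selection proportion: fraction of subsamples in which a predictor is selected
selprop <- colMeans(selected)

# Stably selected predictors: selection proportion at or above the threshold
pi_thr <- 0.9
stable <- names(selprop)[selprop >= pi_thr]

print(round(selprop, 2))
print(stable)
```

Predictors with a genuine effect are expected to be picked in most subsamples, while noise predictors are picked only occasionally, which is what the threshold on the selection proportions exploits.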

Details

Package: sharp
Type: Package
Version: 1.4.6
Date: 2024-02-03
License: GPL (>= 3)
Maintainer: Barbara Bodinier <barbara.bodinier@gmail.com>

References

Bodinier B, Vuckovic D, Rodrigues S, Filippi S, Chiquet J, Chadeau-Hyam M (2023). “Automated calibration of consensus weighted distance-based clustering approaches using sharp.” Bioinformatics, btad635. ISSN 1367-4811, doi:10.1093/bioinformatics/btad635, https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btad635/52191190/btad635.pdf.

Bodinier B, Filippi S, Nøst TH, Chiquet J, Chadeau-Hyam M (2023). “Automated calibration for stability selection in penalised regression and graphical models.” Journal of the Royal Statistical Society Series C: Applied Statistics, qlad058. ISSN 0035-9254, doi:10.1093/jrsssc/qlad058, https://academic.oup.com/jrsssc/advance-article-pdf/doi/10.1093/jrsssc/qlad058/50878777/qlad058.pdf.

Meinshausen N, Bühlmann P (2010). “Stability selection.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473. doi:10.1111/j.1467-9868.2010.00740.x.

Monti S, Tamayo P, Mesirov J, Golub T (2003). “Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data.” Machine Learning, 52(1), 91–118. doi:10.1023/A:1023949509487.

Examples


# Save the current graphical parameters so they can be restored afterwards
oldpar <- par(no.readonly = TRUE)
par(mar = c(5, 5, 5, 5))

## Regression models
# Data simulation
set.seed(1)
simul <- SimulateRegression(n = 100, pk = 50)

# Stability selection
stab <- VariableSelection(xdata = simul$xdata, ydata = simul$ydata)
CalibrationPlot(stab)
summary(stab)
SelectedVariables(stab)


## Graphical models
# Data simulation
set.seed(1)
simul <- SimulateGraphical(n = 100, pk = 20, topology = "scale-free")

# Stability selection
stab <- GraphicalModel(xdata = simul$data)
CalibrationPlot(stab)
summary(stab)
plot(stab)


## PCA models
if (requireNamespace("elasticnet", quietly = TRUE)) {
  # Data simulation
  set.seed(1)
  simul <- SimulateComponents(pk = c(5, 3, 4))
  plot(simul)

  # Stability selection
  stab <- BiSelection(
    xdata = simul$data,
    ncomp = 3,
    implementation = SparsePCA
  )
  CalibrationPlot(stab)
  summary(stab)
  SelectedVariables(stab)
}


## PLS models
if (requireNamespace("sgPLS", quietly = TRUE)) {
  # Data simulation
  set.seed(1)
  simul <- SimulateRegression(n = 50, pk = c(10, 20, 30), family = "gaussian")

  # Stability selection
  stab <- BiSelection(
    xdata = simul$xdata, ydata = simul$ydata,
    family = "gaussian", ncomp = 3,
    implementation = SparsePLS
  )
  CalibrationPlot(stab)
  summary(stab)
  plot(stab)
}

# Restore the original graphical parameters
par(oldpar)


[Package sharp version 1.4.6 Index]