create_formula {sgboost}    R Documentation

Create a sparse-group boosting formula

Description

Creates an mboost formula for fitting a sparse-group boosting model based on boosted ridge regression with mixing parameter alpha. The formula consists of group baselearners with degrees of freedom 1 - alpha and individual baselearners with degrees of freedom alpha. Groups are defined through group_df. The corresponding modeling data should not contain categorical variables with more than two categories, because such a variable is then treated only as a group.
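
The structure of the returned formula can be pictured with the base R sketch below. It is a hypothetical re-creation for illustration only, not the sgboost implementation: the helper name sketch_sgb_formula and the exact argument layout inside each bols() call are assumptions, and the string that create_formula actually produces may differ in spacing and ordering. It shows the idea described above: one group baselearner per group with df = 1 - alpha, one individual baselearner per variable with df = alpha, and only one kind of baselearner at the extremes alpha = 0 and alpha = 1.

```r
# Hypothetical helper: sketches the shape of a sparse-group boosting formula.
# NOT the sgboost implementation; the layout inside bols() is an assumption.
sketch_sgb_formula <- function(alpha = 0.3, group_df,
                               blearner = "bols", outcome_name = "y",
                               group_name = "group_name",
                               var_name = "var_name", intercept = FALSE) {
  stopifnot(is.numeric(alpha), length(alpha) == 1, alpha >= 0, alpha <= 1)
  vars <- as.character(group_df[[var_name]])
  groups <- group_df[[group_name]]

  # One individual baselearner per variable, each with df = alpha
  individual <- paste0(
    blearner, "(", vars, ", df = ", alpha,
    ", intercept = ", intercept, ")"
  )

  # One group baselearner per group, bundling its variables, df = 1 - alpha
  group_vars <- split(vars, factor(groups, levels = unique(groups)))
  grouped <- vapply(group_vars, function(v) {
    paste0(
      blearner, "(", paste(v, collapse = ", "), ", df = ", 1 - alpha,
      ", intercept = ", intercept, ")"
    )
  }, character(1), USE.NAMES = FALSE)

  # At the extremes only one kind of baselearner is kept
  learners <- c(
    if (alpha < 1) grouped,
    if (alpha > 0) individual
  )
  paste0(outcome_name, " ~ ", paste(learners, collapse = " + "))
}

group_df <- data.frame(
  group_name = c(1, 1, 1, 2, 2),
  var_name = c("x1", "x2", "x3", "x4", "x5")
)
f <- sketch_sgb_formula(alpha = 0.3, group_df = group_df)
cat(f, "\n")
```

With alpha = 0.3, the two group baselearners receive df = 0.7 and the five individual baselearners receive df = 0.3; a larger alpha shifts degrees of freedom from the group baselearners to the individual ones.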

Usage

create_formula(
  alpha = 0.3,
  group_df = NULL,
  blearner = "bols",
  outcome_name = "y",
  group_name = "group_name",
  var_name = "var_name",
  intercept = FALSE
)

Arguments

alpha

Numeric mixing parameter between 0 and 1. For alpha = 0 only group baselearners are defined, and for alpha = 1 only individual baselearners are defined.

group_df

Input data.frame containing the variable names and their group structure.

blearner

Type of baselearner. Default is "bols".

outcome_name

String giving the name of the dependent variable. Default is "y".

group_name

Name of the column in group_df indicating the group structure of the variables. Default is "group_name".

var_name

Name of the column in group_df containing the variable names to be used as predictors. Default is "var_name". These variables should not be categorical with more than two categories, because such a variable is then treated only as a group.

intercept

Logical; should an intercept be used? Default is FALSE.
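
To respect the restriction on categorical variables noted under var_name, the modeling data can be screened before the formula is built. The sketch below uses a hypothetical helper, find_multilevel_categoricals, to flag factor or character columns with more than two distinct values, and then shows one possible remedy (an assumption, not something prescribed by sgboost): expanding the offending factor into 0/1 dummy columns with stats::model.matrix(). Whether those dummies should then form their own group in group_df is a modeling decision left to the user.

```r
# Hypothetical helper: flags categorical columns with more than two categories.
find_multilevel_categoricals <- function(data) {
  is_multilevel <- vapply(data, function(col) {
    (is.factor(col) || is.character(col)) &&
      length(unique(stats::na.omit(col))) > 2
  }, logical(1))
  names(data)[is_multilevel]
}

df <- data.frame(
  x1 = c(0.5, -1.2, 0.3, 1.1),
  color = factor(c("red", "green", "blue", "red")),
  flag = factor(c("a", "b", "a", "b"))
)
bad <- find_multilevel_categoricals(df)
print(bad)

# Possible remedy (assumption): treatment-coded dummies for "color";
# the intercept column is dropped, so the first level ("blue") is the baseline.
dummies <- stats::model.matrix(~ color, data = df)[, -1, drop = FALSE]
df_expanded <- cbind(df[setdiff(names(df), bad)], dummies)
print(names(df_expanded))
```

Here only color is flagged; the two-level factor flag is left untouched, since the documentation only warns against more than two categories.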

Value

Character string containing the formula to be passed to mboost::mboost(), which yields the sparse-group boosting model for the given value of the mixing parameter alpha.
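
Because the return value is a plain character string, it can be inspected with base R before fitting. The sketch below uses a hand-written string of an assumed shape as a stand-in for an actual create_formula() result, converts it with stats::as.formula(), and checks that every predictor listed in group_df appears in it. The string itself is illustrative only.

```r
# Illustrative stand-in for a create_formula() result (exact format assumed).
sgb_string <- paste(
  "y ~ bols(x1, x2, df = 0.7, intercept = FALSE)",
  "bols(x1, df = 0.3, intercept = FALSE)",
  "bols(x2, df = 0.3, intercept = FALSE)",
  sep = " + "
)

# Parse the string into a formula object; this does not evaluate bols().
sgb_formula <- stats::as.formula(sgb_string)

# all.vars() returns variable names only, skipping function names,
# argument names (df, intercept) and constants (FALSE, 0.7).
used_vars <- all.vars(sgb_formula)
print(used_vars)

# Check that every predictor from the grouping table enters the formula
group_df <- data.frame(group_name = c(1, 1), var_name = c("x1", "x2"))
print(all(group_df$var_name %in% used_vars))
```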

Examples

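library(sgboost)
library(mboost)
library(dplyr)
set.seed(1)
# Simulate five predictors and standardize them
df <- data.frame(
  x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100),
  x4 = rnorm(100), x5 = runif(100)
)
df <- df %>%
  mutate(across(everything(), function(x) as.numeric(scale(x))))
# The outcome depends on x1 (group 1) and on x4, x5 (group 2)
df$y <- df$x1 + df$x4 + df$x5
# Assign x1, x2, x3 to group 1 and x4, x5 to group 2
group_df <- data.frame(
  group_name = c(1, 1, 1, 2, 2),
  var_name = c("x1", "x2", "x3", "x4", "x5")
)

# Build the formula for alpha = 0.3 and fit the boosting model
sgb_formula <- create_formula(alpha = 0.3, group_df = group_df)
sgb_model <- mboost(formula = sgb_formula, data = df)
summary(sgb_model)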

[Package sgboost version 0.1.3 Index]