cv.cpg {CPGLIB}    R Documentation

Competing Proximal Gradients Library for Ensembles of Generalized Linear Models - Cross-Validation

Description

cv.cpg computes and cross-validates the coefficients for ensembles of generalized linear models via competing proximal gradients.

Usage

cv.cpg(
  x,
  y,
  glm_type = c("Linear", "Logistic", "Gamma", "Poisson")[1],
  G = 5,
  full_diversity = FALSE,
  include_intercept = TRUE,
  alpha_s = 3/4,
  alpha_d = 1,
  n_lambda_sparsity = 100,
  n_lambda_diversity = 100,
  balanced_cycling = TRUE,
  permutate_search = FALSE,
  acceleration = FALSE,
  tolerance = 1e-05,
  max_iter = 1e+05,
  n_folds = 10,
  n_threads = 1
)

Arguments

x

Design matrix.

y

Response vector.

glm_type

Description of the error distribution and link function to be used for the model. Must be one of "Linear", "Logistic", "Gamma" or "Poisson". Default is "Linear".

G

Number of groups in the ensemble.

full_diversity

Argument to determine if the overlap between the models should be zero. Default is FALSE.

include_intercept

Argument to determine whether there is an intercept. Default is TRUE.

alpha_s

Sparsity mixing parameter. Default is 3/4.

alpha_d

Diversity mixing parameter. Default is 1.

n_lambda_sparsity

Number of candidates for sparsity tuning parameter. Default is 100.

n_lambda_diversity

Number of candidates for diversity tuning parameter. Default is 100.

balanced_cycling

Argument to determine the cycling strategy for the optimal solution search. Default is TRUE.

permutate_search

Argument to determine whether permutations are used to search for the optimal solution. Default is FALSE.

acceleration

Argument to determine whether a gradient acceleration method is used. Default is FALSE.

tolerance

Convergence criterion for the coefficients. Default is 1e-5.

max_iter

Maximum number of iterations in the algorithm. Default is 1e5.

n_folds

Number of cross-validation folds. Default is 10.

n_threads

Number of threads. Default is a single thread.
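
To make the n_folds argument more concrete, the sketch below shows one common way to split observations into folds of near-equal size in base R. It is an illustration only: this page does not describe how cv.cpg assigns folds internally, and the helper name make_folds is hypothetical.

```r
# Hypothetical helper (not part of CPGLIB): assign n observations to
# n_folds folds of near-equal size, in random order.
make_folds <- function(n, n_folds = 10) {
  sample(rep(seq_len(n_folds), length.out = n))
}

set.seed(1)
fold_id <- make_folds(n = 50, n_folds = 10)
table(fold_id)  # number of observations in each fold

# Indices of the held-out and training observations for the first fold
test_idx <- which(fold_id == 1)
train_idx <- which(fold_id != 1)
```

Each fold is held out once while the remaining folds are used for fitting, so a larger n_folds means more fits on larger training subsets.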

Value

An object of class cv.cpg.

Author(s)

Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca

See Also

coef.cv.CPGLIB, predict.cv.CPGLIB

Examples


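# Data simulation
set.seed(1)
n <- 50     # training sample size
N <- 2000   # test sample size
p <- 300    # number of predictors
# Coefficient magnitudes uniform on (0, 1/2); each is negative with probability 0.3
beta.active <- c(abs(runif(p, 0, 1/2))*(-1)^rbinom(p, 1, 0.3))
# Parameters: only the first p.active predictors have nonzero coefficients
p.active <- 150
beta <- c(beta.active[1:p.active], rep(0, p-p.active))
# Covariance: pairwise correlation 0.5 among active predictors, unit variances
Sigma <- matrix(0, p, p)
Sigma[1:p.active, 1:p.active] <- 0.5
diag(Sigma) <- 1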

# Train data
x.train <- mvnfast::rmvn(n, mu = rep(0, p), sigma = Sigma) 
prob.train <- exp(x.train %*% beta)/
              (1+exp(x.train %*% beta))
y.train <- rbinom(n, 1, prob.train)
# Test data
x.test <- mvnfast::rmvn(N, mu = rep(0, p), sigma = Sigma)
prob.test <- exp(x.test %*% beta)/
             (1+exp(x.test %*% beta))
y.test <- rbinom(N, 1, prob.test)

# CV CPGLIB - Multiple Groups
cpg.out <- cv.cpg(x.train, y.train,
                  glm_type = "Logistic",
                  G = 5, include_intercept = TRUE,
                  alpha_s = 3/4, alpha_d = 1,
                  n_lambda_sparsity = 100, n_lambda_diversity = 100,
                  balanced_cycling = TRUE,
                  tolerance = 1e-5, max_iter = 1e5)

# Predictions
cpg.prob <- predict(cpg.out, newx = x.test, type = "prob", 
                    groups = 1:cpg.out$G, ensemble_type = "Model-Avg")
cpg.class <- predict(cpg.out, newx = x.test, type = "class", 
                     groups = 1:cpg.out$G, ensemble_type = "Model-Avg")
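# Compare true and predicted probabilities
plot(prob.test, cpg.prob, pch = 20)
abline(h = 0.5, v = 0.5)
# Mean squared error of the predicted probabilities
mean((prob.test-cpg.prob)^2)
# Misclassification rate
mean(abs(y.test-cpg.class))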
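
The example above ends by reporting the squared error of the predicted probabilities and the misclassification rate. As a complement, the following base-R sketch summarizes binary predictions with a confusion table as well. It uses toy vectors rather than CPGLIB output, and the helper name classification_summary and the 0.5 threshold are assumptions made for illustration.

```r
# Hypothetical helper (not part of CPGLIB): summarize binary predictions.
# y: observed 0/1 labels; prob: predicted probabilities of class 1.
classification_summary <- function(y, prob, threshold = 0.5) {
  pred <- as.integer(prob > threshold)
  list(
    confusion = table(observed = factor(y, levels = 0:1),
                      predicted = factor(pred, levels = 0:1)),
    misclassification = mean(pred != y)
  )
}

# Toy data for illustration only
y_toy <- c(0, 0, 1, 1, 1)
prob_toy <- c(0.2, 0.6, 0.7, 0.4, 0.9)
res <- classification_summary(y_toy, prob_toy)
res$confusion
res$misclassification
```

With real output, y.test and cpg.prob from the example could be passed in place of the toy vectors, assuming cpg.prob holds probabilities of class 1.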




[Package CPGLIB version 1.0.1 Index]