`cv.cpg`

computes and cross-validates the coefficients for ensembles of generalized linear models via competing proximal gradients.

cv.cpg(
  x,
  y,
  glm_type = c("Linear", "Logistic", "Gamma", "Poisson")[1],
  G = 5,
  full_diversity = FALSE,
  include_intercept = TRUE,
  alpha_s = 3/4,
  alpha_d = 1,
  n_lambda_sparsity = 100,
  n_lambda_diversity = 100,
  balanced_cycling = TRUE,
  permutate_search = FALSE,
  acceleration = FALSE,
  tolerance = 1e-05,
  max_iter = 1e+05,
  n_folds = 10,
  n_threads = 1
)

| Argument | Description |
|---|---|
| `x` | Design matrix. |
| `y` | Response vector. |
| `glm_type` | Description of the error distribution and link function to be used for the model. Must be one of "Linear", "Logistic", "Gamma" or "Poisson". Default is "Linear". |
| `G` | Number of groups in the ensemble. Default is 5. |
| `full_diversity` | Argument to determine whether the overlap between the models should be zero. Default is FALSE. |
| `include_intercept` | Argument to determine whether an intercept is included. Default is TRUE. |
| `alpha_s` | Sparsity mixing parameter. Default is 3/4. |
| `alpha_d` | Diversity mixing parameter. Default is 1. |
| `n_lambda_sparsity` | Number of candidates for the sparsity tuning parameter. Default is 100. |
| `n_lambda_diversity` | Number of candidates for the diversity tuning parameter. Default is 100. |
| `balanced_cycling` | Argument to determine the cycling strategy for the optimal solution search. Default is TRUE. |
| `permutate_search` | Argument to determine whether permutations are used to search for the optimal solution. Default is FALSE. |
| `acceleration` | Argument to determine whether a gradient acceleration method is used. Default is FALSE. |
| `tolerance` | Convergence criterion for the coefficients. Default is 1e-5. |
| `max_iter` | Maximum number of iterations in the algorithm. Default is 1e5. |
| `n_folds` | Number of cross-validation folds. Default is 10. |
| `n_threads` | Number of threads. Default is a single thread. |
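
To give a feel for how `tolerance` and `max_iter` act as stopping rules in a proximal gradient iteration, here is a minimal sketch for a single lasso-penalized least-squares model. It is not the CPGLIB implementation: the function names `soft_threshold` and `prox_grad_lasso`, the objective (1/(2n))·||y − xβ||² + λ·||β||₁, and the fixed step size are assumptions chosen for illustration, and the competing multi-model search performed by `cv.cpg` is not reproduced.

```r
# Soft-thresholding operator: the proximal map of the L1 penalty
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Hypothetical helper: proximal gradient descent for
# (1 / (2n)) * ||y - x beta||^2 + lambda * ||beta||_1
prox_grad_lasso <- function(x, y, lambda, tolerance = 1e-5, max_iter = 1e5) {
  n <- nrow(x)
  beta <- rep(0, ncol(x))
  # Step size 1 / L, where L is the Lipschitz constant of the gradient
  lipschitz <- max(eigen(crossprod(x), symmetric = TRUE,
                         only.values = TRUE)$values) / n
  step <- 1 / lipschitz
  for (iter in seq_len(max_iter)) {
    # Gradient of the smooth least-squares part
    grad <- -drop(crossprod(x, y - x %*% beta)) / n
    # Gradient step followed by the proximal (soft-thresholding) step
    beta_new <- soft_threshold(beta - step * grad, step * lambda)
    # Stop once no coefficient moves by more than the tolerance
    converged <- max(abs(beta_new - beta)) < tolerance
    beta <- beta_new
    if (converged) break
  }
  list(coefficients = beta, iterations = iter)
}

# Usage on simulated data
set.seed(1)
x.sim <- matrix(rnorm(300), 100, 3)
y.sim <- drop(x.sim %*% c(1, -2, 0.5)) + rnorm(100, sd = 0.1)
fit <- prox_grad_lasso(x.sim, y.sim, lambda = 0.1)
fit$coefficients
fit$iterations
```

A smaller `tolerance` yields more precise coefficients at the cost of more iterations, while `max_iter` caps the work when convergence is slow.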

An object of class `cv.cpg`

Anthony-Alexander Christidis, anthony.christidis@stat.ubc.ca

`coef.cv.CPGLIB`, `predict.cv.CPGLIB`

# Data simulation
set.seed(1)
n <- 50
N <- 2000
p <- 300
beta.active <- c(abs(runif(p, 0, 1/2))*(-1)^rbinom(p, 1, 0.3))
# Parameters
p.active <- 150
beta <- c(beta.active[1:p.active], rep(0, p-p.active))
Sigma <- matrix(0, p, p)
Sigma[1:p.active, 1:p.active] <- 0.5
diag(Sigma) <- 1

# Train data
x.train <- mvnfast::rmvn(n, mu = rep(0, p), sigma = Sigma)
prob.train <- exp(x.train %*% beta)/
              (1+exp(x.train %*% beta))
y.train <- rbinom(n, 1, prob.train)
# Test data
x.test <- mvnfast::rmvn(N, mu = rep(0, p), sigma = Sigma)
prob.test <- exp(x.test %*% beta)/
             (1+exp(x.test %*% beta))
y.test <- rbinom(N, 1, prob.test)

# CV CPGLIB - Multiple Groups
cpg.out <- cv.cpg(x.train, y.train,
                  glm_type = "Logistic",
                  G = 5, include_intercept = TRUE,
                  alpha_s = 3/4, alpha_d = 1,
                  n_lambda_sparsity = 100, n_lambda_diversity = 100,
                  balanced_cycling = TRUE,
                  tolerance = 1e-5, max_iter = 1e5)

# Predictions
cpg.prob <- predict(cpg.out, newx = x.test, type = "prob",
                    groups = 1:cpg.out$G, ensemble_type = "Model-Avg")
cpg.class <- predict(cpg.out, newx = x.test, type = "class",
                     groups = 1:cpg.out$G, ensemble_type = "Model-Avg")
plot(prob.test, cpg.prob, pch = 20)
abline(h = 0.5,v = 0.5)
mean((prob.test-cpg.prob)^2)
mean(abs(y.test-cpg.class))
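
The simulation above computes the logistic probabilities as `exp(eta)/(1+exp(eta))`. As a side note, base R's `plogis()` returns the same values and stays finite when the linear predictor is very large, where the direct formula overflows. The short sketch below uses made-up values of `eta` purely for illustration; it does not call CPGLIB, and overflow of this kind is unlikely at the scale of the simulated data.

```r
# Illustrative linear predictors, including values that overflow exp()
eta <- c(-800, -2, 0, 2, 800)

# Direct formula, as written in the data simulation
prob.naive <- exp(eta) / (1 + exp(eta))

# Base R logistic distribution function, 1 / (1 + exp(-eta))
prob.stable <- plogis(eta)

# The two agree for moderate values; only the direct formula breaks at 800
cbind(eta, prob.naive, prob.stable)
```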

