crossVal {bbl} R Documentation

Cross-Validation of BB Learning

Description

Run multiple fits of a bbl model, dividing the data into training and validation sets.

Usage

crossVal(
  formula,
  data,
  weights,
  novarOk = FALSE,
  lambda = 1e-05,
  lambdah = 0,
  eps = 0.9,
  nfold = 5,
  method = "pseudo",
  use.auc = TRUE,
  verbose = 1,
  progress.bar = FALSE,
  storeOpt = TRUE,
  ...
)

Arguments

formula

Formula for the model. Note that the intercept has no effect.

data

Data frame containing the data. Column names must match the formula.

weights

Frequency vector of how many times each row of data must be repeated. If NULL, defaults to vector of 1s. Fractional values are not supported.
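As an illustration of what a frequency vector stands for, the following base R sketch expands a small data frame by repeating each row according to its weight. The toy data and the names dat, w, and expanded are hypothetical and are not part of bbl; the sketch only shows the equivalence described above, not how crossVal handles weights internally.

```r
# Hypothetical toy data: three distinct rows with integer frequencies.
dat <- data.frame(y  = c("ctrl", "case", "case"),
                  v1 = c("a", "c", "g"),
                  stringsAsFactors = FALSE)
w <- c(2, 1, 3)

# Repeating row i exactly w[i] times gives the data set the weights represent.
expanded <- dat[rep(seq_len(nrow(dat)), times = w), ]
rownames(expanded) <- NULL

nrow(expanded)   # equals sum(w)
```

Because rows are repeated a whole number of times, this picture only makes sense for integer weights, which is consistent with fractional values not being supported.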

novarOk

Proceed even when there are predictors with only one factor level.
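A quick way to see whether this option matters for a given data set is to count the distinct levels of each predictor beforehand. The sketch below uses base R only; the data frame and the names nlev and novar are hypothetical, and the check is an assumption about what "only one factor level" means in practice rather than the test bbl itself performs.

```r
# Hypothetical predictors: v2 takes a single value in every row.
dat <- data.frame(v1 = c("a", "c", "a"),
                  v2 = c("g", "g", "g"))

# Number of distinct values observed for each column.
nlev <- vapply(dat, function(x) length(unique(x)), integer(1))

# Columns with no variation across rows.
novar <- names(nlev)[nlev < 2]
novar
```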

lambda

Vector of L2 penalizer values for method = 'pseudo'. Inference will be repeated for each value. Restricted to non-negative values.

lambdah

L2 penalizer in method = 'pseudo' applied to parameter h. In contrast to lambda, only a single value is allowed.

eps

Vector of regularization parameters, \epsilon \in [0,1], for method = 'mf'. Inference will be repeated for each value.

nfold

Number of folds for training/validation split.

method

One of c('pseudo','mf'): 'pseudo' for pseudo-likelihood maximization or 'mf' for mean field.

use.auc

Use AUC as the measure of prediction accuracy. Only works if response groups are binary. If FALSE, mean prediction group accuracy will be used as score.
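To make the difference between the two scores concrete, here is a minimal base R sketch that computes both on hypothetical binary predictions. The vectors truth and prob are made up, the AUC uses the standard rank-sum (Mann-Whitney) identity, and the accuracy uses a 0.5 threshold; none of this is taken from bbl's implementation, and its exact definition of mean group accuracy may differ.

```r
# Hypothetical validation results: true group and predicted P(case).
truth <- c("ctrl", "ctrl", "ctrl", "case", "case", "case")
prob  <- c(0.10, 0.40, 0.60, 0.35, 0.80, 0.90)

# AUC from the rank-sum identity: the probability that a randomly chosen
# 'case' instance receives a higher score than a randomly chosen 'ctrl'.
pos <- truth == "case"
r   <- rank(prob)
n1  <- sum(pos)
n0  <- sum(!pos)
auc <- (sum(r[pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Plain accuracy: fraction of instances assigned to the correct group
# when thresholding the predicted probability at 0.5.
pred <- ifelse(prob > 0.5, "case", "ctrl")
acc  <- mean(pred == truth)

c(auc = auc, accuracy = acc)
```

AUC depends only on the ranking of the scores, which is why it needs exactly two response groups, whereas accuracy extends directly to any number of groups.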

verbose

Verbosity level. Downgraded when relayed into bbl.

progress.bar

Display progress bar in predict.

storeOpt

Store the optimal fitted object of class bbl.

...

Other parameters to mlestimate.

Details

The data are split into training and validation subsets in an (nfold-1):1 ratio. The model is trained on the former and validated on the latter. Results from the individual folds are combined into a validation result covering all instances in the data set, and the prediction score is evaluated against the known response group identities.
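The splitting scheme described here can be sketched in base R as follows. This is an illustration under assumptions: the random fold assignment, the seed, and the names fold, train, valid, and pooled are hypothetical, and whether crossVal shuffles or stratifies rows in this way is not stated in this page.

```r
set.seed(1)
n     <- 100
nfold <- 5

# Assign each row to one of nfold folds at random, with near-equal sizes.
fold <- sample(rep(seq_len(nfold), length.out = n))

# Record which fold validated each row; a real run would store predictions.
pooled <- rep(NA_integer_, n)
for (k in seq_len(nfold)) {
  valid <- which(fold == k)   # 1 part: held out for validation
  train <- which(fold != k)   # (nfold - 1) parts: used for training
  # A model would be fit on rows `train` and used to predict rows `valid`.
  pooled[valid] <- k
}

table(fold)
```

Each row is held out exactly once, so after the loop every instance has a validation prediction and a single score can be computed over the whole data set.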

Value

Object of class cv.bbl extending bbl, a list with extra components:

regstar

Value of the regularization parameter (lambda for method = 'pseudo', eps for method = 'mf') at which the accuracy score is maximized.

maxscore

Value of maximum accuracy.

cvframe

Data frame of regularization parameters and scores scanned. If use.auc=TRUE, also contains 95
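The relationship between cvframe, regstar, and maxscore can be pictured with a small base R sketch. The data frame below is made up, and its column names (eps, score) are assumptions for illustration; the actual column names in a cv.bbl object are not given in this page.

```r
# Hypothetical scan results; column names are assumed, not taken from bbl.
cvframe <- data.frame(eps   = c(0.1, 0.5, 0.9),
                      score = c(0.71, 0.83, 0.78))

# The optimal regularization value is the one with the highest score.
best     <- which.max(cvframe$score)
regstar  <- cvframe$eps[best]
maxscore <- cvframe$score[best]

c(regstar = regstar, maxscore = maxscore)
```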

Examples

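set.seed(513)
m <- 5    # number of predictors
n <- 100  # number of samples passed to randomsamp
# Each predictor takes one of four levels
predictors <- list()
for(i in 1:m) predictors[[i]] <- c('a','c','g','t')
names(predictors) <- paste0('v',1:m)
# Two random parameter sets, matching the two response groups below
par <- list(randompar(predictors), randompar(predictors, h0=0.1, J0=0.1))
dat <- randomsamp(predictors, response=c('ctrl','case'), par=par, nsample=n)
# Cross-validate mean-field fits over a grid of eps values
cv <- crossVal(y ~ .^2, data=dat, method='mf', eps=seq(0.1,0.9,0.1))
cv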

[Package bbl version 1.0.0 Index]