R: Control parameters for trained NBLDA model.

nbldaControl {NBLDA}

R Documentation

Control parameters for trained NBLDA model.

Description

Define the control parameters to be used within the trainNBLDA function.

Usage

nbldaControl(folds = 5, repeats = 2, foldIdx = NULL, rhos = NULL,
  beta = 1, prior = NULL, transform = FALSE, alpha = NULL, truephi = NULL,
  target = 0, phi.epsilon = 0.15, normalize.target = FALSE, delta = NULL,
  multicore = FALSE, ...)

Arguments

`folds`	A positive integer. The number of folds for k-fold model validation.
`repeats`	A positive integer. The number of repeats for k-fold model validation. If NULL, 0, or negative, it is set to 1.
`foldIdx`	a list with indices of hold-out samples for each fold. It should be a list where folds are nested within repeats. If NULL, `folds` and `repeats` are used to define hold-out samples.
`rhos`	A vector of tuning parameters that control the amount of soft thresholding performed. If NULL, it is automatically generated within `trainNBLDA` using `tuneLength`, i.e., the length of grid search. See details.
`beta`	A smoothing term. A Gamma(beta,beta) prior is used to fit the Poisson model. Recommendation is to just leave it at 1, the default value. See Witten (2011) and Dong et al. (2016) for details.
`prior`	A vector of prior class probabilities with a length equal to the number of classes. If NULL, all classes are assumed to be equally likely.
`transform`	a logical. If TRUE, count data is transformed using a power transformation. If `alpha` is not specified, the power transformation parameter is automatically calculated using a goodness-of-fit test. See Witten (2011) for details.
`alpha`	a numeric value within [0, 1] to be used for power transformation.
`truephi`	a vector with a length equal to the number of variables. Its elements represent the true overdispersion parameters for each variable. If a single value is given, it is recycled for all variables. If a vector whose length is not equal to the number of variables is given, its first element is used and recycled for all variables. If NULL, estimated overdispersions are used in the classifier. See details.
`target`	a value for the shrinkage target of dispersion estimates. If NULL, a value that is small and minimizes the average squared difference is automatically used as the target value. See `getT` for details.
`phi.epsilon`	a positive value for controlling the number of features whose dispersions are shrunk towards 0. See details.
`normalize.target`	a logical. If TRUE and `target` is NULL, the target value is estimated using the normalized dispersion estimates. See `getT` for details.
`delta`	a weight within the interval [0, 1] that is used while shrinking dispersions towards the target. When "delta = 1", initial dispersion estimates are fully shrunk to the target. Conversely, if "delta = 0", no shrinkage is performed on the initial estimates.
`multicore`	a logical. If TRUE and a parallel backend is loaded and available, the function runs in parallel to speed up the computations.
`...`	further arguments passed to `trainNBLDA`.
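
The nested structure described for `foldIdx` (folds nested within repeats) can be sketched in base R as follows. This is only an illustration: the helper `make_fold_idx` and the sample size `n = 20` are hypothetical, and the exact list layout accepted by the package is assumed from the argument description above rather than confirmed.

```r
# Hypothetical helper: build hold-out indices with folds nested within repeats.
make_fold_idx <- function(n, folds = 5, repeats = 2) {
  lapply(seq_len(repeats), function(r) {
    # Randomly assign each of the n samples to one of the folds.
    fold_id <- sample(rep(seq_len(folds), length.out = n))
    # For each fold, store the indices of the held-out samples.
    lapply(seq_len(folds), function(k) which(fold_id == k))
  })
}

set.seed(1)
foldIdx <- make_fold_idx(n = 20, folds = 5, repeats = 2)
str(foldIdx)  # outer list: repeats; inner lists: hold-out indices per fold
```

Within each repeat, every sample is held out exactly once, which is the usual requirement for k-fold validation.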

Details

rhos controls the level of sparsity, i.e., the number of variables (or features) used in the classifier. A variable that makes no contribution to the discriminant function should be removed from the model. Values of rhos lie within the interval [0, Inf) and determine how many variables are removed: as rhos decreases towards 0, fewer variables are removed. If rhos = 0, all variables are included in the classifier.
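
The effect of the tuning parameter can be illustrated with the generic soft-thresholding operator. The sketch below is not the package's internal code: the function `soft_threshold` and the `effects` vector are hypothetical and only show how a larger threshold zeroes out more variables.

```r
# Generic soft-thresholding operator: S(x, rho) = sign(x) * max(|x| - rho, 0).
soft_threshold <- function(x, rho) sign(x) * pmax(abs(x) - rho, 0)

# Hypothetical per-variable effects (deviation of a class from the overall mean).
effects <- c(-1.2, -0.3, 0.05, 0.4, 2.0)

# Count how many variables keep a nonzero effect as rho grows.
rhos <- c(0, 0.1, 0.5, 1.5)
n_kept <- sapply(rhos, function(rho) sum(soft_threshold(effects, rho) != 0))
data.frame(rho = rhos, variables_kept = n_kept)
```

With rho = 0 nothing is thresholded, so all five hypothetical variables are kept; larger values progressively remove the variables with the smallest effects.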

truephi controls how the Poisson model differs from the Negative Binomial model. If overdispersion is zero, the Negative Binomial model converges to the Poisson model. Hence, the results from trainNBLDA are identical to PLDA results from Classify when truephi = 0.
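
The convergence of the Negative Binomial model to the Poisson model can be checked numerically with base R. The sketch below assumes the common parameterization in which the variance is mu + phi * mu^2, so that `size = 1 / phi` in `dnbinom`; the mean `mu = 5` and the grid of `phis` are arbitrary illustrative values.

```r
mu <- 5                          # hypothetical mean count
x <- 0:30                        # counts at which the two models are compared
phis <- c(1, 0.1, 1e-3, 1e-6)    # overdispersion values approaching 0

# Largest absolute difference between the NB and Poisson probabilities.
max_diff <- sapply(phis, function(phi) {
  max(abs(dnbinom(x, size = 1 / phi, mu = mu) - dpois(x, lambda = mu)))
})
data.frame(phi = phis, max_abs_diff = max_diff)
```

The difference shrinks as phi approaches 0, which is why a zero overdispersion reproduces the Poisson-based classifier.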

phi.epsilon is a threshold used to shrink estimated overdispersions towards 0. The Poisson model assumes that there is no overdispersion in the observed counts. However, this is not a valid assumption for highly overdispersed count data. NBLDA performs shrinkage on the estimated overdispersions. Although the amount of shrinkage depends on several parameters such as delta, target, and truephi, some of the shrunken overdispersions might be very close to 0. By defining a threshold value for the shrunken overdispersions, it is possible to set very small overdispersions to 0. If a shrunken overdispersion is below phi.epsilon, it is set to 0. If phi.epsilon = NULL, the threshold value is set to 0. Hence, all the variables with very small overdispersion are included in the NBLDA model with their nonzero overdispersion estimates.
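
The interplay of delta, target, and phi.epsilon can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the linear shrinkage formula follows the form used by Yu et al. (2013), and the function `shrink_dispersion`, its default `delta = 0.5`, and the `phi_hat` values are hypothetical.

```r
# Hypothetical initial overdispersion estimates for six variables.
phi_hat <- c(0.02, 0.10, 0.40, 0.80, 1.50, 4.00)

# Assumed linear shrinkage towards the target, followed by thresholding.
shrink_dispersion <- function(phi, target = 0, delta = 0.5, phi.epsilon = 0.15) {
  shrunk <- delta * target + (1 - delta) * phi
  # Treat a NULL threshold as 0, as described above.
  if (is.null(phi.epsilon)) phi.epsilon <- 0
  # Shrunken dispersions below the threshold are set to exactly 0.
  shrunk[shrunk < phi.epsilon] <- 0
  shrunk
}

phi_shrunk <- shrink_dispersion(phi_hat)
phi_shrunk
```

In this sketch, the two smallest estimates fall below the threshold after shrinkage and are set to 0, so those variables are treated as if they followed a Poisson model.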

Value

a list with all the control elements.

Author(s)

Dincer Goksuluk

References

Witten, DM (2011). Classification and clustering of sequencing data using a Poisson model. Ann. Appl. Stat. 5(4), 2493–2518. doi:10.1214/11-AOAS493.

Dong, K., Zhao, H., Tong, T., & Wan, X. (2016). NBLDA: negative binomial linear discriminant analysis for RNA-Seq data. BMC Bioinformatics, 17(1), 369. http://doi.org/10.1186/s12859-016-1208-1.

Yu, D., Huber, W., & Vitek, O. (2013). Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Bioinformatics, 29(10), 1275-1282.

Examples

nbldaControl()  # return default control parameters.
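
Non-default settings can be requested in the same way. The example below uses only arguments listed under Usage; the chosen values are arbitrary, and the call is guarded so that it runs only when the NBLDA package is installed.

```r
if (requireNamespace("NBLDA", quietly = TRUE)) {
  # 10-fold validation repeated 3 times, with a smaller dispersion threshold.
  ctrl <- NBLDA::nbldaControl(folds = 10, repeats = 3, phi.epsilon = 0.1)
  str(ctrl)  # inspect the returned list of control elements
}
```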

[Package NBLDA version 1.0.1 Index]