FLLat.PVE {FLLat}

R Documentation

Choosing the Number of Features for the Fused Lasso Latent Feature Model

Description

Calculates the percentage of variation explained (PVE) for a range of values of J (the number of features) for the Fused Lasso Latent Feature (FLLat) model. The PVE can then be plotted against J to help choose the value of J.

Usage

FLLat.PVE(Y, J.seq=seq(1,min(15,floor(ncol(Y)/2)),by=2), B=c("pc","rand"),
          lams=c("same","diff"), thresh=10^(-4), maxiter=100, maxiter.B=1,
          maxiter.T=1)

## S3 method for class 'PVE'
plot(x, xlab="Number of Features", ylab="PVE", ...)

Arguments

`Y`	A matrix of data from an aCGH experiment (usually in the form of log intensity ratios) or some other type of copy number data. Rows correspond to the probes and columns correspond to the samples.
`J.seq`	A vector of values of `J` (the number of features) for which to calculate the PVE. The default is every second integer from `1` up to the smaller of `15` and the number of samples divided by `2` (rounded down).
`B`	The initial values for the features to use in the FLLat algorithm for each value of `J`. Can be one of `"pc"` (the first `J` principal components of `Y`) or `"rand"` (a random selection of `J` columns of `Y`). The default is `"pc"`.
`lams`	The choice of whether to use the same values of the tuning parameters in the FLLat algorithm for each value of `J` (`"same"`) or to calculate the optimal tuning parameters for each value of `J` (`"diff"`). When using the same values, the optimal tuning parameters are calculated once for the default value of `J` in the FLLat algorithm. The default is `"same"`.
`thresh`	The threshold for determining when the solutions have converged in the FLLat algorithm. The default is `10^(-4)`.
`maxiter`	The maximum number of iterations for the outer loop of the FLLat algorithm. The default is `100`.
`maxiter.B`	The maximum number of iterations for the inner loop of the FLLat algorithm for estimating the features `B`. The default is `1`. Increasing this may reduce the number of outer-loop iterations but can still increase the total run time.
`maxiter.T`	The maximum number of iterations for the inner loop of the FLLat algorithm for estimating the weights Θ. The default is `1`. Increasing this may reduce the number of outer-loop iterations but can still increase the total run time.
`x`	An object of class `PVE`, as returned by `FLLat.PVE`.
`xlab`	The label for the `x`-axis of the PVE plot.
`ylab`	The label for the `y`-axis of the PVE plot.
`...`	Further graphical parameters.
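The default `J.seq` and the two options for `B` can be illustrated in base R. This is a minimal sketch, not the package's code: `default_J_seq` and `init_B` are hypothetical helpers, and reading "the first `J` principal components of `Y`" as the first `J` left singular vectors of `Y` (one value per probe) is an assumption.

```r
## Hypothetical helper: the default J.seq, i.e. every second integer from 1
## up to the smaller of 15 and floor(ncol(Y)/2). Needs at least 2 samples.
default_J_seq <- function(Y) {
  seq(1, min(15, floor(ncol(Y) / 2)), by = 2)
}

## Hypothetical helper: an initial probes-by-J feature matrix.
## "pc":   first J left singular vectors of Y (assumed reading of
##         "the first J principal components of Y").
## "rand": J distinct columns of Y chosen at random.
init_B <- function(Y, J, B = c("pc", "rand")) {
  B <- match.arg(B)
  if (B == "pc") {
    svd(Y, nu = J, nv = 0)$u
  } else {
    Y[, sample(ncol(Y), J), drop = FALSE]
  }
}

set.seed(1)
Y <- matrix(rnorm(100 * 20), nrow = 100)  # 100 probes, 20 samples
default_J_seq(Y)          # 1 3 5 7 9
dim(init_B(Y, 3, "pc"))   # 100 3
```

Random initialization means repeated calls with `B="rand"` can give different fits; setting a seed first makes such runs reproducible.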

Details

This function calculates the PVE for each value of J as specified by J.seq. The PVE is defined to be:

PVE = 1 - \frac{RSS}{TSS}

where RSS and TSS denote the residual sum of squares and the total sum of squares, respectively. For each value of J, the PVE is calculated by fitting the FLLat model with that value of J.
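The FLLat model approximates `Y` by the product of a probes-by-J feature matrix `B` and a J-by-samples weight matrix Θ, so the PVE can be computed directly from those two matrices. In this sketch, `pve_sketch` is a hypothetical helper, and measuring TSS around each sample's mean is an assumption; see Nowak and others (2011) for the exact definition used by the package.

```r
## Hypothetical helper: PVE = 1 - RSS/TSS for the approximation Y ~ B %*% Theta.
## Assumption: TSS is measured around each sample's (column's) mean.
pve_sketch <- function(Y, B, Theta) {
  fitted <- B %*% Theta
  rss <- sum((Y - fitted)^2)
  tss <- sum(sweep(Y, 2, colMeans(Y))^2)
  1 - rss / tss
}

## Toy check: a rank-1 fit equal to the column means explains nothing.
Y <- matrix(c(1, 2, 3, 6, 5, 4), nrow = 3)
B0 <- matrix(1, nrow = 3, ncol = 1)
Theta0 <- matrix(colMeans(Y), nrow = 1)
pve_sketch(Y, B0, Theta0)        # 0

## An exact reconstruction of Y explains everything.
pve_sketch(Y, Y, diag(2))        # 1
```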

There are two choices for how the tuning parameters are chosen when fitting the FLLat model for each value of J. The first choice, given by lams="same", applies the FLLat.BIC function just once, for the default value of J, and the resulting optimal tuning parameters are then used for all values of J in J.seq. The second choice, given by lams="diff", applies the FLLat.BIC function separately for each value of J in J.seq. Although this second choice gives a more accurate measure of the PVE, it takes much longer to run, because the tuning-parameter search is repeated for every value of J.
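The difference between the two `lams` settings comes down to how many times the tuning-parameter search runs. The sketch below mirrors that control flow with stand-in functions passed as arguments: `select_lams` plays the role of `FLLat.BIC` and `fit_pve` the role of fitting FLLat and computing its PVE. Both are hypothetical placeholders, and the default `J` used for `lams="same"` is left as an argument because this page does not state its value.

```r
## Sketch of the lams = "same" / "diff" logic. select_lams(Y, J) and
## fit_pve(Y, J, tuned) are hypothetical stand-ins for FLLat.BIC and for
## fitting FLLat and computing the PVE; J.default stands in for the
## default J of the FLLat algorithm (not given on this page).
pve_over_J <- function(Y, J.seq, lams = c("same", "diff"),
                       select_lams, fit_pve, J.default) {
  lams <- match.arg(lams)
  if (lams == "same") {
    ## One tuning-parameter search, reused for every J.
    tuned <- select_lams(Y, J.default)
    sapply(J.seq, function(J) fit_pve(Y, J, tuned))
  } else {
    ## A separate search for every J: more accurate but slower.
    sapply(J.seq, function(J) fit_pve(Y, J, select_lams(Y, J)))
  }
}

## Count how often a dummy tuning search runs under each setting.
count_search_calls <- function(lams) {
  calls <- 0
  dummy_select <- function(Y, J) { calls <<- calls + 1; c(0.1, 0.1) }
  dummy_fit <- function(Y, J, tuned) J / 10
  pve_over_J(matrix(0, 10, 6), c(1, 3, 5), lams,
             dummy_select, dummy_fit, J.default = 2)
  calls
}

count_search_calls("same")  # 1
count_search_calls("diff")  # 3
```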

When the PVE is plotted against J, it typically increases with J and then begins to plateau beyond a certain point, indicating that additional features do little to improve the model. The value of J to use in the FLLat algorithm can therefore be chosen as the point at which the PVE plot begins to plateau.
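The plateau is normally judged by eye from the PVE plot. As a hypothetical aid, not something FLLat provides, one could flag the first J after which the gain in PVE falls below a chosen tolerance. The helper `plateau_J`, the tolerance, and the PVE values below are all illustrative assumptions.

```r
## Hypothetical heuristic: return the first J in J.seq whose step to the
## next J adds less than `tol` to the PVE; fall back to the largest J.
plateau_J <- function(PVEs, J.seq, tol = 0.05) {
  gains <- diff(PVEs)
  idx <- which(gains < tol)
  if (length(idx) == 0) return(J.seq[length(J.seq)])
  J.seq[idx[1]]
}

## Illustrative (made-up) PVE values, not output from FLLat.PVE.
plateau_J(c(0.30, 0.55, 0.70, 0.72, 0.73), c(1, 3, 5, 7, 9))  # 5
```

Because the default `J.seq` steps by 2, each gain spans two features, so a tolerance suited to one spacing of `J.seq` may not suit another. A visual check of the plot remains the safer guide.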

For more details, please see Nowak and others (2011) and the package vignette.

Value

An object of class PVE with components:

`PVEs`	The PVE for each value of `J` in `J.seq`.
`J.seq`	The sequence of `J` values used.

There is a plot method for PVE objects.
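Since the returned object carries the components `PVEs` and `J.seq`, its contents can also be inspected or plotted by hand. The sketch below builds a stand-in object of class `PVE` from made-up values and draws PVE against J with base graphics; `plot_pve_sketch` is hypothetical and is not claimed to reproduce the package's own `plot` method.

```r
## A stand-in PVE object built from illustrative (made-up) values.
pve_obj <- structure(list(PVEs = c(0.30, 0.55, 0.70, 0.72, 0.73),
                          J.seq = c(1, 3, 5, 7, 9)),
                     class = "PVE")

## Hypothetical plotting helper using base graphics; the xlab and ylab
## defaults match those given in the Usage section.
plot_pve_sketch <- function(x, xlab = "Number of Features", ylab = "PVE", ...) {
  stopifnot(inherits(x, "PVE"))
  plot(x$J.seq, x$PVEs, type = "b", xlab = xlab, ylab = ylab, ...)
  invisible(x)
}

plot_pve_sketch(pve_obj, main = "PVE versus J")
```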

Author(s)

Gen Nowak gen.nowak@gmail.com, Trevor Hastie, Jonathan R. Pollack, Robert Tibshirani and Nicholas Johnson.

References

G. Nowak, T. Hastie, J. R. Pollack and R. Tibshirani. A Fused Lasso Latent Feature Model for Analyzing Multi-Sample aCGH Data. Biostatistics, 2011, doi: 10.1093/biostatistics/kxr012

Examples

## Load the FLLat package and the simulated aCGH data.
library(FLLat)
data(simaCGH)

## Generate PVEs for J ranging from 1 to the number of samples divided
## by 2 (rounded down).
result.pve <- FLLat.PVE(simaCGH, J.seq=1:floor(ncol(simaCGH)/2))

## Generate PVE plot.
plot(result.pve)

[Package FLLat version 1.2-1 Index]