
STEP {easyCODA}

R Documentation

Stepwise selection of logratios

Description

Stepwise selection of pairwise logratios that explain maximum variance in a target matrix.

Usage

STEP(data, datatarget=data, previous=NA, previous.wt=NA, weight=TRUE, 
     random=FALSE, nsteps=min(ncol(data), ncol(datatarget))-1, top=1)

Arguments

`data`	A data frame or matrix of compositional data on which pairwise logratios are computed
`datatarget`	A matrix of interval-scale data, with as many rows as `data`, which serves as the target matrix whose variance is to be explained (by default it is the same matrix as data, in which case total logratio variance is to be explained)
`previous`	A vector or matrix of variables to be forced in before logratios are sought
`previous.wt`	Possible weights of the variable(s) forced in before logratios are sought (if not specified, weights of 1 are assumed)
`weight`	`TRUE` (default) when weights are in data list object, `FALSE` for unweighted analysis, or a vector of user-defined part weights
`random`	`TRUE` if a random selection is made among tied logratios; `FALSE` (default) if the tied logratio that maximizes the Procrustes correlation is chosen
`nsteps`	Number of steps to take (by default, one less than the smaller of the numbers of columns of `data` and `datatarget`)
`top`	Number of top variance-explaining logratios returned after last step (by default, 1, i.e. the best)

Details

The function STEP sequentially selects the pairwise logratios of a data matrix (usually compositional) that best explain the variance in a second matrix, called the target matrix. By default, the target matrix is the data matrix itself, in which case the selected logratios are those that best explain that matrix's own total logratio variance. In this case, the columns of the data matrix are weighted by default, with weights proportional to the part means of the compositional data matrix. For the unweighted logratio variance, specify weight=FALSE. User-specified weights on the columns of the data matrix (usually compositional parts) can be supplied through the same weight argument.
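The selection criterion of a single step can be illustrated in base R. The following is a minimal sketch under stated assumptions, not the code used inside STEP: the data are made-up toy values, `logratio_R2` is a hypothetical helper, rows are weighted equally, and the explained variance is taken as the weighted regression sum of squares of the centered logratio-transformed target on one candidate logratio.

```r
# Toy compositional data (made-up values): 6 samples, 4 parts, rows closed to 1
set.seed(1)
X <- matrix(runif(24, 1, 10), nrow = 6,
            dimnames = list(NULL, c("A", "B", "C", "D")))
X <- X / rowSums(X)

# Hypothetical helper: proportion of (weighted) logratio variance explained by
# each pairwise logratio, for part weights cw that sum to 1
logratio_R2 <- function(X, cw) {
  L <- log(X)
  Y <- L - drop(L %*% cw)            # weighted centered logratios (row-centering)
  Y <- sweep(Y, 2, colMeans(Y))      # column-centering
  Y <- sweep(Y, 2, sqrt(cw), "*")    # fold the part weights into the columns
  totvar <- sum(Y^2)                 # total variance, up to a constant factor
  pairs <- t(combn(ncol(X), 2))      # all pairs of parts, one pair per row
  R2 <- apply(pairs, 1, function(p) {
    z <- L[, p[1]] - L[, p[2]]       # candidate pairwise logratio
    z <- z - mean(z)
    # sum of squares of Y fitted on z, as a proportion of the total
    sum(crossprod(z, Y)^2) / sum(z^2) / totvar
  })
  names(R2) <- paste(colnames(X)[pairs[, 1]], colnames(X)[pairs[, 2]], sep = "/")
  sort(R2, decreasing = TRUE)
}

# Equal part weights (the analogue of weight=FALSE)
R2_unweighted <- logratio_R2(X, rep(1 / ncol(X), ncol(X)))
# Part weights proportional to the part means (the analogue of the default)
R2_weighted <- logratio_R2(X, colMeans(X))
```

The first element of each sorted vector is the logratio that a first step would pick under this criterion; comparing the two vectors shows how the choice of part weights can change the ranking.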

If the target matrix is a different matrix, it is the logratio variance of that matrix that is to be explained. An option for the target matrix to be any response matrix will be in the next release.

If nsteps>1 and top=1, the result is an optimal set of logratios, each adding the maximum explained variance at its step. If top>1, the ordered list of the top variance-explaining logratios at the last step is returned as well, which allows users to make an alternative choice of logratio based on substantive knowledge. Hence, if nsteps=1 and top=10, for example, the procedure takes only one step but lists the top 10 logratios for that step. If top=1, all results with the extension .top are omitted, because they would only repeat the results already given.

Value

`names`	Names of maximizing ratios in stepwise process
`ratios`	Indices of ratios
`logratios`	Matrix of logratios
`R2max`	Sequence of maximum cumulative explained variances
`pro.cor`	Corresponding sequence of Procrustes correlations
`names.top`	Names of "top" ratios at last step
`ratios.top`	Indices of "top" ratios
`logratios.top`	Matrix of "top" logratios
`R2.top`	Sequence of "top" cumulative explained variances (in descending order)
`pro.cor.top`	Corresponding sequence of "top" Procrustes correlations
`totvar`	Total logratio variance of target matrix
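
As an illustration of how these components might be inspected, the sketch below assumes that the returned object is a list whose components carry the names listed above and can be accessed with `$`; it requires the easyCODA package and its `cups` data set, and no output is shown because it depends on the package version.

```r
library(easyCODA)
data(cups)

# One step only, keeping the five best candidate logratios for inspection
res <- STEP(cups, nsteps = 1, top = 5)

# Names and explained variances of the five candidates, best first
res$names.top
res$R2.top

# Explained variance of the logratio actually selected, and the total variance
res$R2max
res$totvar
```

Listing the candidates this way supports the substantive choice mentioned in Details: a slightly suboptimal logratio may be preferred if it is easier to interpret.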

Author(s)

Michael Greenacre

References

Van den Wollenberg, A. (1977), Redundancy analysis. An alternative to canonical correlation analysis, Psychometrika 42, 207-219.
Greenacre, M. (2018), Variable selection in compositional data analysis using pairwise logratios, Mathematical Geosciences, DOI: 10.1007/s11004-018-9754-x.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC.

Examples

# Stepwise selection of ratios for RomanCups data set
data(cups)
# Set seed to obtain same results as in Appendix C of Greenacre (2018)
set.seed(2872)
STEP(cups, random=TRUE)
# Select best ratio, but output "top 5"
STEP(cups, nsteps=1, top=5)

[Package easyCODA version 0.34.3 Index]