STEP {easyCODA}    R Documentation

Stepwise selection of logratios

Description

Stepwise selection of pairwise logratios that explain maximum variance in a target matrix.

Usage

STEP(data, datatarget=data, previous=NA, previous.wt=NA, weight=TRUE, 
     random=FALSE, nsteps=min(ncol(data), ncol(datatarget))-1, top=1)

Arguments

data

A data frame or matrix of compositional data on which pairwise logratios are computed

datatarget

A matrix of interval-scale data, with as many rows as data, which serves as the target matrix whose variance is to be explained (by default it is the same matrix as data, in which case total logratio variance is to be explained)

previous

A vector or matrix of variables to be forced in before logratios are sought

previous.wt

Optional weights for the variable(s) forced in before logratios are sought (if not specified, weights of 1 are assumed)

weight

TRUE (default) for a weighted analysis with part weights proportional to the part means (see Details), FALSE for an unweighted analysis, or a vector of user-defined part weights

random

TRUE to choose at random among tied logratios; FALSE (default) to choose, among tied logratios, the one that maximizes the Procrustes correlation

nsteps

Number of steps to take (by default, one less than the number of columns of data and of datatarget, whichever is smaller)

top

Number of top variance-explaining logratios returned after the last step (by default 1, i.e. only the best)
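
The random argument concerns ties in explained variance, which can occur, for example, when several candidate logratios would each complete the selected set and explain all of the target variance. The base-R sketch below illustrates the two tie-breaking rules. The helpers procrustes_cor and break_tie are hypothetical, not easyCODA functions, and the sketch does not show which two configurations STEP actually compares; it uses the common symmetric Procrustes correlation (both matrices column-centred and scaled to unit sum of squares, then the sum of the singular values of their cross-product).

```r
# Hypothetical helper (not an easyCODA function): symmetric Procrustes
# correlation between two configurations with the same rows. Both are
# column-centred and scaled to unit total sum of squares; the correlation
# is then the sum of the singular values of crossprod(A, B).
procrustes_cor <- function(A, B) {
  A <- scale(as.matrix(A), scale = FALSE); A <- A / sqrt(sum(A^2))
  B <- scale(as.matrix(B), scale = FALSE); B <- B / sqrt(sum(B^2))
  sum(svd(crossprod(A, B))$d)
}

# Hypothetical tie-break among candidates whose explained variance r2 is
# maximal (within tol): a random choice (cf. random=TRUE) or the largest
# Procrustes correlation pc (cf. random=FALSE).
break_tie <- function(r2, pc, random = FALSE, tol = 1e-10) {
  tied <- which(r2 >= max(r2) - tol)
  if (random && length(tied) > 1) return(sample(tied, 1))
  tied[which.max(pc[tied])]
}

set.seed(3)
A <- matrix(rnorm(20), nrow = 10)
th <- pi / 5
R <- matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2)   # 2 x 2 rotation
procrustes_cor(A, 3 * A %*% R + 5)   # rotation, scaling and shift: 1
break_tie(r2 = c(0.5, 0.9, 0.9), pc = c(0.1, 0.2, 0.8))  # candidate 3
```

Guarding sample() with length(tied) > 1 avoids R's behaviour of sampling from 1:n when given a single number.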

Details

The function STEP sequentially selects the pairwise logratios of a data matrix (usually compositional) that best explain the variance in a second matrix, called the target matrix. By default, the target matrix is the data matrix itself, in which case the logratios that best explain the total logratio variance of that same matrix are selected. In this case, part weights proportional to the part means of the compositional data matrix are used by default. For the unweighted logratio variance, specify weight=FALSE. User-specified weights on the columns of the data matrix (usually compositional parts) can be supplied through the same weight option.
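
As a rough illustration of the default weighting, the following base-R sketch computes part weights proportional to the part means and the resulting total logratio variance, summed over all pairwise logratios. The helper names part_weights and total_lr_var are hypothetical; closing rows to proportions before taking means, and using row weights 1/n, are assumptions of the sketch, so its values need not match STEP's totvar exactly.

```r
# Hypothetical helper (not an easyCODA function): part weights
# proportional to the part means, after closing each row to sum 1.
part_weights <- function(X) {
  P <- as.matrix(X) / rowSums(X)   # close rows to proportions
  w <- colMeans(P)
  w / sum(w)                       # weights sum to 1
}

# Hypothetical helper: total logratio variance as the weighted sum of
# the variances of all J(J-1)/2 pairwise logratios, row weights 1/n.
total_lr_var <- function(X, w = part_weights(X)) {
  L <- log(as.matrix(X))
  J <- ncol(L)
  tot <- 0
  for (j in seq_len(J - 1)) {
    for (k in (j + 1):J) {
      lr <- L[, j] - L[, k]                          # log(x_j / x_k)
      tot <- tot + w[j] * w[k] * mean((lr - mean(lr))^2)
    }
  }
  tot
}

# Made-up positive data, for illustration only
set.seed(1)
X <- matrix(runif(30, 1, 10), nrow = 10)
total_lr_var(X)                      # default: mean-proportional weights
total_lr_var(X, w = rep(1 / 3, 3))   # equal weights (assumed analogue of weight=FALSE)
```

Because logratios do not change when a whole row is rescaled, the total depends only on the relative values within each row; the weights only change how much each pair of parts contributes.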

If datatarget is a different matrix, it is the logratio variance of that matrix that is to be explained. An option for the target matrix to be any response matrix is planned for the next release.

If nsteps > 1 and top=1, the result is a set of logratios in which each logratio adds the maximum explained variance at its step. If top > 1, the ordered list of the top variance-explaining logratios at the last step is returned, which allows users to choose an alternative logratio on the basis of substantive knowledge. Hence, if nsteps=1 and top=10, for example, the procedure takes only one step but lists the top 10 logratios for that step. If top=1, all results with the suffix .top are omitted, because they would repeat the results of the last step.
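
The stepwise procedure and the roles of nsteps and top can be sketched in base R as a greedy forward selection. This is an assumption about the method, not easyCODA's code: the target is taken to be the weighted, double-centred log matrix of datatarget (weights from its part means), and the variance explained by a set of logratios is taken to be the redundancy-analysis R^2, i.e. the share of the target's sum of squares captured by a least-squares projection onto the centred logratios. The function names are hypothetical, and tie-breaking, previous and user-defined weights are omitted.

```r
# Hypothetical helper: weighted, double-centred log matrix whose total
# sum of squares is the weighted logratio variance (row weights 1/n).
target_matrix <- function(Tm, w) {
  L <- log(as.matrix(Tm))
  C <- L - drop(L %*% w)                  # centre each row with weights w
  C <- sweep(C, 2, colMeans(C))           # centre each column
  sweep(C, 2, sqrt(w), "*") / sqrt(nrow(C))
}

# Hypothetical greedy forward selection of pairwise logratios of X that
# explain the logratio variance of `target` (same rows as X).
step_sketch <- function(X, target = X,
                        nsteps = min(ncol(X), ncol(target)) - 1, top = 1) {
  stopifnot(nsteps >= 1, nrow(X) == nrow(target))
  X <- as.matrix(X); target <- as.matrix(target)
  w <- colMeans(target / rowSums(target)); w <- w / sum(w)
  Y <- target_matrix(target, w)
  totvar <- sum(Y^2)
  L <- log(X)
  pairs <- t(combn(ncol(X), 2))           # one row (j, k) per logratio, j < k
  chosen <- integer(0); R2max <- numeric(0)
  for (s in seq_len(nsteps)) {
    cand <- setdiff(seq_len(nrow(pairs)), chosen)
    # cumulative R^2 when each remaining candidate joins the chosen set
    r2 <- sapply(cand, function(p) {
      idx <- c(chosen, p)
      Z <- L[, pairs[idx, 1], drop = FALSE] - L[, pairs[idx, 2], drop = FALSE]
      Z <- sweep(Z, 2, colMeans(Z))
      sum(qr.fitted(qr(Z), Y)^2) / totvar
    })
    ord <- order(r2, decreasing = TRUE)   # ties: first candidate wins here
    chosen <- c(chosen, cand[ord[1]])
    R2max <- c(R2max, r2[ord[1]])
  }
  keep <- ord[seq_len(min(top, length(ord)))]   # top candidates, last step
  list(ratios = pairs[chosen, , drop = FALSE], R2max = R2max,
       ratios.top = pairs[cand[keep], , drop = FALSE], R2.top = r2[keep],
       totvar = totvar)
}

set.seed(2)
X <- matrix(runif(80, 1, 10), nrow = 20)   # made-up 20 x 4 composition
res <- step_sketch(X, nsteps = 3, top = 2)
res$ratios   # (j, k) of the logratio chosen at each step
res$R2max    # cumulative proportion of totvar explained
```

Unlike STEP, this sketch always returns the .top components. With J parts, J - 1 linearly independent logratios span the whole logratio space, so when the target is the data itself the cumulative R^2 reaches 1 at the final step.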

Value

names

Names of maximizing ratios in stepwise process

ratios

Indices of ratios

logratios

Matrix of logratios

R2max

Sequence of maximum cumulative explained variances

pro.cor

Corresponding sequence of Procrustes correlations

names.top

Names of "top" ratios at the last step

ratios.top

Indices of "top" ratios

logratios.top

Matrix of "top" logratios

R2.top

Sequence of "top" cumulative explained variances (in descending order)

pro.cor.top

Corresponding sequence of "top" Procrustes correlations

totvar

Total logratio variance of target matrix

Author(s)

Michael Greenacre

References

Van den Wollenberg, A. (1977), Redundancy analysis. An alternative to canonical correlation analysis, Psychometrika 42, 207-219.
Greenacre, M. (2018), Variable selection in compositional data analysis using pairwise logratios, Mathematical Geosciences, DOI: 10.1007/s11004-018-9754-x.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC.

See Also

PLOT.RDA, CLR, LR, ALR

Examples

# Stepwise selection of ratios for the RomanCups data set
library(easyCODA)
data(cups)
# Set seed to obtain same results as in Appendix C of Greenacre (2018)
set.seed(2872)
STEP(cups, random=TRUE)
# Select best ratio, but output "top 5"
STEP(cups, nsteps=1, top=5)

[Package easyCODA version 0.34.3 Index]