STEP {easyCODA} | R Documentation |
Stepwise selection of logratios
Description
Stepwise selection of pairwise logratios that explain maximum variance in a target matrix.
Usage
STEP(data, datatarget=data, previous=NA, previous.wt=NA, weight=TRUE,
random=FALSE, nsteps=min(ncol(data), ncol(datatarget))-1, top=1)
Arguments
data |
A data frame or matrix of compositional data on which pairwise logratios are computed |
datatarget |
A matrix of interval-scale data, with as many rows as data |
previous |
A vector or matrix of variables to be forced in before logratios are sought |
previous.wt |
Possible weights of the variable(s) forced in before logratios are sought (if not specified, weights of 1 are assumed) |
weight |
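TRUE (default) for weights proportional to the part means of data, FALSE for an unweighted analysis, or a vector of user-specified part weights |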
random |
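TRUE if a random selection is made among tied logratios; FALSE (default) if the tied logratio that maximizes the Procrustes correlation is chosen |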
nsteps |
Number of steps to take (by default, one less than the number of columns of data and of datatarget, whichever is smaller) |
top |
Number of top variance-explaining logratios returned after last step (by default, 1, i.e. the best) |
Details
The function STEP sequentially selects the pairwise logratios of a data matrix (usually compositional) that best explain the variance in a second matrix, called the target matrix. By default, the target matrix is the data matrix itself, in which case the selected logratios are those that best explain its own logratio variance.
In this case, the columns of the data matrix are weighted by default, with weights proportional to the part means of the compositional data matrix.
For the unweighted logratio variance, specify the option weight=FALSE.
User-specified weights on the columns of the data matrix (usually the compositional parts) can be provided using the same weight option.
If the target matrix is a different matrix, it is the logratio variance of that matrix that is to be explained. An option for the target matrix to be any response matrix will be in the next release.
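The weighted and unweighted total logratio variance mentioned above can be illustrated with a small base R sketch. It is not the easyCODA implementation: the helper name total_lr_var, the simulated data, and the use of variances with divisor n are assumptions made for illustration. The sketch computes the total variance twice, once from all pairwise logratios and once from weighted centred logratios, which should agree.

```r
# Hypothetical helper (not part of easyCODA): total logratio variance of a
# strictly positive matrix X. weights = NULL means equal weights 1/J.
total_lr_var <- function(X, weights = NULL) {
  X <- as.matrix(X)
  X <- X / rowSums(X)                          # close rows to sum to 1
  J <- ncol(X)
  cw <- if (is.null(weights)) rep(1 / J, J) else weights / sum(weights)
  Y <- log(X)
  pvar <- function(v) mean((v - mean(v))^2)    # assumption: divisor n
  # Pairwise definition: sum over j < k of c_j * c_k * var(log(x_j / x_k))
  pairs <- combn(J, 2)
  by_pairs <- sum(apply(pairs, 2, function(p)
    cw[p[1]] * cw[p[2]] * pvar(Y[, p[1]] - Y[, p[2]])))
  # Same quantity through weighted centred logratios
  Z <- Y - as.vector(Y %*% cw)                 # subtract weighted row mean of logs
  by_clr <- sum(cw * apply(Z, 2, pvar))
  c(by_pairs = by_pairs, by_clr = by_clr)
}

set.seed(1)
X <- matrix(rexp(60), nrow = 12, ncol = 5)     # made-up positive data
part_means <- colMeans(X / rowSums(X))         # weights proportional to part means
w_res <- total_lr_var(X, weights = part_means) # weighted case
u_res <- total_lr_var(X)                       # unweighted case
```

The weighted and unweighted totals generally differ, which is why the choice made through the weight option affects which logratios are selected.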
If nsteps > 1 and top=1, the results are in the form of an optimal set of logratios that sequentially add maximum explained variance at each step.
If top>1, then at the last step the ordered list of top variance-explaining logratios is returned, which allows users to make an alternative choice of the logratio based on substantive knowledge. Hence, if nsteps=1 and top=10, for example, the procedure will move only one step, but list the top 10 logratios for that step. If top=1, then all results with extension .top related to the top ratios are omitted, because the same information is already given in the main results.
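The idea of a single step with a "top" list can be sketched in base R as follows. This is a simplified, unweighted illustration under stated assumptions, not the code used by STEP: the helper name rank_logratios and the simulated data are hypothetical, the target is taken to be the data matrix itself, and each candidate logratio is scored by the share of total logratio variance it explains when used as a single predictor.

```r
# Hypothetical helper (not part of easyCODA): rank all pairwise logratios of X
# by the share of unweighted logratio variance each explains on its own.
rank_logratios <- function(X, top = 1) {
  X <- as.matrix(X)
  X <- X / rowSums(X)                          # close rows to sum to 1
  if (is.null(colnames(X))) colnames(X) <- paste0("p", seq_len(ncol(X)))
  Y <- log(X)
  Tm <- Y - rowMeans(Y)                        # centred logratios (target)
  Tm <- sweep(Tm, 2, colMeans(Tm))             # column-centre the target
  total <- sum(Tm^2)                           # total sum of squares
  pairs <- t(combn(ncol(X), 2))                # one row per candidate ratio
  r2 <- apply(pairs, 1, function(p) {
    z <- Y[, p[1]] - Y[, p[2]]                 # candidate logratio
    z <- z - mean(z)
    # Sum of squares of the target fitted by regression on z, as a share
    sum(crossprod(z, Tm)^2) / sum(z^2) / total
  })
  ord <- order(r2, decreasing = TRUE)[seq_len(min(top, length(r2)))]
  data.frame(ratio = paste(colnames(X)[pairs[ord, 1]],
                           colnames(X)[pairs[ord, 2]], sep = "/"),
             R2 = r2[ord])
}

set.seed(2)
X <- matrix(rexp(60), nrow = 12, ncol = 5,
            dimnames = list(NULL, c("A", "B", "C", "D", "E")))  # made-up data
top5 <- rank_logratios(X, top = 5)             # analogous to nsteps=1, top=5
```

A full stepwise procedure would repeat this ranking at each step, conditioning on the logratios already selected; that part is omitted here to keep the sketch short.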
Value
names |
Names of maximizing ratios in stepwise process |
ratios |
Indices of ratios |
logratios |
Matrix of logratios |
R2max |
Sequence of maximum cumulative explained variances |
pro.cor |
Corresponding sequence of Procrustes correlations |
names.top |
Names of "top" ratios at last step |
ratios.top |
Indices of "top" ratios |
logratios.top |
Matrix of "top" logratios |
R2.top |
Sequence of "top" cumulative explained variances (in descending order) |
pro.cor.top |
Corresponding sequence of "top" Procrustes correlations |
totvar |
Total logratio variance of target matrix |
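For readers unfamiliar with the Procrustes correlations reported in pro.cor and pro.cor.top, the sketch below shows one standard definition in base R: both configurations are centred and scaled to unit sum of squares, and the correlation is the sum of the singular values of their cross-product. How STEP computes these values internally is not stated on this page, so the helper procrustes_cor and the test data are assumptions for illustration only.

```r
# Hypothetical helper (not part of easyCODA): symmetric Procrustes correlation
# between two configurations with the same number of rows.
procrustes_cor <- function(A, B) {
  A <- scale(as.matrix(A), center = TRUE, scale = FALSE)  # centre columns
  B <- scale(as.matrix(B), center = TRUE, scale = FALSE)
  A <- A / sqrt(sum(A^2))                      # scale to unit sum of squares
  B <- B / sqrt(sum(B^2))
  sum(svd(crossprod(A, B))$d)                  # 1 means identical shape
}

set.seed(3)
A <- matrix(rnorm(20), nrow = 10, ncol = 2)    # made-up configuration
theta <- pi / 5
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
B <- 3 * (A %*% R) + 5                         # rotated, rescaled, shifted copy
r_same  <- procrustes_cor(A, B)                # close to 1
r_other <- procrustes_cor(A, matrix(rnorm(20), 10, 2))
```

A value close to 1 indicates that the geometry of the samples in the selected logratios closely matches the geometry in the target.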
Author(s)
Michael Greenacre
References
Van den Wollenberg, A. L. (1977), Redundancy analysis: An alternative for canonical correlation analysis, Psychometrika 42, 207-219.
Greenacre, M. (2018), Variable selection in compositional data analysis using pairwise logratios, Mathematical Geosciences, DOI: 10.1007/s11004-018-9754-x.
Greenacre, M. (2018), Compositional Data Analysis in Practice, Chapman & Hall / CRC.
See Also
Examples
# Stepwise selection of ratios for RomanCups data set
data(cups)
# Set seed to obtain same results as in Appendix C of Greenacre (2018)
set.seed(2872)
STEP(cups, random=TRUE)
# Select best ratio, but output "top 5"
STEP(cups, nsteps=1, top=5)