pcovr {PCovR}    R Documentation
Full principal covariates regression analysis of a specific data set
Description
Application of a PCovR analysis consists of the following steps: preprocessing the data, running PCovR analyses with different numbers of components and/or weighting parameter values, performing model selection, and rotating the retained solution for easier interpretation.
Usage
pcovr(X,Y,modsel="seq",Rmin=1,Rmax=ncol(X)/3,R=NULL,weight=NULL,rot="varimax",
target=NULL, prepX="stand",prepY="stand", ratio="estimation", fold="LeaveOneOut",
zeroloads=ncol(X))
## S3 method for class 'pcovr'
plot(x,cpal=NULL,lpal=NULL,...)
Arguments
X: Data frame containing the predictor scores
Y: Data frame containing the criterion scores
modsel: Model selection procedure (seq, seqRcv, seqAcv or sim)
Rmin: Lowest number of components considered
Rmax: Highest number of components considered
R: Number of components (overrules Rmin and Rmax)
weight: Weighting parameter values considered
rot: Rotation criterion (varimax, quartimin, targetT, targetQ, simplimax, wvarim, promin or none)
target: Target matrix for target rotation (components x predictor variables)
prepX: Preprocessing of the predictor scores: standardizing (stand) or centering (cent)
prepY: Preprocessing of the criterion scores: standardizing (stand) or centering (cent)
ratio: Ratio of the estimated error variances of the predictor block and the criterion block
fold: Value of k when performing k-fold cross-validation. By default (LeaveOneOut), leave-one-out cross-validation is performed.
zeroloads: Number of near-zero loadings in the target for simplimax rotation
x: An object of the type produced by pcovr
cpal: Vector of colors used for model selection plots
lpal: Vector of line types used for model selection plots
...: Further graphical arguments
Details
Preprocessing
The PCovR package offers two preprocessing options, which can be applied to X and/or Y. The data can either be centered only (prepX="cent", prepY="cent") or standardized (prepX="stand", prepY="stand"), which is the default. Standardizing means that X and/or Y are centered and scaled, so that each variable has a mean of zero and a standard deviation of one.
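The difference between the two options can be sketched in base R with scale(). The helper preprocess is hypothetical and only mirrors the two settings; it assumes that standardizing divides by the sample standard deviation (denominator n - 1), as scale() does, whereas the package's own preprocessing routine may use another denominator.

```r
# Hypothetical helper mirroring prepX/prepY = "cent" or "stand".
# Assumption: "stand" uses the sample SD (n - 1), as base R's scale() does.
preprocess <- function(M, prep = c("stand", "cent")) {
  prep <- match.arg(prep)
  M <- as.matrix(M)
  if (prep == "cent") {
    scale(M, center = TRUE, scale = FALSE)  # subtract column means only
  } else {
    scale(M, center = TRUE, scale = TRUE)   # subtract means, divide by column SDs
  }
}

set.seed(1)
X <- data.frame(a = rnorm(20, mean = 5, sd = 2), b = runif(20, 0, 10))
Xc <- preprocess(X, "cent")   # column means 0, original spread kept
Xs <- preprocess(X, "stand")  # column means 0, column SDs 1
colMeans(Xs)
apply(Xs, 2, sd)
```

Centering keeps the original variances, so variables measured on larger scales weigh more heavily in the analysis; standardizing gives every variable the same weight.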
Model selection
Sequential procedure
The fastest, and therefore default, model selection setting (modsel="seq") is a sequential procedure. First, the weighting value is determined on the basis of maximum likelihood principles (Vervloet, Van Deun, Van den Noortgate, & Ceulemans, 2013), taking into account the weighting values entered by the user (parameter weight): if the maximum likelihood weighting value does not equal one of those values, the entered value that is closest to it (in absolute terms) is used. By default, the error variance ratio is estimated with the function ErrorRatio, but it can be specified otherwise with the parameter ratio. Such estimation, however, is only possible for datasets with more observations than predictor variables. Next, among all models with the selected weighting value and a number of components between Rmin and Rmax, the solution with the highest scree test (st) value is selected (Cattell, 1966; Wilderjans, Ceulemans, & Meers, 2012). Models whose fit is less than 1% better than the fit of a less complex model are excluded from this selection. The assessment of the optimal number of components can be overruled when one is only interested in solutions with a particular number of components: when the input parameter R is specified, Rmin and Rmax are ignored, and the specified number of components is used both when running the analysis and when determining the weighting value.
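The two decisions in the sequential procedure, snapping the maximum likelihood weighting value to the closest entered value and picking the number of components with the highest st value, can be sketched as follows. Both helpers (snap_weight, select_R_scree) are hypothetical and not the package's code. The scree sketch assumes that fit[r] is the fit of the r-component model, that a 0-component model has fit 0, that "less than 1% better" means a relative gain below 1% over the next less complex model, and that the last model gets no st value because it has no successor.

```r
# Hypothetical helper: replace the maximum likelihood weighting value by the
# closest of the user-entered candidates (the weight argument).
snap_weight <- function(alpha_ml, weight) {
  weight[which.min(abs(weight - alpha_ml))]
}

# Hypothetical helper: scree test (st) selection of the number of components.
# Assumptions: fit[r] is the fit of the r-component model (r = 1, 2, ...),
# the 0-component fit is 0, and "< 1% better" is a relative gain below 1%.
select_R_scree <- function(fit, mingain = 0.01) {
  comp <- c(0, seq_along(fit))             # complexity = number of components
  f <- c(0, fit)                           # prepend the 0-component fit
  gain <- c(Inf, diff(f) / pmax(head(f, -1), .Machine$double.eps))
  keep <- gain >= mingain                  # drop models with < 1% improvement
  comp <- comp[keep]
  f <- f[keep]
  n <- length(f)
  if (n < 3) return(comp[n])               # too few models for a scree ratio
  # st = (slope before a model) / (slope after it); the last model has no st
  st <- sapply(2:(n - 1), function(i) {
    ((f[i] - f[i - 1]) / (comp[i] - comp[i - 1])) /
      ((f[i + 1] - f[i]) / (comp[i + 1] - comp[i]))
  })
  comp[1 + which.max(st)]
}

snap_weight(0.37, c(0.10, 0.25, 0.50, 0.75, 1))    # 0.25
select_R_scree(c(0.40, 0.60, 0.66, 0.665, 0.67))  # 2
```

In the second call, the 4- and 5-component models improve on their predecessors by less than 1% and are dropped; of the remaining models, the 2-component model shows the sharpest bend (st of about 3.3 against 2 for one component).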
The package also provides two sequential procedures that incorporate a cross-validation step (modsel="seqRcv" and modsel="seqAcv"). seqRcv also starts with the selection of the weighting value based on maximum likelihood principles, but in the next step, the number of components is determined using leave-one-out cross-validation. seqAcv is identical to the default procedure, but has an extra step: after the selection of the number of components, leave-one-out cross-validation is applied to choose the weighting value.
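Both seqRcv and seqAcv rely on leave-one-out cross-validation: each observation is predicted from a model fitted to the remaining N - 1 observations, and the quality of these predictions is summarized as Q² = 1 - PRESS / SS_Y. The sketch below shows this idea in base R. The helpers loo_q2 and ols_predict are hypothetical, ordinary least squares merely stands in for a PCovR fit, and the exact normalization behind the package's cross-validation fit may differ.

```r
# Hypothetical helper: leave-one-out cross-validated fit
#   Q2 = 1 - PRESS / (sum of squares of Y around its column means).
# fit_predict(Xtrain, Ytrain, Xtest) must return predictions for Xtest.
loo_q2 <- function(X, Y, fit_predict) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  N <- nrow(X)
  Yhat <- matrix(NA_real_, N, ncol(Y))
  for (i in seq_len(N)) {
    # fit without observation i, then predict the held-out row
    Yhat[i, ] <- fit_predict(X[-i, , drop = FALSE], Y[-i, , drop = FALSE],
                             X[i, , drop = FALSE])
  }
  press <- sum((Y - Yhat)^2)
  1 - press / sum(sweep(Y, 2, colMeans(Y))^2)
}

# Stand-in model for illustration only: least squares with an intercept.
ols_predict <- function(Xtr, Ytr, Xte) {
  B <- qr.solve(cbind(1, Xtr), Ytr)
  cbind(1, Xte) %*% B
}

set.seed(5)
X <- matrix(rnorm(90), 30, 3)
Y <- X %*% c(1, -2, 0.5) + rnorm(30, sd = 0.1)
loo_q2(X, Y, ols_predict)   # close to 1 for this nearly noise-free example
```

Because every prediction is made for an observation that was not used for fitting, Q² penalizes overfitting, which is why it can be used to choose the number of components (seqRcv) or the weighting value (seqAcv).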
Simultaneous procedure
The simultaneous procedure (modsel="sim") performs cross-validation (by default, leave-one-out) for all considered weighting values (weight; by default, 100 values between .01 and 1) and all numbers of components between Rmin (default: 1) and Rmax (default: the number of predictors divided by 3). The combination of weighting value and number of components that maximizes the cross-validation fit is retained. The parameter fold can be used to change the number of roughly equal-sized parts into which the data are split for cross-validation (Hastie, Tibshirani, & Friedman, 2001). The default value of fold is "LeaveOneOut", which splits the data into N parts, N being the number of observations.
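The two ingredients of the simultaneous procedure, splitting the data into k roughly equal-sized folds and retaining the best cell of a (weighting values x numbers of components) grid, can be sketched in base R. The helpers make_folds and best_combination are hypothetical, the grid of 100 equally spaced weighting values from .01 to 1 is an assumption based on the description above, and the Qy2 surface is made up purely for illustration.

```r
# Hypothetical helper: assign N observations to k roughly equal-sized folds;
# k = N corresponds to leave-one-out (fold = "LeaveOneOut").
make_folds <- function(N, k = N) {
  sample(rep_len(seq_len(k), N))
}

# Hypothetical helper: from a cross-validation fit matrix laid out as
# (weighting values x numbers of components), return the best combination.
best_combination <- function(Qy2, weight, Rvalues) {
  idx <- arrayInd(which.max(Qy2), dim(Qy2))
  list(alpha = weight[idx[1, 1]], R = Rvalues[idx[1, 2]])
}

set.seed(2)
table(make_folds(10, k = 3))   # fold sizes 4, 3 and 3

# Assumed grid: 100 weighting values from .01 to 1; here R = 1, 2, 3.
weight <- seq(0.01, 1, length.out = 100)
Rvalues <- 1:3
# Made-up CV fit surface that peaks at alpha = .3 and R = 2
Qy2 <- outer(weight, Rvalues, function(a, r) 0.5 - (a - 0.3)^2 - 0.01 * (r - 2)^2)
best_combination(Qy2, weight, Rvalues)   # alpha = 0.3, R = 2
```

Since every cell of the grid requires a full cross-validation run, the simultaneous procedure is considerably slower than the sequential ones, which is why modsel="seq" is the default.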
Interpreting the component matrices
The rotation criteria implemented in the PCovR package are varimax, quartimin, targetT, targetQ, simplimax, wvarim and promin. The original (unrotated) solution can be requested with rot="none". Target rotation (Browne, 1972) orthogonally rotates the loading matrix towards a target matrix (target) that is specified by the user.
Note that simplimax requires the specification of the number of near-zero loadings (zeroloads). By default, zeroloads equals the number of predictors.
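The sketch below illustrates what an orthogonal rotation does to a solution: an orthogonal matrix is applied to the loadings, the component scores and the regression weights in such a way that the reconstructed X and the predicted Y stay the same. The helpers are hypothetical: varimax_rotmat uses base R's stats::varimax, and procrustes_rotmat performs an orthogonal Procrustes rotation towards a fully specified target. The package's own rotation routines may differ, for instance in normalization or in how partially specified targets are handled.

```r
# Orthogonal rotation of a solution with loadings Px (R x J), component
# scores Te (N x R) and regression weights Py (R x K). Applying an
# orthogonal R x R matrix Tm this way leaves Te %*% Px and Te %*% Py unchanged.
rotate_solution <- function(Px, Te, Py, Tm) {
  list(Px = t(Tm) %*% Px, Te = Te %*% Tm, Py = t(Tm) %*% Py)
}

# Varimax rotation matrix via base R; stats::varimax expects a
# variables x factors matrix, hence the transpose of Px.
varimax_rotmat <- function(Px) {
  stats::varimax(t(Px))$rotmat
}

# Orthogonal Procrustes rotation towards a fully specified target B (R x J):
# Tm = U V' from svd(Px B') minimises ||t(Tm) %*% Px - B|| over orthogonal Tm.
procrustes_rotmat <- function(Px, B) {
  s <- svd(Px %*% t(B))
  s$u %*% t(s$v)
}

set.seed(3)
B <- rbind(c(1, 1, 1, 0, 0, 0),   # target: predictors 1-3 on component 1,
           c(0, 0, 0, 1, 1, 1))   #         predictors 4-6 on component 2
th <- pi / 6
T0 <- matrix(c(cos(th), sin(th), -sin(th), cos(th)), 2, 2)
Px <- t(T0) %*% B                 # the target, rotated away by 30 degrees
Te <- matrix(rnorm(20), 10, 2)
Py <- matrix(c(0.5, -0.3), 2, 1)

rot <- rotate_solution(Px, Te, Py, procrustes_rotmat(Px, B))
round(rot$Px, 10)                 # recovers the target B
rv <- rotate_solution(Px, Te, Py, varimax_rotmat(Px))
```

Because the regression weights are rotated along with the loadings, rotation changes only the interpretation of the components, not the fitted values of X or Y.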
The interpretation of the obtained solution usually starts with the loading matrix: each component is labeled by considering what the predictors with the highest absolute loadings on that component have in common. Given these labels, the regression weights can be interpreted.
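As a starting point for labeling, the predictors with the largest absolute loadings can be listed per component. The helper top_loadings below is hypothetical; it assumes a loading matrix laid out as components x predictor variables, as described for Px under Value, and the small matrix in the example is made up.

```r
# Hypothetical helper: for each component (row of Px), return the n
# predictors with the largest absolute loadings, keeping their signs.
top_loadings <- function(Px, n = 3) {
  if (is.null(colnames(Px))) colnames(Px) <- paste0("X", seq_len(ncol(Px)))
  out <- lapply(seq_len(nrow(Px)), function(r) {
    ord <- order(abs(Px[r, ]), decreasing = TRUE)[seq_len(min(n, ncol(Px)))]
    Px[r, ord]
  })
  names(out) <- paste0("Component", seq_len(nrow(Px)))
  out
}

Px <- rbind(c(0.80, 0.10, -0.70, 0.05),
            c(0.10, 0.90,  0.20, -0.60))
colnames(Px) <- c("a", "b", "c", "d")
top_loadings(Px, n = 2)
# Component1: a and c (opposite signs); Component2: b and d (opposite signs)
```

The signs matter for labeling: on Component1, a high score reflects high values on a combined with low values on c.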
Value
pcovr returns a list that contains the following objects (some objects can be empty, depending on the model selection settings used):
Px: Loading matrix (components x predictor variables)
Py: Regression weights matrix (components x criterion variables)
Te: Component score matrix (observations x components)
W: Component weights matrix (predictor variables x components)
Rx2: Proportion of explained variance in X
Ry2: Proportion of explained variance in Y
Qy2: Cross-validation fit as a function of the weighting parameter value and the number of components (weighting parameter values x numbers of components)
VAFsum: Weighted sum of the variance accounted for in X and in Y as a function of the number of components (1 x number of components)
alpha: Selected value of the weighting parameter
R: Selected number of components
modsel: Model selection procedure that was used
rot: Rotation criterion that was used
prepX: Preprocessing method that was used for the predictor scores
prepY: Preprocessing method that was used for the criterion scores
Rvalues: Numbers of components that were considered
Alphavalues: Weighting parameter values that were considered
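How the main returned matrices relate to one another can be illustrated with a minimal closed-form PCovR sketch after De Jong and Kiers (1992). The function pcovr_sketch is hypothetical and is not the package's implementation. It assumes preprocessed X and Y, more observations than predictors, and a full-rank X; it scales Te to orthonormal columns, which may differ from the scaling used by pcovr; and it takes the weighted fit as alpha * Rx2 + (1 - alpha) * Ry2, in line with the description of VAFsum above.

```r
# Hypothetical closed-form sketch (after De Jong & Kiers, 1992).
# Assumes preprocessed X (N x J) and Y (N x K), N > J, full column rank X.
pcovr_sketch <- function(X, Y, R, alpha) {
  X <- as.matrix(X)
  Y <- as.matrix(Y)
  ssx <- sum(X^2)
  ssy <- sum(Y^2)
  Hx <- X %*% solve(crossprod(X), t(X))        # projector onto column space of X
  # weighted matrix whose leading eigenvectors give the component scores
  G <- alpha * tcrossprod(X) / ssx +
    (1 - alpha) * Hx %*% tcrossprod(Y) %*% Hx / ssy
  e <- eigen(G, symmetric = TRUE)
  Te <- e$vectors[, seq_len(R), drop = FALSE]  # N x R, orthonormal columns
  W  <- solve(crossprod(X), t(X) %*% Te)       # J x R, so that Te = X %*% W
  Px <- t(Te) %*% X                            # R x J loadings
  Py <- t(Te) %*% Y                            # R x K regression weights
  Rx2 <- sum((Te %*% Px)^2) / ssx              # explained variance in X
  Ry2 <- sum((Te %*% Py)^2) / ssy              # explained variance in Y
  list(Te = Te, W = W, Px = Px, Py = Py, Rx2 = Rx2, Ry2 = Ry2,
       VAFsum = alpha * Rx2 + (1 - alpha) * Ry2, eigenvalues = e$values)
}

set.seed(4)
X <- scale(matrix(rnorm(200), 40, 5))
Y <- scale(X %*% c(1, 0.5, 0, 0, -1) + rnorm(40, sd = 0.5))
fit <- pcovr_sketch(X, Y, R = 2, alpha = 0.5)
c(Rx2 = fit$Rx2, Ry2 = fit$Ry2, VAFsum = fit$VAFsum)
```

In this sketch the weighted fit equals the sum of the R largest eigenvalues of G, which is why the leading eigenvectors are the optimal component scores for a given weighting value.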
Author(s)
Marlies Vervloet (marlies.vervloet@ppw.kuleuven.be)
References
Browne, M. W. (1972). Oblique rotation to a partially specified target. British Journal of Mathematical and Statistical Psychology, 25(2), 207-212.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245-276.
De Jong, S., & Kiers, H. A. (1992). Principal covariates regression: Part I. Theory. Chemometrics and Intelligent Laboratory Systems, 155-164.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction. New York: Springer.
Vervloet, M., Van Deun, K., Van den Noortgate, W., & Ceulemans, E. (2013). On the selection of the weighting parameter value in principal covariates regression. Chemometrics and Intelligent Laboratory Systems.
Vervloet, M., Kiers, H. A., Van den Noortgate, W., & Ceulemans, E. (2015). PCovR: An R package for principal covariates regression. Journal of Statistical Software, 65(8), 1-14. URL http://www.jstatsoft.org/v65/i08/.
Wilderjans, T. F., Ceulemans, E., & Meers, K. (2012). CHull: A generic convex-hull-based model selection method. Behavior Research Methods.
Examples
data(alexithymia)
results <- pcovr(alexithymia$X, alexithymia$Y)
summary(results)
plot(results)