Fit a high-dimensional principal fitted components model
using the method of Cook, Forzani, and Rothman (2012).
Description
Let (x_1, y_1), …, (x_n, y_n) denote the n measurements of
the predictor and response, where x_i ∈ R^p and y_i ∈ R.
The model assumes that these measurements
are a realization
of n independent copies of
the random vector (X,Y)′, where
X = μ_X + Γβ{f(Y) − μ_f} + ϵ,
μ_X ∈ R^p; Γ ∈ R^{p×d} with rank d;
β ∈ R^{d×r} with rank d; f: R → R^r is a known
vector-valued function; μ_f = E{f(Y)};
ϵ ∼ N_p(0, Δ); and Y is independent of ϵ.
The central subspace is Δ^{−1} span(Γ).
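To make the model concrete, the following base R sketch simulates data from it. The dimensions, the choice f(Y) = (Y, Y^2)′, the distribution of Y, and all parameter values are illustrative assumptions, not defaults of this function.

```r
# Simulate n draws from X = mu_X + Gamma beta {f(Y) - mu_f} + eps.
# Every numeric choice below is an assumption made for illustration.
set.seed(1)
n <- 200; p <- 10; d <- 1; r <- 2

y <- rnorm(n)                          # Y ~ N(0, 1)
f <- function(y) cbind(y, y^2)         # known basis f(Y) = (Y, Y^2)', so r = 2
mu_f <- c(0, 1)                        # E{f(Y)} = (E Y, E Y^2)' when Y ~ N(0, 1)

mu_X  <- rep(1, p)                     # mu_X in R^p
Gamma <- matrix(rnorm(p * d), p, d)    # Gamma in R^{p x d}, rank d
beta  <- matrix(c(1, 0.5), d, r)       # beta in R^{d x r}, rank d
eps   <- matrix(rnorm(n * p), n, p)    # eps ~ N_p(0, Delta) with Delta = I_p

# Row i of X is mu_X + Gamma beta {f(y_i) - mu_f} + eps_i
Fc <- sweep(f(y), 2, mu_f)             # rows are f(y_i) - mu_f
X  <- matrix(mu_X, n, p, byrow = TRUE) + Fc %*% t(Gamma %*% beta) + eps
```

With Δ = I_p, as assumed here, the central subspace Δ^{−1} span(Γ) reduces to span(Γ).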
This function computes estimates of these model parameters
by imposing constraints for identifiability.
The mean parameters μ_X and μ_f
are estimated with x̄ = n^{−1} Σ_{i=1}^n x_i and
f̄ = n^{−1} Σ_{i=1}^n f(y_i).
Let Φ = n^{−1} Σ_{i=1}^n {f(y_i) − f̄}{f(y_i) − f̄}′,
which we require to be positive definite.
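The sample quantities just defined can be computed in a few lines of base R. This is a minimal sketch on simulated data, assuming the polynomial basis f(y) = (y, y^2, …, y^r)′; the variable names (Fmat, Fc, Phi) are chosen here for illustration and are not taken from the package source.

```r
# Sample means, centered matrices, and Phi for a polynomial basis.
set.seed(2)
n <- 50; p <- 5; r <- 3
X <- matrix(rnorm(n * p), n, p)   # placeholder predictors
y <- rnorm(n)                     # placeholder responses

Fmat <- outer(y, 1:r, "^")        # row i is (y_i, y_i^2, ..., y_i^r)
xbar <- colMeans(X)               # estimate of mu_X
fbar <- colMeans(Fmat)            # estimate of mu_f

Xc <- sweep(X, 2, xbar)           # rows are x_i - xbar
Fc <- sweep(Fmat, 2, fbar)        # rows are f(y_i) - fbar
Phi <- crossprod(Fc) / n          # r x r matrix with 1/n scaling

# Phi must be positive definite: check its smallest eigenvalue.
min_eig <- min(eigen(Phi, symmetric = TRUE, only.values = TRUE)$values)
stopifnot(min_eig > 0)
```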
Given a user-specified weight matrix W,
let
Arguments
X
The predictor matrix with n rows and p columns. The ith row is x_i defined above.
y
The vector of measured responses with n entries. The ith entry is y_i defined above.
r
When polynomial basis functions are used (which is the case when F.user=NULL), r is the polynomial
order, i.e.,
f(y) = (y, y^2, …, y^r)′. The default is r=4. This argument is not used
when F.user is specified.
d
The dimension of the central subspace defined above. This must be specified by the user
when weight.type="L1".
If unspecified by the user, this function uses the sequential permutation testing procedure described in Section 8.2
of Cook, Forzani, and Rothman (2012) to select d.
F.user
A matrix with n rows and r columns, where the ith row is f(y_i) defined above.
This argument is optional, and will typically be used when polynomial basis functions are not desired.
weight.type
The type of weight matrix estimate W to use.
Let Δ̂ be the observed residual sample covariance matrix for the multivariate
regression of X on f(Y) with n−r−1 scaling.
There are three options for W:
weight.type="sample" uses a Moore-Penrose generalized inverse of Δ̂ for W;
when p ≤ n−r−1, this becomes the inverse of Δ̂.
weight.type="diag" uses the inverse of the diagonal matrix with the same diagonal as Δ̂
for W.
weight.type="L1" uses the L1-penalized inverse of Δ̂ described in equation (5.4) of Cook, Forzani,
and Rothman (2012). In this case, lam.vec and d must be specified by the user.
The glasso algorithm of Friedman et al. (2008) is used through the R package glasso.
lam.vec
A vector of candidate tuning parameter values to use when weight.type="L1". If this vector has more than one entry,
then cross-validation with kfold folds is performed to select the optimal tuning parameter value.
kfold
The number of folds to use in cross-validation to select the optimal tuning parameter when weight.type="L1".
Only used if lam.vec has more than one entry.
silent
Logical. When silent=FALSE, progress updates are printed.
qrtol
The tolerance for calls to qr.solve().
cov.tol
The convergence tolerance for the glasso algorithm used when weight.type="L1".
cov.maxit
The maximum number of iterations allowed for the glasso algorithm used when weight.type="L1".
NPERM
The number of permutations to use in the sequential permutation testing procedure to select d.
Only used when d is unspecified.
level
The significance level to use to terminate the sequential permutation testing procedure to select d.
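As an illustration of the weight.type options described above, the sketch below forms the residual sample covariance matrix with n−r−1 scaling and derives the "sample" and "diag" weight matrices from it in base R. It runs on simulated data with a polynomial basis; the variable names are assumptions made here, and the "L1" option is omitted because it requires the glasso package.

```r
# Residual covariance of X regressed on f(Y), and two weight matrices.
set.seed(3)
n <- 60; p <- 4; r <- 2
X <- matrix(rnorm(n * p), n, p)   # placeholder predictors
y <- rnorm(n)                     # placeholder responses

Xc <- scale(X, center = TRUE, scale = FALSE)
Fc <- scale(outer(y, 1:r, "^"), center = TRUE, scale = FALSE)

# Least-squares fit of centered X on centered f(Y), then residuals.
B <- qr.solve(Fc, Xc)                      # r x p coefficient matrix
E <- Xc - Fc %*% B                         # n x p residual matrix
Delta_hat <- crossprod(E) / (n - r - 1)    # n - r - 1 scaling

# weight.type = "sample": Moore-Penrose generalized inverse via the SVD.
sv <- svd(Delta_hat)
keep <- sv$d > max(dim(Delta_hat)) * .Machine$double.eps * sv$d[1]
W_sample <- sv$v[, keep, drop = FALSE] %*%
  (t(sv$u[, keep, drop = FALSE]) / sv$d[keep])

# weight.type = "diag": inverse of the diagonal part of Delta_hat.
W_diag <- diag(1 / diag(Delta_hat))
```

Because p ≤ n−r−1 in this example, W_sample agrees with solve(Delta_hat) up to numerical error.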
Details
See Cook, Forzani, and Rothman (2012) for more information.
Value
A list with the following components:
Gamhat
this is the estimate of Γ described above.
bhat
this is the estimate of β described above.
Rmat
this is WΓ(Γ′WΓ)^{−1}.
What
this is W described above.
d
this is d described above.
r
this is r described above.
GWG
this is Γ′WΓ.
fc
a matrix with n rows and r columns where the ith row is f(y_i) − f̄.
Xc
a matrix with n rows and p columns where the ith row is x_i − x̄.
y
the vector of n response measurements.
mx
this is x̄ described above.
mf
this is f̄ described above.
best.lam
this is the selected tuning parameter value used when weight.type="L1"; will be NULL otherwise.
lam.vec
this is the vector of candidate tuning parameter values used when
weight.type="L1"; will be NULL otherwise.
err.vec
this is the vector of validation errors from cross-validation, one error for each entry in lam.vec.
Will be NULL unless weight.type="L1" and lam.vec has more than one entry.
test.info
a data frame that summarizes the results from the sequential testing procedure. Will be NULL
unless d is unspecified.
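The relationship between the Gamhat, What, GWG, and Rmat components can be checked numerically. The sketch below uses random stand-in matrices rather than output from this function, so the values are purely illustrative; it only demonstrates the algebra in the definitions above, assuming a symmetric positive definite weight matrix.

```r
# Stand-ins for the returned components (not real fitted values).
set.seed(4)
p <- 6; d <- 2
Gamhat <- matrix(rnorm(p * d), p, d)     # p x d, full column rank
A <- matrix(rnorm(p * p), p, p)
What <- crossprod(A) + diag(p)           # symmetric positive definite p x p

GWG  <- t(Gamhat) %*% What %*% Gamhat    # d x d matrix Gamma' W Gamma
Rmat <- What %*% Gamhat %*% solve(GWG)   # p x d matrix W Gamma (Gamma' W Gamma)^{-1}

# Since What is symmetric, t(Rmat) %*% Gamhat is the d x d identity
# up to rounding error.
ident_err <- max(abs(t(Rmat) %*% Gamhat - diag(d)))
```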
Author(s)
Adam J. Rothman
References
Cook, R. D., Forzani, L., and Rothman, A. J. (2012).
Estimating sufficient reductions of the predictors in abundant high-dimensional regressions.
Annals of Statistics 40(1), 353-384.
Friedman, J., Hastie, T., and Tibshirani, R. (2008).
Sparse inverse covariance estimation with the lasso.
Biostatistics 9(3), 432-441.