fit.pfc {abundant} | R Documentation |
Fit a high-dimensional principal fitted components model using the method of Cook, Forzani, and Rothman (2012).
Description
Let (x_1, y_1), \ldots, (x_n, y_n)
denote the n
measurements of
the predictor and response, where x_i\in R^p
and y_i \in R
.
The model assumes that these measurements
are a realization
of n
independent copies of
the random vector (X,Y)'
, where
X = \mu_X + \Gamma \beta\{f(Y) - \mu_f\}+ \epsilon,
\mu_X\in R^p
; \Gamma\in R^{p\times d}
with rank d
;
\beta \in R^{d\times r}
with rank d
; f: R \rightarrow R^r
is a known
vector valued function; \mu_f = E\{f(Y)\}
;
\epsilon \sim N_p(0, \Delta)
; and Y
is independent of \epsilon
.
The central subspace is \Delta^{-1} {\rm span}(\Gamma)
.
This function computes estimates of these model parameters
by imposing constraints for identifiability.
The mean parameters \mu_X
and \mu_f
are estimated with \bar x = n^{-1}\sum_{i=1}^n x_i
and
\bar f = n^{-1} \sum_{i=1}^n f(y_i)
.
Let \widehat\Phi = n^{-1}\sum_{i=1}^{n} \{f(y_i) - \bar f\}\{f(y_i) - \bar f\}'
,
which we require to be positive definite.
Given a user-specified weight matrix \widehat W
,
let
(\widehat\Gamma, \widehat\beta) =
\arg\min_{G\in R^{p\times d}, B \in R^{d\times r}} \sum_{i=1}^n [x_i - \bar x - GB\{f(y_i) - \bar f\}]'\widehat W [x_i - \bar x - GB\{f(y_i) - \bar f\}],
subject to the constraints that G'\widehat W G
is diagonal and
B \widehat\Phi B' = I
. The sufficient reduction estimate \widehat R: R^p \rightarrow R^d
is defined by
\widehat R(x) = (\widehat\Gamma'\widehat W \widehat\Gamma)^{-1} \widehat\Gamma' \widehat W(x - \bar x).
Usage
fit.pfc(X, y, r=4, d=NULL, F.user=NULL, weight.type=c("sample", "diag", "L1"),
lam.vec=NULL, kfold=5, silent=TRUE, qrtol=1e-10, cov.tol=1e-4,
cov.maxit=1e3, NPERM=1e3, level=0.01)
Arguments
X |
The predictor matrix with |
y |
The vector of measured responses with |
r |
When polynomial basis functions are used (which is the case when |
d |
The dimension of the central subspace defined above. This must be specified by the user
when |
F.user |
A matrix with |
weight.type |
The type of weight matrix estimate
|
lam.vec |
A vector of candidate tuning parameter values to use when |
kfold |
The number of folds to use in cross-validation to select the optimal tuning parameter when |
silent |
Logical. When |
qrtol |
The tolerance for calls to |
cov.tol |
The convergence tolerance for the QUIC algorithm used when |
cov.maxit |
The maximum number of iterations allowed for the QUIC algorithm used when |
NPERM |
The number of permutations to used in the sequential permutation testing procedure to select |
level |
The significance level to use to terminate the sequential permutation testing procedure to select |
Details
See Cook, Forzani, and Rothman (2012) more information.
Value
A list with
Gamhat |
this is |
bhat |
this is |
Rmat |
this is |
What |
this is |
d |
this is |
r |
this is |
GWG |
this is |
fc |
a matrix with |
Xc |
a matrix with |
y |
the vector of |
mx |
this is |
mf |
this is |
best.lam |
this is selected tuning parameter value used when |
lam.vec |
this is the vector of candidate tuning parameter values used when
|
err.vec |
this is the vector of validation errors from cross validation, one error for each entry in |
test.info |
a dataframe that summarizes the results from the sequential testing procedure. Will be |
Author(s)
Adam J. Rothman
References
Cook, R. D., Forzani, L., and Rothman, A. J. (2012). Estimating sufficient reductions of the predictors in abundant high-dimensional regressions. Annals of Statistics 40(1), 353-384.
Friedman, J., Hastie, T., and Tibshirani R. (2008). Sparse inverse covariance estimation with the lasso. Biostatistics 9(3), 432-441.
See Also
Examples
set.seed(1)
n=20
p=30
d=2
y=sqrt(12)*runif(n)
Gam=matrix(rnorm(p*d), nrow=p, ncol=d)
beta=diag(2)
E=matrix(0.5*rnorm(n*p), nrow=n, ncol=p)
V=matrix(c(1, sqrt(12), sqrt(12), 12.8), nrow=2, ncol=2)
tmp=eigen(V, symmetric=TRUE)
V.msqrt=tcrossprod(tmp$vec*rep(tmp$val^(-0.5), each=2), tmp$vec)
Fyc=cbind(y-sqrt(3),y^2-4)%*%V.msqrt
X=0+Fyc%*%t(beta)%*%t(Gam) + E
fit=fit.pfc(X=X, y=y, r=3, weight.type="sample")
## display hypothesis testing information for selecting d
fit$test.info
## make a response versus fitted values plot
plot(pred.response(fit), y)