pc_adjust {hdpca}    R Documentation
Adjusting shrinkage in PC scores
Description
Adjusts the shrinkage bias in the predicted PC scores based on the estimated shrinkage factors.
Usage
pc_adjust(train.eval, p, n, test.scores, method = c("d.gsp", "l.gsp", "osp"),
n.spikes, n.spikes.max, smooth = TRUE)
Arguments
train.eval
Numeric vector containing the sample eigenvalues. The vector must have dimension n or n-1, where n is the number of training samples.
p
The number of features.
n
The number of training samples.
test.scores
A matrix containing the predicted PC scores of the test samples; rows correspond to test subjects and columns to PCs (see Details).
method
String specifying the estimation method. Possible values are "d.gsp", "l.gsp", and "osp"; see Details.
n.spikes
Number of distant spikes in the population (optional).
n.spikes.max
Upper bound on the number of distant spikes in the population. Optional, but required if n.spikes is not provided.
smooth
Logical. If TRUE (default), the estimated population spectral distribution is smoothed; useful when the population spectral distribution is assumed to be continuous (see Details).
Details
The different choices for method are:

"d.gsp": d-estimation method based on the Generalized Spiked Population (GSP) model.

"l.gsp": λ-estimation method based on the GSP model.

"osp": Estimation method based on the Ordinary Spiked Population (OSP) model.
The (i,j)-th element of test.scores should denote the j-th predicted PC score for the i-th subject in the test sample.
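For instance, a matrix with this orientation could be obtained by projecting centered test data onto the training PC loadings. A minimal sketch, not part of hdpca, where train.dat and test.dat are hypothetical training and test data matrices with samples in rows and features in columns:

pc <- prcomp(train.dat, center = TRUE, scale. = FALSE)
#rows of test.scores are test subjects, columns are the top 10 predicted PC scores
test.scores <- scale(test.dat, center = pc$center, scale = FALSE) %*% pc$rotation[, 1:10]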
At least one of n.spikes and n.spikes.max must be provided. If n.spikes is provided, then n.spikes.max is ignored; otherwise n.spikes.max is used to estimate the number of distant spikes via select.nspike.
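As a quick illustration of this precedence (an illustrative sketch; train.eval, p, n, and test.scores are assumed to be available, and n.spikes = 4 is an arbitrary example value):

#n.spikes takes precedence, so n.spikes.max = 10 is ignored here
adj1 <- pc_adjust(train.eval, p, n, test.scores, method = "l.gsp", n.spikes = 4, n.spikes.max = 10)
#only an upper bound is given, so the number of distant spikes is estimated internally
adj2 <- pc_adjust(train.eval, p, n, test.scores, method = "l.gsp", n.spikes.max = 10)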
The argument nonspikes.out is ignored if method = "d.gsp" or method = "osp".

The argument smooth is useful when the user assumes the population spectral distribution to be continuous.
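For example, if the population spectrum is instead assumed to consist of only a few discrete values, the smoothing could be switched off (an illustrative call reusing the objects assumed above):

adj.nosmooth <- pc_adjust(train.eval, p, n, test.scores, method = "l.gsp", n.spikes.max = 10, smooth = FALSE)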
Value
A matrix containing the bias-adjusted PC scores. The dimension of the matrix is the same as that of test.scores.

A printed message shows the number of top PCs that were adjusted for shrinkage bias.
Author(s)
Rounak Dey, deyrnk@umich.edu
References
Dey, R. and Lee, S. (2019). Asymptotic properties of principal component analysis and shrinkage-bias adjustment under the generalized spiked population model. Journal of Multivariate Analysis, Vol 173, 145-164.
See Also
select.nspike
Examples
data(hapmap)
#n = 198, p = 75435 for this data
####################################################
## Not run:
#First estimate the number of distant spikes, then adjust the test scores based on that
train.eval <- hapmap$train.eval
n <- hapmap$nSamp
p <- hapmap$nSNP
trainscore <- hapmap$trainscore
testscore <- hapmap$testscore
m <- select.nspike(train.eval, p, n, n.spikes.max = 10, evals.out = FALSE)$n.spikes
score.adj.o1 <- pc_adjust(train.eval, p, n, testscore, method = "osp", n.spikes = m)
score.adj.d1 <- pc_adjust(train.eval, p, n, testscore, method = "d.gsp", n.spikes = m)
score.adj.l1 <- pc_adjust(train.eval, p, n, testscore, method = "l.gsp", n.spikes = m)

#Or provide an upper bound n.spikes.max and let pc_adjust estimate the number of spikes
score.adj.o2 <- pc_adjust(train.eval, p, n, testscore, method = "osp", n.spikes.max = 10)
score.adj.d2 <- pc_adjust(train.eval, p, n, testscore, method = "d.gsp", n.spikes.max = 10)
score.adj.l2 <- pc_adjust(train.eval, p, n, testscore, method = "l.gsp", n.spikes.max = 10)

#Plot the training scores, test scores, and adjusted test scores
plot(trainscore, pch = 19)
points(testscore, col = 'blue', pch = 19)
points(score.adj.o1, col = 'red', pch = 19)
points(score.adj.d2, col = 'green', pch = 19)
## End(Not run)