R: Finding Distant Spikes

select.nspike {hdpca}

R Documentation

Finding Distant Spikes

Description

Estimates the number of distant spikes in the population based on the Generalized Spiked Population model. A finite upper bound (n.spikes.max) on the number of distant spikes must be provided.

Usage

select.nspike(samp.eval, p, n, n.spikes.max, evals.out = FALSE, smooth = TRUE)

Arguments

`samp.eval`	Numeric vector containing the sample eigenvalues. The vector must have length `n` or `n-1`; it may be unordered.
`p`	The number of features.
`n`	The number of samples.
`n.spikes.max`	Upper bound on the number of distant spikes in the population.
`evals.out`	Logical. If `TRUE`, the estimated spikes and non-spikes are returned. Default is `FALSE`.
`smooth`	Logical. If `TRUE`, kernel smoothing will be performed on the estimated population eigenvalue spectrum. Default is `TRUE`.
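
As an illustration of how a vector suitable for `samp.eval` might be obtained, the following base R sketch computes the eigenvalues of a sample covariance matrix from a simulated data matrix. The matrix `X`, the simulated dimensions, and in particular the scaling by `n - 1` are assumptions made for this sketch and are not taken from the package; check which normalization your analysis expects. After column-centering, the Gram matrix has rank at most `n - 1`, which is consistent with a vector of length `n` or `n-1`.

```r
# Sketch only: simulated data and the (n - 1) scaling are assumptions,
# not taken from the hdpca package.
set.seed(1)
n <- 20    # number of samples (rows)
p <- 100   # number of features (columns)
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Center each feature (column) at zero.
Xc <- scale(X, center = TRUE, scale = FALSE)

# The nonzero eigenvalues of the p x p sample covariance matrix equal those
# of the much smaller n x n Gram matrix, so work with the latter when p > n.
gram <- tcrossprod(Xc) / (n - 1)
samp.eval <- eigen(gram, symmetric = TRUE, only.values = TRUE)$values

# Centering makes the last eigenvalue numerically zero; keep the first n - 1.
samp.eval <- samp.eval[seq_len(n - 1)]
```

The resulting `samp.eval`, together with `p` and `n`, has the shape described for the arguments above.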

Details

The function searches over the values from 0 to n.spikes.max for the number of distant spikes in the population. It also estimates both the non-spiked and the spiked eigenvalues using the λ-estimation method.

Setting smooth to TRUE is appropriate when the population spectral distribution is assumed to be continuous.

Value

`n.spikes`	Estimated number of distant spikes.
`spikes`	If `evals.out=TRUE`, estimated distant spikes are returned.
`nonspikes`	If `evals.out=TRUE`, estimated non-spikes are returned.
`loss`	If `evals.out=TRUE`, the value of the L-infinity loss function for the spectrum estimation is returned.

Author(s)

Rounak Dey, deyrnk@umich.edu

References

Dey, R. and Lee, S. (2019). Asymptotic properties of principal component analysis and shrinkage-bias adjustment under the generalized spiked population model. Journal of Multivariate Analysis, Vol 173, 145-164.

Examples

library(hdpca)
data(hapmap)
# n = 198, p = 75435 for this data set

####################################################
## Not run: 
# If you just want the estimated number of spikes
train.eval <- hapmap$train.eval
n <- hapmap$nSamp
p <- hapmap$nSNP

select.nspike(train.eval, p, n, n.spikes.max = 10, evals.out = FALSE)

# If you want the estimated spikes and non-spikes as well
out <- select.nspike(train.eval, p, n, n.spikes.max = 10, evals.out = TRUE)

## End(Not run)

[Package hdpca version 1.1.5 Index]