R: Improved Function for Obtaining Principal Components

iprcomp {statVisual}

R Documentation

Improved Function for Obtaining Principal Components

Description

Calculate principal components when data contains missing values.

Usage

iprcomp(dat, center = TRUE, scale. = FALSE)

Arguments

`dat`	n by p matrix. Rows are subjects and columns are variables.
`center`	logical. Indicates whether each column (variable) of `dat` should be mean-centered, as in prcomp.
`scale.`	logical. Indicates whether each column (variable) of `dat` should be scaled to have variance one, as in prcomp.

Details

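Each missing value is first replaced by the median of the corresponding variable (column), computed from the observed values; the function prcomp is then called on the completed data. This is a deliberately simple solution. Users who prefer a different imputation method can apply it themselves and then call prcomp directly.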
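The imputation step described above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the package's source code: the helper name impute_median and the objects m, m.imp and res.sketch are hypothetical, and `dat` is assumed to be a numeric matrix with variables in columns.

```r
# Hypothetical helper (not part of statVisual): replace each NA with the
# median of its column, computed from the observed values only.
impute_median = function(dat)
{
  for (j in seq_len(ncol(dat)))
  {
    miss = is.na(dat[, j])
    if (any(miss))
    {
      dat[miss, j] = median(dat[, j], na.rm = TRUE)
    }
  }
  dat
}

# small example: a 4 x 2 matrix with one NA in each column
m = matrix(c(1, 2, NA, 4,
             10, NA, 30, 40), nrow = 4)
m.imp = impute_median(m)

# prcomp (stats package) can then be applied to the completed data
res.sketch = prcomp(m.imp, center = TRUE, scale. = FALSE)
```

The median is used here because it is less sensitive to outlying values than the mean. A column that is entirely missing would remain NA under this sketch and would need separate handling.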

Value

A list of 3 elements

`sdev`	square roots of the eigenvalues
`rotation`	a matrix whose columns are the eigenvectors, i.e., the projection directions
`x`	a matrix whose columns are the principal components
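As a brief illustration of how these three elements relate to one another, the sketch below calls prcomp directly on simulated complete data. The object names (dat.demo, pc, eig, pve, scores) are illustrative assumptions, not part of the package.

```r
set.seed(1)
dat.demo = matrix(rnorm(200), nrow = 40, ncol = 5)  # illustrative data
pc = prcomp(dat.demo, center = TRUE, scale. = FALSE)

# sdev^2 are the eigenvalues of the sample covariance matrix
eig = eigen(cov(dat.demo))$values
print(all.equal(pc$sdev^2, eig))

# proportion of variance explained by each principal component
pve = pc$sdev^2 / sum(pc$sdev^2)
print(round(pve, 3))

# x is the centered data projected onto the columns of rotation
scores = scale(dat.demo, center = TRUE, scale = FALSE) %*% pc$rotation
print(all.equal(unname(scores), unname(pc$x)))
```

The same relations should hold for the output of iprcomp, with the imputed data in place of dat.demo, given that iprcomp passes the imputed matrix to prcomp.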

Author(s)

Wenfei Zhang <Wenfei.Zhang@sanofi.com>, Weiliang Qiu <Weiliang.Qiu@sanofi.com>, Xuan Lin <Xuan.Lin@sanofi.com>, Donghui Zhang <Donghui.Zhang@sanofi.com>

Examples

# generate simulated data
set.seed(1234567)
dat.x = matrix(rnorm(500), nrow = 100, ncol = 5)
dat.y = matrix(rnorm(500, mean = 2), nrow = 100, ncol = 5)
dat = rbind(dat.x, dat.y)
grp = c(rep(0, 100), rep(1, 100))
print(dim(dat))

res = iprcomp(dat, center = TRUE, scale. = FALSE)

# for each row, set one randomly chosen entry to NA
dat.na = dat
nr = nrow(dat.na)
nc = ncol(dat.na)
for (i in 1:nr)
{
  posi = sample(x = 1:nc, size = 1)
  dat.na[i, posi] = NA
}

res.na = iprcomp(dat.na, center = TRUE, scale. = FALSE)

##
# pca plot
##
# two panels, one per data set
par(mfrow = c(2, 1))
# original data without missing values
plot(x = res$x[, 1], y = res$x[, 2], xlab = "PC1", ylab = "PC2", main = "without missing values")
# perturbed data with one NA per row (subject);
# the pattern of the original data is still captured
plot(x = res.na$x[, 1], y = res.na$x[, 2], xlab = "PC1", ylab = "PC2", main = "with missing values")
par(mfrow = c(1, 1))

[Package statVisual version 1.2.1 Index]