SamplePCA {ClassDiscovery}    R Documentation
Class "SamplePCA"
Description
Perform principal components analysis on the samples (columns) from a microarray or proteomics experiment.
Usage
SamplePCA(data, splitter=0, usecor=FALSE, center=TRUE)
## S4 method for signature 'SamplePCA,missing'
plot(x, splitter=x@splitter, col, main='', which=1:2, ...)
Arguments
data: Either a data frame or matrix with numeric values, or an ExpressionSet object from the Bioconductor libraries.
splitter: If data is a data frame or matrix, a logical vector or factor assigning classes to the columns; if data is an ExpressionSet, a character string naming the factor used to split the samples.
center: A logical value; should the rows of the data matrix be centered first?
usecor: A logical value; should the rows of the data matrix be scaled to have standard deviation 1?
x: A SamplePCA object.
col: A list of colors to represent each level of the splitter in the plot.
main: A character string; the plot title.
which: A numeric vector of length two specifying which two principal components should be included in the plot.
...: Additional graphical parameters for plot.
Details
The main reason for developing the SamplePCA class is that the princomp function is very inefficient when the number of variables (in the microarray setting, genes) far exceeds the number of observations (in the microarray setting, biological samples). The princomp function begins by computing the full covariance matrix, which is P x P for P genes; with 20,000 genes that is already 4 x 10^8 entries. The SamplePCA class, by contrast, uses singular value decomposition (svd) on the original data matrix to compute the principal components.
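A minimal base-R sketch of the SVD approach follows. It assumes row-centering only (no scaling) and assumes the variances are normalized by N - 1, as stats::prcomp does. The helper name svd_sample_pca is hypothetical, and this is not the ClassDiscovery source. The point is that svd() works directly on the P x N data, so the P x P covariance matrix is never formed.

```r
## Hypothetical helper: PCA of the N columns (samples) of a P x N matrix,
## computed with svd() so the P x P gene covariance matrix is never formed.
## Assumption: variances are scaled by N - 1, as in stats::prcomp.
svd_sample_pca <- function(X, center = TRUE) {
  X <- as.matrix(X)
  shift <- if (center) rowMeans(X) else rep(0, nrow(X))
  Xc <- X - shift                          # subtracts shift[i] from row i
  s <- svd(Xc)                             # Xc = U %*% diag(d) %*% t(V)
  list(
    components = s$u,                      # P x N principal directions
    scores     = sweep(s$v, 2, s$d, "*"),  # N x N sample projections (V D)
    variances  = s$d^2 / (ncol(X) - 1),    # variance along each component
    shift      = shift
  )
}

## Usage: 100 "genes" by 12 "samples"
set.seed(1)
X <- matrix(rnorm(100 * 12), nrow = 100)
res <- svd_sample_pca(X)
dim(res$scores)   # 12 12
```

Up to the sign of each column, res$scores should agree with prcomp(t(X))$x, which is a convenient sanity check for any SVD-based variant.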
The base functions screeplot, which produces a barplot of the percentage of variance explained by each component, and plot, which produces a scatter plot comparing two selected components (defaulting to the first two), have been generalized as methods for the SamplePCA class. You can add sample labels to the scatter plot using either the text or identify methods. Note, however, that the current implementation of these methods only works when plotting the first two components.
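The percentages in a scree plot follow directly from the singular values: component k carries d_k^2 / sum(d^2) of the total variance. A minimal base-R sketch, assuming row-centered data; percent_variance is a hypothetical helper, and the SamplePCA screeplot method may compute or label its bars differently.

```r
## Hypothetical helper: percentage of total variance carried by each
## principal component of a P x N matrix, from its singular values.
percent_variance <- function(X) {
  Xc <- X - rowMeans(X)            # center each row (gene)
  d <- svd(Xc, nu = 0, nv = 0)$d   # singular values only
  100 * d^2 / sum(d^2)
}

set.seed(2)
X <- matrix(rnorm(200 * 8), nrow = 200)
pv <- percent_variance(X)
round(pv, 1)
## Uncomment to draw a scree-style bar chart:
## barplot(pv, names.arg = paste0("PC", seq_along(pv)))
```

Because only the singular values are needed, nu = 0 and nv = 0 skip computing the singular vectors entirely.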
Value
The SamplePCA function constructs and returns an object of the SamplePCA class. We assume that the input data matrix has N columns (of biological samples) and P rows (of genes).
The predict method returns a matrix whose size is the number of columns in its input (the new data) by the number of principal components.
Objects from the Class
Objects should be created using the SamplePCA function. In the simplest case, you pass in a data matrix and a logical vector, splitter, assigning classes to the columns, and the constructor performs principal components analysis on the columns. The splitter is ignored by the constructor and is simply saved to be used by the plotting routines. If you omit the splitter, then no grouping structure is used in the plots.
If you pass splitter as a factor instead of a logical vector, then the plotting routine will distinguish all levels of the factor. The code is likely to fail, however, if one of the levels of the factor has zero representatives among the data columns.
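One way to sidestep the empty-level failure, assuming your grouping factor was built with more levels than the data actually contain, is to drop unused levels with base R's droplevels() before calling SamplePCA. The factor below is made-up illustration data, and the commented SamplePCA call uses a hypothetical matrix dd.

```r
## A grouping factor that declares a level ("blue") with no samples.
kind <- factor(rep(c("red", "green"), each = 5),
               levels = c("red", "green", "blue"))
table(kind)              # "blue" has a count of 0

## Remove levels that have no representatives among the columns;
## the remaining levels keep their original order.
kind <- droplevels(kind)
levels(kind)             # "red" "green"
## spc <- SamplePCA(dd, splitter = kind)  # hypothetical dd with 10 columns
```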
We can also perform PCA on ExpressionSet objects from the Bioconductor libraries. In this case, we pass in an ExpressionSet object along with a character string containing the name of a factor to use for splitting the data.
Slots
scores: A matrix of size NxN, where N is the number of columns in the input, representing the projections of the input columns onto the first N principal components.
variances: A numeric vector of length N; the amount of the total variance explained by each principal component.
components: A matrix of size PxN (the same size as the input matrix) containing each of the first N principal components as columns.
splitter: A logical vector or factor of length N classifying the columns into known groups.
usecor: A logical value; was the data standardized?
shift: A numeric vector of length P; the mean vector of the input data, which is used for centering by the predict method.
scale: A numeric vector of length P; the standard deviation of the input data, which is used for scaling by the predict method.
call: An object of class call that records how the object was created.
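The shift and scale slots matter because new samples must be transformed exactly as the original columns were before they are projected. A minimal base-R sketch of that projection, assuming predict centers by shift, divides by scale (all ones unless the data were standardized), and multiplies by the components matrix. The helpers fit_pca and project_new are hypothetical and are not the ClassDiscovery code.

```r
## Hypothetical fit: store shift, scale and components like the slots above.
fit_pca <- function(X, usecor = FALSE) {
  shift <- rowMeans(X)
  scale <- if (usecor) apply(X, 1, sd) else rep(1, nrow(X))
  Z <- (X - shift) / scale              # row-wise centering and scaling
  s <- svd(Z)
  list(shift = shift, scale = scale, components = s$u,
       scores = sweep(s$v, 2, s$d, "*"))
}

## Hypothetical projection of new columns (P rows, any number of columns).
project_new <- function(fit, newdata) {
  Z <- (newdata - fit$shift) / fit$scale
  t(Z) %*% fit$components               # (new columns) x (components)
}

set.seed(3)
X <- matrix(rnorm(50 * 6), nrow = 50)
fit <- fit_pca(X, usecor = TRUE)
p <- project_new(fit, X[, 1:2, drop = FALSE])
dim(p)                                  # 2 6
```

Projecting the original columns reproduces the corresponding rows of the scores, since t(Z) %*% U equals V %*% diag(d) when Z = U %*% diag(d) %*% t(V).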
Methods
- plot signature(x = SamplePCA, y = missing): Plot the samples in a two-dimensional principal component space.
- predict signature(object = SamplePCA): Project new data into the principal component space.
- screeplot signature(x = SamplePCA): Produce a bar chart of the variances explained by each principal component.
- summary signature(object = SamplePCA): Write out a summary of the object.
- identify signature(object = SamplePCA): Interactively identify points in the plot of a SamplePCA object.
- text signature(object = SamplePCA): Add sample identifiers to the scatter plot of a SamplePCA object, using the base text function.
Author(s)
Kevin R. Coombes krc@silicovore.com
See Also
Examples
library(ClassDiscovery)
showClass("SamplePCA")
## simulate data from three different groups
d1 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
d2 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
d3 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
dd <- cbind(d1, d2, d3)
kind <- factor(rep(c('red', 'green', 'blue'), each=10))
colnames(dd) <- paste(kind, rep(1:10, 3), sep='')
## perform PCA
spc <- SamplePCA(dd, splitter=kind)
## plot the results; levels(kind) is alphabetical (blue, green, red),
## so each group is drawn in the color that names it
plot(spc, col=levels(kind))
## mark the group centers by projecting each group's mean profile
x1 <- predict(spc, matrix(rowMeans(d1), ncol=1))
points(x1[1], x1[2], col='red', cex=2)
x2 <- predict(spc, matrix(rowMeans(d2), ncol=1))
points(x2[1], x2[2], col='green', cex=2)
x3 <- predict(spc, matrix(rowMeans(d3), ncol=1))
points(x3[1], x3[2], col='blue', cex=2)
## check out the variances
screeplot(spc)
## cleanup
rm(d1, d2, d3, dd, kind, spc, x1, x2, x3)