rca {dml} | R Documentation
Relevant Component Analysis
Description
Performs relevant component analysis on the given data.
Usage
rca(x, chunks)
Arguments
x: matrix or data frame of original data. Each row is a feature vector of a data instance.

chunks: list of numeric vectors. Each vector contains the row indices in x of the instances forming one chunklet, i.e. a group of points known to belong to the same class (see the Examples).
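For illustration, a chunks argument with three chunklets could be built as follows (the indices here are made up):

## A hypothetical `chunks` argument with three chunklets; each numeric
## vector holds row indices of x whose instances share a class.
chunks = list(c(1, 3, 7), c(4, 6), c(12, 15, 20))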
Details
The RCA function takes a data set and a set of positive constraints as arguments and returns a linear transformation of the data space into a better representation or, alternatively, a Mahalanobis metric over the data space.

Relevant component analysis consists of three steps:

1. subtract its chunklet's mean from every constrained point, so that only within-chunklet variability remains;

2. compute the covariance matrix C of all the centered points, i.e. C = (1/N) * sum_j sum_i (x_ji - m_j)(x_ji - m_j)', where m_j is the mean of chunklet j and N is the number of constrained points;

3. take B = C^(-1) as the Mahalanobis metric or, equivalently, apply the whitening transformation A = C^(-1/2) to the data.
The new representation is known to be optimal in an information theoretic sense under a constraint of keeping equivalent data points close to each other.
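A minimal sketch of these three steps in plain R (for illustration only; rcaSketch and its internals are not the package's actual implementation):

## Assumes x is numeric with instances in rows and chunks is a list of
## row-index vectors, as described under Arguments.
rcaSketch = function(x, chunks) {
  x = as.matrix(x)
  centered = lapply(chunks, function(idx) {
    xi = x[idx, , drop = FALSE]
    sweep(xi, 2, colMeans(xi))     # step 1: subtract the chunklet mean
  })
  d = do.call(rbind, centered)
  C = crossprod(d) / nrow(d)       # step 2: within-chunklet covariance
  e = eigen(C, symmetric = TRUE)   # step 3: whitening transform C^(-1/2)
  A = e$vectors %*% diag(1 / sqrt(e$values), nrow(C)) %*% t(e$vectors)
  list(B = solve(C), A = A, newX = x %*% A)
}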
Value
list of the RCA results:

B: the RCA suggested Mahalanobis matrix. Distances between data points x1, x2 should be computed by (x2 - x1)' * B * (x2 - x1).

A: the RCA suggested transformation of the data. With instances stored in rows, as in x, the data are transformed by data %*% A (see the Examples).

newX: the data after the RCA transformation (A), i.e. newX = data %*% A.
The three returned arguments are just different forms of the same output. If one is interested in a Mahalanobis metric over the original data space, the first argument (B) is all that is needed; if a transformation into another space, where the Euclidean metric can be used, is preferred, the second argument (A) is sufficient. Using A and B is equivalent in the following sense:

if y1 = A * x1 and y2 = A * x2, then (x2 - x1)' * B * (x2 - x1) = (y2 - y1)' * (y2 - y1)
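This equivalence can be checked numerically; the sketch below assumes x and chunks have been built as in the Examples section:

## Numerical check that A and B agree; x1 and x2 are two instances
## (row vectors) taken from the Examples' data set.
res = rca(x[ , 1:2], chunks)
x1 = as.numeric(x[1, 1:2])
x2 = as.numeric(x[2, 1:2])
dB = drop(t(x2 - x1) %*% res$B %*% (x2 - x1))  # Mahalanobis form with B
y1 = x1 %*% res$A                              # map into the new space
y2 = x2 %*% res$A
dA = sum((y2 - y1)^2)                          # Euclidean distance there
all.equal(dB, dA)                              # TRUE up to rounding error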
Note
Note that different chunklets, e.g. the instance sets {1, 3, 7} and {4, 6}, may or may not belong to the same class: the positive constraints only state that the instances within one chunklet share a class.
Author(s)
Xiao Nan <http://www.road2stat.com>
References
Aharon Bar-Hillel, Tomer Hertz, Noam Shental, and Daphna Weinshall (2003). Learning Distance Functions using Equivalence Relations. Proceedings of 20th International Conference on Machine Learning (ICML2003).
See Also
See dca for exploiting negative constraints.
Examples
## Not run:
set.seed(1234)
require(MASS) # generate synthetic Gaussian data
k = 100 # sample size of each class
n = 3 # specify how many classes
N = k * n # total sample number
x1 = mvrnorm(k, mu = c(-10, 6), matrix(c(10, 4, 4, 10), ncol = 2))
x2 = mvrnorm(k, mu = c(0, 0), matrix(c(10, 4, 4, 10), ncol = 2))
x3 = mvrnorm(k, mu = c(10, -6), matrix(c(10, 4, 4, 10), ncol = 2))
x = as.data.frame(rbind(x1, x2, x3))
x$V3 = gl(n, k)
# The fully labeled data set with 3 classes
plot(x$V1, x$V2, bg = c("#E41A1C", "#377EB8", "#4DAF4A")[x$V3],
pch = c(rep(22, k), rep(21, k), rep(25, k)))
Sys.sleep(3)
# Same data unlabeled; clearly the classes' structure is less evident
plot(x$V1, x$V2)
Sys.sleep(3)
chunk1 = sample(1:100, 5)
chunk2 = sample(setdiff(1:100, chunk1), 5)
chunk3 = sample(101:200, 5)
chunk4 = sample(setdiff(101:200, chunk3), 5)
chunk5 = sample(201:300, 5)
chks = x[c(chunk1, chunk2, chunk3, chunk4, chunk5), ]
chunks = list(chunk1, chunk2, chunk3, chunk4, chunk5)
# The chunklets provided to the RCA algorithm
plot(chks$V1, chks$V2, col = rep(c("#E41A1C", "#377EB8",
"#4DAF4A", "#984EA3", "#FF7F00"), each = 5),
pch = rep(0:4, each = 5), ylim = c(-15, 15))
Sys.sleep(3)
# Whitening transformation applied to the chunklets
rcaRes = rca(x[ , 1:2], chunks) # fit RCA once and reuse the result
chkTransformed = as.matrix(chks[ , 1:2]) %*% rcaRes$A
plot(chkTransformed[ , 1], chkTransformed[ , 2], col = rep(c(
"#E41A1C", "#377EB8", "#4DAF4A", "#984EA3", "#FF7F00"), each = 5),
pch = rep(0:4, each = 5), ylim = c(-15, 15))
Sys.sleep(3)
# The original data after applying the RCA transformation
plot(rcaRes$newX[ , 1], rcaRes$newX[ , 2],
bg = c("#E41A1C", "#377EB8", "#4DAF4A")[gl(n, k)],
pch = c(rep(22, k), rep(21, k), rep(25, k)))
# The RCA suggested transformation of the data
rcaRes$A
# The RCA suggested Mahalanobis matrix
rcaRes$B
## End(Not run)