jMatrix {qlcMatrix} | R Documentation |
Harmonize (‘join’) sparse matrices
Description
A utility function to make sparse matrices conformable semantically. Not only are the dimensions made conformable in size, but also the content of the dimensions. Formulated differently, this function harmonizes two matrices on a dimensions that have the same entities, but in a different order (and possibly with different subsets). Given two matrices with such (partly overlapping) dimensions, two new matrices are generated to reorder the original matrices via a matrix product to make them conformable. In an abstract sense, this is similar to an SQL ‘inner join’ operation.
Usage
jMatrix(rownamesX, rownamesY, collation.locale = "C")
jcrossprod(X, Y, rownamesX = rownames(X), rownamesY = rownames(Y))
tjcrossprod(X, Y, colnamesX = colnames(X), colnamesY = colnames(Y))
Arguments
rownamesX , rownamesY |
rownames to be joined from two matrices. |
X , Y |
sparse matrices to be made (semantically) conformable. |
colnamesX , colnamesY |
colnames to be joined from two matrices. |
collation.locale |
locale to be used for ordering of the joined dimension. Defaults to pure numerical unicode ordering "C". See |
Details
Given a sparse matrix X with rownames rX and a sparse matrix Y with rownames rY, the function jMatrix
produces joined rownames rXY with all unique entries in c(rX, rY)
, reordered according to the specified locale, if necessary.
Further, two sparse matrices M1 and M2 are returned to link X and Y to the new joined dimension rXY. Specifically, X2 = M1 %*% X and Y2 = M2 %*% Y will have conformable rXY rows, so crossprod(X2, Y2) can be computed. Note that the result will be empty when there is no overlap between the rownames of X and Y.
The function jcrossprod
is a shortcut to compute the above crossproduct immediately, using jMatrix
internally to harmonize the rows. Similarly, tjcrossprod
computes the tcrossprod, harmonizing the columns of two matrices using jMatrix
.
Value
jMatrix
returns a list of three elements (for naming, see Details above):
M1 |
sparse pattern matrix of type |
M2 |
sparse pattern matrix of type |
rownames |
unique joined row names rXY |
jcrossprod
and tjcrossprod
return a sparse Matrix of type ngCMatrix
when both X and Y are pattern matrices. Otherwise they return a sparse Matrix of type dgCMatrix
.
Note
Actually, it is unimportant whether the inputs to jMatrix
are row or column names. However, care has to be taken to use the resulting matrices in the right transposition. To make this function easier to explain, I consistently talk only about row names above.
Author(s)
Michael Cysouw
Examples
# example about INNER JOIN from wikipedia
# http://en.wikipedia.org/wiki/Sql_join#Inner_join
# this might look complex, but it is maximally efficient on large sparse matrices
# Employee table as sparse Matrix
Employee.LastName <- c("Rafferty","Jones","Heisenberg","Robinson","Smith","John")
Employee.DepartmentID <- c(31,33,33,34,34,NA)
E.LN <- ttMatrix(Employee.LastName, simplify = TRUE)
E.DID <- ttMatrix(Employee.DepartmentID, simplify = TRUE)
( Employees <- tcrossprod(E.LN, E.DID) )
# Department table as sparse Matrix
Department.DepartmentID <- c(31,33,34,35)
Department.DepartmentName <- c("Sales","Engineering","Clerical","Marketing")
D.DID <- ttMatrix(Department.DepartmentID, simplify = TRUE)
D.DN <- ttMatrix(Department.DepartmentName, simplify = TRUE)
( Departments <- tcrossprod(D.DN, D.DID) )
# INNER JOIN on DepartmentID (i.e. on the columns of these two matrices)
# result is a sparse matrix linking Employee.LastName to Department.DepartmentName,
# internally having used the DepartmentID for the linking
( JOIN <- tjcrossprod(Employees, Departments) )
# Note that in this example it is much easier to directly use jMatrix on the DepartmentIDs
# instead of first making sparse matrices from the data
# and then using tjcrossprod on the matrices to get the INNER JOIN
# (only the ordering is different in this direct approach)
J <- jMatrix(Employee.DepartmentID, Department.DepartmentID)
JOIN <- crossprod(J$M1, J$M2)
rownames(JOIN) <- Employee.LastName
colnames(JOIN) <- Department.DepartmentName
JOIN