CalcAndWriteDissimilarityMatrix {parallelpam} | R Documentation |
CalcAndWriteDissimilarityMatrix
Description
Writes a binary symmetric matrix with the dissimilarities between ROWS of the data stored in a binary matrix in the jmatrix/parallelpam package format.
The input matrix of vectors can be a full or a sparse matrix and the algorithm has been modified to calculate faster for sparse matrices.
Output matrix type can be float or double type (but look at the comments in 'Details').
Usage
CalcAndWriteDissimilarityMatrix(
ifname,
ofname,
distype = "L2",
restype = "float",
comment = "",
nthreads = 0L
)
Arguments
ifname |
A string with the name of the file containing the counts as a binary matrix. |
ofname |
A string with the name of the binary output file to contain the symmetric dissimilarity matrix. |
distype |
The dissimilarity to be calculated. It must be one of these strings: 'L1', 'L2', 'Pearson', 'Cos' or 'WEuc'. |
restype |
The data type of the result. It can be one of the strings 'float' or 'double'. Default: float (and don't change it unless you REALLY need to...). |
comment |
Comment to be added to the dissimilary matrix. Default: "" (no comment) |
nthreads |
Number of threads to be used for the parallel calculations with this meaning: |
Details
The parameter restype forces the output to be a matrix of either floats or doubles. Precision of float is normally good enough; but if you need
double precision (may be because you expect your results to be in a large range, two to three orders of magnitude), change it.
Nevertheless, notice that this at the expense of double memory usage, which is QUADRATIC with the number of individuals (rows) in your input matrix.
Value
No return value, called for side effects (creates a file)
Examples
Rf <- matrix(runif(50000),nrow=100)
tmpfile1=paste0(tempdir(),"/Rfullfloat.bin")
JWriteBin(Rf,tmpfile1,dtype="float",dmtype="full",
comment="Full matrix of floats, 100 rows, 500 columns")
JMatInfo(tmpfile1)
tmpdisfile1=paste0(tempdir(),"/RfullfloatDis.bin")
# Distance file calculated from the matrix stored as full
CalcAndWriteDissimilarityMatrix(tmpfile1,tmpdisfile1,distype="L2",
restype="float",comment="L2 distance matrix from full",nthreads=0)
JMatInfo(tmpdisfile1)
tmpfile2=paste0(tempdir(),"/Rsparsefloat.bin")
JWriteBin(Rf,tmpfile2,dtype="float",dmtype="sparse",
comment="Sparse matrix of floats, 100 rows, 500 columns")
JMatInfo(tmpfile2)
# Distance file calculated from the matrix stored as sparse
tmpdisfile2=paste0(tempdir(),"/RsparsefloatDis.bin")
CalcAndWriteDissimilarityMatrix(tmpfile2,tmpdisfile2,distype="L2",
restype="float",comment="L2 distance matrix from sparse",nthreads=0)
JMatInfo(tmpdisfile2)
# Read both versions
Dfu<-GetJManyRows(tmpdisfile1,c(1:nrow(Rf)))
Dsp<-GetJManyRows(tmpdisfile2,c(1:nrow(Rf)))
# and compare them
max(Dfu-Dsp)