readLdac {stm} | R Documentation |
Read in a .ldac Formatted File
Description
Read in a term-document matrix in the .ldac sparse matrix format popularized by David Blei's C implementation of LDA.
Usage
readLdac(filename)
Arguments
filename |
Name of, or path to, the .ldac file to be read |
Details
readLdac expects a file name or path pointing to a file in Blei's LDA-C format. From his ReadMe: "The data is a file where each line is of the form:
[M] [term_1]:[count] [term_2]:[count] ... [term_N]:[count]
where [M] is the number of unique terms in the document, and the [count] associated with each term is how many times that term appeared in the document. Note that [term_1] is an integer which indexes the term; it is not a string."
Term indices in LDA-C files start at zero. Because R indexes from one, each term index is incremented by one on import.
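The line format and the index shift can be illustrated with a short base R sketch. The helper name parse_ldac_line is hypothetical and is not the code stm uses internally. It assumes whitespace-separated tokens and zero-based term indices in the file, and returns a two-row integer matrix (term indices on top, counts below).

```r
# Hypothetical helper for illustration only; not the stm implementation.
parse_ldac_line <- function(line) {
  # Split on runs of whitespace: the first token is [M],
  # the remaining tokens are [term]:[count] pairs.
  tokens <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  m <- as.integer(tokens[1])
  pairs <- strsplit(tokens[-1], ":", fixed = TRUE)

  # Assumed zero-based index in the file, shifted to R's one-based indexing.
  terms <- vapply(pairs, function(p) as.integer(p[1]), integer(1)) + 1L
  counts <- vapply(pairs, function(p) as.integer(p[2]), integer(1))

  # Sanity check: [M] should match the number of term:count pairs.
  stopifnot(length(terms) == m)

  # Row 1 holds term indices, row 2 holds counts.
  rbind(terms, counts, deparse.level = 0)
}

doc <- parse_ldac_line("3 0:2 4:1 7:5")
doc[1, ]  # term indices after the shift: 1 5 8
doc[2, ]  # counts: 2 1 5
```

Here the terms numbered 0, 4 and 7 in the file become 1, 5 and 8 in R, while the counts are left unchanged.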
Value
documents |
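A documents object in the stm format: a list with one element per document, each a two-row integer matrix holding term indices in the first row and counts in the second |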
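A minimal usage sketch follows, assuming the stm package is installed. The two-document file contents are made up for illustration, and the result is inspected with str() rather than assuming its exact shape.

```r
# Write a tiny two-document corpus in LDA-C format to a temporary file.
# The contents are invented for this example.
path <- tempfile(fileext = ".ldac")
writeLines(c("2 0:1 3:2", "1 2:4"), path)

# Only call readLdac if stm is available (assumption: stm is installed).
if (requireNamespace("stm", quietly = TRUE)) {
  out <- stm::readLdac(path)
  # Inspect the returned object instead of relying on a particular structure.
  str(out)
}
```

The term indices shown by str() should be one larger than those written to the file, reflecting the increment described under Details.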
See Also
textProcessor, prepDocuments, readCorpus