| create_ifl {keyperm} | R Documentation | 
Create an Indexed Frequency List
Description
The keyperm package stores frequency lists in a special data structure called an indexed frequency list. Currently, it can only be created from a tdm object as implemented in the tm package.
Indexed frequency lists are essentially frequency lists stored in a three-column format,
similar to the simple triplet matrix that tm uses internally to store term-document matrices.
The first column stores the document number i, the second the term number j, and the third the
frequency with which term j occurs in document i. Zero occurrences are omitted.
All columns contain integers, and the frequency list is sorted by document.
The object returned is of class indexed_frequency_list. In addition to the actual frequency
list, it contains an index for fast access as well as the pre-computed total number of tokens per
document and total occurrences per term.
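The three-column layout described above can be illustrated with base R alone. This is only a sketch of the idea, not the package's internal code: the toy matrix and the names tdm_dense, triplets, tokens_per_doc and occurrences_per_term are made up for illustration, and the actual indexed_frequency_list object may be organised differently.

```r
# Toy term-document matrix: rows are terms, columns are documents.
# (Hypothetical data, used only to illustrate the format.)
tdm_dense <- matrix(
  c(2L, 0L, 1L,   # doc1: apple = 2, pear = 0, plum = 1
    0L, 3L, 0L),  # doc2: apple = 0, pear = 3, plum = 0
  nrow = 3,
  dimnames = list(c("apple", "pear", "plum"), c("doc1", "doc2"))
)

# Positions of the non-zero cells; zero occurrences are dropped.
nz <- which(tdm_dense > 0L, arr.ind = TRUE)

# Three integer columns: document i, term j, frequency.
triplets <- data.frame(
  doc  = as.integer(nz[, "col"]),
  term = as.integer(nz[, "row"]),
  freq = tdm_dense[nz]
)

# Sort by document, then by term within each document.
triplets <- triplets[order(triplets$doc, triplets$term), ]
rownames(triplets) <- NULL

# Totals comparable to the pre-computed ones described above.
tokens_per_doc <- colSums(tdm_dense)
occurrences_per_term <- rowSums(tdm_dense)

print(triplets)
```

Only three rows are stored for the six cells of the toy matrix, which is where the format saves space on sparse data.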
Usage
create_ifl(
  tdm,
  subset_terms = 1:dim(tdm)[1],
  subset_docs = 1:dim(tdm)[2],
  corpus
)
Arguments
| tdm | a term-document matrix (tdm) from the tm package. Currently, this is the only supported input, but others may be added in later versions. | 
| subset_terms | vector of terms to be considered. Can be integer (indices) or logical. Terms not included are still counted toward the total number of tokens per document. | 
| subset_docs | vector of documents to be considered. Can be integer (indices) or logical. Excluded documents do not contribute to the total number of occurrences of a term. | 
| corpus | vector indicating which documents belong to corpus A (the first corpus). Can be integer (indices) or logical. Currently, only comparisons of two corpora are supported. | 
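A call using the arguments above might look like the following sketch. It assumes that the tm and keyperm packages are installed and is skipped otherwise. The example texts are invented, and the reading that documents not listed in corpus form the second corpus is an assumption drawn from the argument description.

```r
# Runs only if both packages are available (assumption: they are installed).
if (requireNamespace("tm", quietly = TRUE) &&
    requireNamespace("keyperm", quietly = TRUE)) {

  # Hypothetical toy documents.
  texts <- c("apple apple plum", "pear pear pear",
             "plum pear apple", "apple plum plum")
  tdm <- tm::TermDocumentMatrix(tm::VCorpus(tm::VectorSource(texts)))

  # Documents 1 and 2 are assigned to corpus A, given as integer indices.
  ifl <- keyperm::create_ifl(tdm, corpus = c(1L, 2L))

  # The same assignment written as a logical vector.
  ifl_lgl <- keyperm::create_ifl(tdm, corpus = c(TRUE, TRUE, FALSE, FALSE))

  print(class(ifl))
}
```

subset_terms and subset_docs are left at their defaults here, so all terms and all documents are used.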
Value
A list with class indexed_frequency_list containing the following components: