tokenWindowOccurence {corpustools} | R Documentation |
This function returns the occurrence of tokens (position.matrix) and the window in which each token occurs (window.matrix). This format allows the co-occurrence of tokens within sliding windows (i.e. token distance) to be calculated by multiplying position.matrix with window.matrix.
tokenWindowOccurence(
tc,
feature,
context_level = c("document", "sentence"),
window.size = 10,
direction = "<>",
distance_as_value = F,
batch_rows = NULL,
drop_empty_terms = T
)
tc |
A tCorpus object |
feature |
The name of the feature column |
context_level |
Select whether to use "document" or "sentence" as context boundaries |
window.size |
The distance within which tokens should occur from each other to be counted as a co-occurrence. |
direction |
A string indicating whether only the left ('<') or right ('>') side of the window, or both ('<>'), should be used. |
distance_as_value |
If TRUE, the values of the matrix will represent the shortest distance to the occurrence of a feature |
batch_rows |
Used in functions that call this function in batches |
drop_empty_terms |
If TRUE, empty terms (with zero occurrences) will be dropped |
A list with two matrices. position.mat gives the specific position of a term, and window.mat gives the window in which each token occurred. The rows represent the positions of terms and match the input of this function (position, term and context). The columns represent terms.
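
The following is a minimal base R sketch of the idea behind the two matrices, not the package's implementation. It builds a toy position matrix and window matrix by hand and multiplies them to obtain term-by-term co-occurrence counts. The token vector, the variable names (position_mat, window_mat, window_size) and the choice to count a token as being inside its own window are all assumptions made for illustration.

```r
# Toy context: a single document with six token positions (hypothetical data).
tokens <- c("a", "b", "c", "a", "d", "b")
terms <- sort(unique(tokens))
n <- length(tokens)
window_size <- 1  # illustrative value; the function's default is 10

# position_mat[i, j] is 1 if term j occurs exactly at position i.
position_mat <- matrix(0, nrow = n, ncol = length(terms),
                       dimnames = list(NULL, terms))
position_mat[cbind(seq_len(n), match(tokens, terms))] <- 1

# window_mat[i, j] is 1 if term j occurs within window_size positions
# of position i, looking both left and right (comparable to direction "<>").
# Assumption: position i itself counts as part of its own window.
window_mat <- matrix(0, nrow = n, ncol = length(terms),
                     dimnames = list(NULL, terms))
for (i in seq_len(n)) {
  in_window <- which(abs(seq_len(n) - i) <= window_size)
  window_mat[i, unique(match(tokens[in_window], terms))] <- 1
}

# t(position_mat) %*% window_mat: for each pair of terms (x, y), the number
# of occurrences of x that have y somewhere inside their window.
cooc <- crossprod(position_mat, window_mat)
print(cooc)
```

In this sketch the rows of both matrices are token positions and the columns are terms, mirroring the layout described above, so the product collapses positions and leaves a term-by-term matrix.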