tokenWindowOccurence {corpustools}    R Documentation

Gives the window in which a term occurred in a matrix.

Description

This function returns the occurrence of tokens (position.matrix) and the window of occurrence (window.matrix). This format makes it possible to calculate the co-occurrence of tokens within sliding windows (i.e. token distance) by multiplying position.matrix with window.matrix.
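
The multiplication described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy tokens vector, the dense matrices, and the symmetric window are hypothetical, and the real function works on a tCorpus and may encode the matrices differently.

```r
# Toy document with one token per position (hypothetical data)
tokens <- c("a", "b", "c", "a", "d")
terms <- sort(unique(tokens))
window.size <- 1

# Position matrix: rows are positions, columns are terms,
# 1 where the term occurs at exactly that position
position.mat <- sapply(terms, function(term) as.integer(tokens == term))

# Window matrix: 1 where the term occurs within window.size positions
# of the row's position (both sides, assumed here)
window.mat <- sapply(terms, function(term) {
  hits <- which(tokens == term)
  as.integer(sapply(seq_along(tokens), function(i) any(abs(i - hits) <= window.size)))
})

# Term-by-term co-occurrence: t(position.mat) %*% window.mat.
# Rows are the term at a position, columns the terms found in its window.
cooc <- crossprod(position.mat, window.mat)
print(cooc)
```

In this sketch the diagonal counts each term together with itself, so it equals the term frequency; a caller interested only in co-occurrence between different terms would ignore or zero it.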

Usage

tokenWindowOccurence(
  tc,
  feature,
  context_level = c("document", "sentence"),
  window.size = 10,
  direction = "<>",
  distance_as_value = F,
  batch_rows = NULL,
  drop_empty_terms = T
)

Arguments

tc

A tCorpus object

feature

The name of the feature column

context_level

Select whether to use "document" or "sentence" as context boundaries

window.size

The distance within which tokens should occur from each other to be counted as a co-occurrence.

direction

A string indicating whether only the left ('<') or right ('>') side of the window, or both ('<>'), should be used.

distance_as_value

If TRUE, the values of the matrix will represent the shortest distance to the occurrence of a feature

batch_rows

Used in functions that call this function in batches

drop_empty_terms

If TRUE, empty terms (with zero occurrences) will be dropped
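
To make window.size and distance_as_value more concrete, the following base R sketch computes, for each position, the shortest distance to an occurrence of one feature and keeps only the distances that fall inside the window. The tokens vector and the use of NA for positions outside the window are assumptions made for illustration; how the package actually stores these values is not specified here.

```r
# Toy document with one token per position (hypothetical data)
tokens <- c("a", "b", "c", "d", "e", "a")
window.size <- 1

# Positions at which the feature "a" occurs
hits <- which(tokens == "a")

# Shortest distance from every position to the nearest "a"
shortest <- sapply(seq_along(tokens), function(i) min(abs(i - hits)))

# Keep only distances inside the window; NA marks positions outside it
# (assumed encoding, for illustration only)
window.value <- ifelse(shortest <= window.size, shortest, NA)
print(window.value)
```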

Value

A list with two matrices. position.mat gives the specific position of a term, and window.mat gives the window in which each token occurred. The rows represent the position of a term, and match the input of this function (position, term and context). The columns represent terms.


[Package corpustools version 0.4.10 Index]