lma_process {lingmatch}

R Documentation

Process Text

Description

A wrapper for the other pre-processing functions. Input is optionally read in with read.segments, passed to lma_dtm or lma_patcat, then to lma_weight, and then to lma_termcat or lma_lspace; lma_meta output can optionally be included.
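The chain described above can be approximated by calling the individual functions in order. This is only a minimal sketch, assuming the lingmatch package is installed and attached. The `texts` vector is made up for illustration, the choice of `"tfidf"` weighting and the default `lma_dict()` dictionary are assumptions, and the result is not guaranteed to match what `lma_process` returns internally.

```r
library(lingmatch)

# hypothetical example texts
texts <- c(
  "Firstly, I would like to say, and with all due respect...",
  "Please, proceed. I hope you feel you can speak freely..."
)

# 1. tokenize into a document-term matrix (one row per text)
dtm <- lma_dtm(texts)

# 2. weight the term counts (the weighting scheme here is an assumption)
weighted <- lma_weight(dtm, "tfidf")

# 3. collapse weighted terms into dictionary categories
categories <- lma_termcat(weighted, lma_dict())

# 4. optionally calculate metastatistics from the raw text
meta <- lma_meta(texts)
```

Each intermediate object keeps one row per text, which is what allows the category scores and metastatistics to be combined side by side.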

Usage

lma_process(input = NULL, ..., meta = TRUE, coverage = FALSE)

Arguments

`input`	A vector of text, or path to a text file or folder.
`...`	Arguments to be passed to `lma_dtm`, `lma_patcat`, `lma_weight`, `lma_termcat`, and/or `lma_lspace`. All arguments must be named.
`meta`	Logical; if `FALSE`, metastatistics are not included. Only applies when raw text is available. If included, meta categories are added as the last columns, with names starting with "meta_".
`coverage`	Logical; if `TRUE` and a dictionary is provided (`dict`), the coverage (number of unique term matches) of each dictionary category is also calculated.
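As a brief illustration of the `meta` and `coverage` arguments, the following sketch turns metastatistics off in one call and requests coverage in another. It assumes the lingmatch package is installed. The `texts` vector is made up, and the exact names and layout of any coverage columns are not shown here because they are not specified above.

```r
library(lingmatch)

# hypothetical example texts
texts <- c(
  "Oh, of course, I just hope to be clear, and not cause offense...",
  "Oh, no, don't monitor yourself on my account..."
)

# term counts only; no "meta_" columns are appended
counts_only <- lma_process(texts, meta = FALSE)

# dictionary categories, plus coverage of each category
with_coverage <- lma_process(texts, dict = lma_dict(), coverage = TRUE)
```

Inspecting `colnames(counts_only)` and `colnames(with_coverage)` is a quick way to see which columns each setting adds or omits.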

Value

A matrix with texts in rows and features in columns. The exception is when there are multiple rows per output (e.g., when a latent semantic space is applied without terms being mapped), in which case only the special output is returned (e.g., a matrix with terms in rows and latent dimensions in columns).

Examples

# starting with some texts in a vector
texts <- c(
  "Firstly, I would like to say, and with all due respect...",
  "Please, proceed. I hope you feel you can speak freely...",
  "Oh, of course, I just hope to be clear, and not cause offense...",
  "Oh, no, don't monitor yourself on my account..."
)

# by default, term counts and metastatistics are returned
lma_process(texts)

# add dictionary and percent arguments for standard dictionary-based results
lma_process(texts, dict = lma_dict(), percent = TRUE)

# add space and weight arguments for standard word-centroid vectors
lma_process(texts, space = lma_lspace(texts), weight = "tfidf")