report_term_matches {lingmatch}    R Documentation

Generate a Report of Term Matches

Description

Extract matches to fuzzy terms (globs/wildcards or regular expressions) from provided text, in order to assess their appropriateness for inclusion in a dictionary.

Usage

report_term_matches(dict, text = NULL, space = NULL, glob = TRUE,
  parse_phrases = TRUE, tolower = TRUE, punct = TRUE, special = TRUE,
  as_terms = FALSE, bysentence = FALSE, as_string = TRUE,
  term_map_freq = 1, term_map_spaces = 1, outFile = NULL,
  space_dir = getOption("lingmatch.lspace.dir"), verbose = TRUE)

Arguments

dict

A vector of terms, list of such vectors, or a matrix-like object to be categorized by read.dic.

text

A vector of text to extract matches from. If not specified, will use the terms in the term_map retrieved from select.lspace.

space

A vector space used to calculate similarities between term matches. Can be the name of a space (see select.lspace), a matrix with terms as row names, or TRUE to auto-select a space based on matched terms.

glob

Logical; if TRUE, converts globs (asterisk wildcards) to regular expressions. If not specified, this will be set automatically.

parse_phrases

Logical; if TRUE (default) and space is specified, will break unmatched phrases into single terms, and average across any matched vectors.

tolower

Logical; if FALSE, will retain text's case.

punct

Logical; if FALSE, will remove punctuation marks in text.

special

Logical; if FALSE, will attempt to replace special characters in text.

as_terms

Logical; if TRUE, will treat text as terms, meaning dict terms will only count as matches when matching the complete text.

bysentence

Logical; if TRUE, will split text into sentences, and only consider unique sentences.

as_string

Logical; if FALSE, returns matches as tables rather than a string.

term_map_freq

Proportion of terms to include when using the term map as a source of terms. Applies when text is not specified.

term_map_spaces

Number of spaces in which a term has to appear to be included. Applies when text is not specified.

outFile

File path to write results to, always ending in .csv.

space_dir

Directory from which space should be loaded.

verbose

Logical; if FALSE, will not display status messages.
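
To make the glob argument more concrete, the following is a minimal base R sketch of converting asterisk wildcards to regular expressions and extracting matches. The helpers glob_to_regex and extract_matches are hypothetical names introduced for illustration; they are not part of lingmatch, and the package's own conversion may differ (for example, in how it treats word boundaries).

```r
# Hypothetical helper, not part of lingmatch: convert a glob-style term
# into a regular expression (PCRE syntax, for use with perl = TRUE).
glob_to_regex <- function(term) {
  # Escape regex metacharacters, leaving the asterisk alone for now.
  escaped <- gsub("([.|()\\\\^{}+$?\\[\\]])", "\\\\\\1", term, perl = TRUE)
  # Treat each asterisk as any run of non-whitespace characters.
  gsub("*", "[^[:space:]]*", escaped, fixed = TRUE)
}

# Hypothetical helper: extract every match of one term from a single string.
extract_matches <- function(term, text) {
  regmatches(text, gregexpr(glob_to_regex(term), text, perl = TRUE))[[1]]
}

text <- tolower("I am sadly homeless, and suffering from depression :(")
terms <- c("*less", "sad*", "depres*", ":(")
matches <- lapply(setNames(terms, terms), extract_matches, text = text)
print(matches)
```

Because this sketch adds no boundary anchors, a term such as "sad*" would also match inside a longer token; that simplification is an assumption of the example, not a statement about the package.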

Value

A data.frame of results, with a row for each unique term, and the following columns:

Note

Matches are extracted for each term independently, so they may not align with some implementations of dictionaries. For instance, by default lma_patcat matches destructively, and sorts terms by length such that shorter terms will not match the same text as longer terms that overlap them. Here, the match would show up for both terms.
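
The difference between independent and destructive matching can be sketched in base R. This is a simplified illustration of the behavior described in this note, using fixed substrings rather than globs; it is not the implementation used by lma_patcat or report_term_matches.

```r
# Illustrative only: a simplified contrast between independent and
# destructive matching; this is not lma_patcat's implementation.
text <- "my sadness"
terms <- c("sad", "sadness")  # plain substrings, treated as fixed patterns

# Independent: each term is searched for in the original text.
independent <- vapply(terms, function(term) {
  grepl(term, text, fixed = TRUE)
}, logical(1))

# Destructive: longer terms go first, and matched text is blanked out
# so that shorter, overlapping terms can no longer match it.
remaining <- text
destructive <- setNames(logical(length(terms)), terms)
for (term in terms[order(nchar(terms), decreasing = TRUE)]) {
  destructive[[term]] <- grepl(term, remaining, fixed = TRUE)
  remaining <- gsub(term, strrep(" ", nchar(term)), remaining, fixed = TRUE)
}

print(rbind(independent, destructive))
```

In the independent case both terms register a match on "sadness", whereas in the destructive case only the longer term does.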

See Also

For a more complete assessment of dictionaries, see dictionary_meta().

Similar information is provided in the dictionary builder web tool.

Other Dictionary functions: dictionary_meta(), download.dict(), lma_patcat(), lma_termcat(), read.dic(), select.dict()

Examples

text <- c(
  "I am sadly homeless, and suffering from depression :(",
  "This wholesome happiness brings joy to my heart! :D:D:D",
  "They are joyous in these fearsome happenings D:",
  "I feel weightless now that my sadness has been depressed! :()"
)
dict <- list(
  sad = c("*less", "sad*", "depres*", ":("),
  happy = c("*some", "happ*", "joy*", "d:"),
  self = c("i *", "my *")
)

report_term_matches(dict, text)
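
As a further variation, the call below uses only arguments documented above: as_string = FALSE to return matches as tables and verbose = FALSE to suppress status messages. It is a sketch that assumes lingmatch is installed; the requireNamespace guard is added so the block does nothing otherwise.

```r
text <- c(
  "I am sadly homeless, and suffering from depression :(",
  "This wholesome happiness brings joy to my heart! :D:D:D"
)
dict <- list(
  sad = c("*less", "sad*", "depres*", ":("),
  happy = c("*some", "happ*", "joy*", "d:")
)

# Only run if the package is available (an assumption of this sketch).
if (requireNamespace("lingmatch", quietly = TRUE)) {
  report <- lingmatch::report_term_matches(
    dict, text,
    as_string = FALSE, # keep matches as tables instead of a single string
    verbose = FALSE    # suppress status messages
  )
  print(report)
}
```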

[Package lingmatch version 1.0.7 Index]