filterWord {tosca}

R Documentation

Subcorpus With Word Filter

Description

Generates a subcorpus by restricting a corpus to the texts that contain specific filter words.

Usage

filterWord(...)

## Default S3 method:
filterWord(
  text,
  search,
  ignore.case = FALSE,
  out = c("text", "bin", "count"),
  ...
)

## S3 method for class 'textmeta'
filterWord(
  object,
  search,
  ignore.case = FALSE,
  out = c("text", "bin", "count"),
  filtermeta = TRUE,
  ...
)

Arguments

`...`	Not used.
`text`	Not necessary if `object` is specified; otherwise it should be `object$text`, a list of article texts.
`search`	List of data frames. The list elements are linked by 'or'; the entries within a data frame are linked by 'and'. Each data frame must have the following three variables: `pattern`, a character string containing the search term; `word`, a logical value indicating whether a word search (TRUE) or a character search (FALSE) is wanted; and `count`, an integer giving the minimum number of times the term must be found in the text. `word` can alternatively be a character string containing one of the keywords `pattern` for character search, `word` for word search, and `left` or `right` for truncated search. If `search` is only a character vector, its elements are linked by 'or' and a character search with `count=1` is used.
`ignore.case`	Logical: Should lower and upper case be ignored when matching?
`out`	Type of output: `text` returns the filtered corpus, `bin` a logical vector for all texts, `count` the number of matches.
`object`	A `textmeta` object
`filtermeta`	Logical: Should the meta component be filtered, too?
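
The 'or'/'and' linkage of `search` can be illustrated with a small base-R sketch. It is not the tosca implementation: `count_hits()` and `matches_search()` are hypothetical helper names, `word` is assumed to be logical, word search is approximated by splitting on non-alphanumeric characters, and truncated search (`left`, `right`) is left out.

```r
# Illustrative sketch only; NOT how tosca implements filterWord().
# count_hits() and matches_search() are hypothetical helpers.

# Count how often one search term occurs in one text.
# word = TRUE counts whole tokens, word = FALSE counts substring matches.
count_hits <- function(text, pattern, word = FALSE, ignore.case = FALSE) {
  text <- paste(unlist(text), collapse = " ")
  pattern <- as.character(pattern)
  if (ignore.case) {
    text <- tolower(text)
    pattern <- tolower(pattern)
  }
  if (isTRUE(word)) {
    # Assumption: a "word" is a maximal run of letters or digits.
    tokens <- unlist(strsplit(text, "[^[:alnum:]]+"))
    sum(tokens == pattern)
  } else {
    # gregexpr() returns -1 when there is no match.
    hits <- gregexpr(pattern, text, fixed = TRUE)[[1]]
    sum(hits > 0)
  }
}

# List elements are linked by 'or'; rows of one data frame are linked by 'and'.
matches_search <- function(text, search, ignore.case = FALSE) {
  any(vapply(search, function(df) {
    all(vapply(seq_len(nrow(df)), function(i) {
      count_hits(text, df$pattern[i], df$word[i], ignore.case) >= df$count[i]
    }, logical(1)))
  }, logical(1)))
}

texts <- list(
  A = "Give a Man a Fish, and You Feed Him for a Day.",
  B = "So Long, and Thanks for All the Fish",
  C = "Fisher enjoys a real mastery in evaluating complicated multiple integrals."
)

# ("fish" AND "day") OR "integrals", all as word searches
search <- list(
  data.frame(pattern = c("fish", "day"), word = TRUE, count = 1,
             stringsAsFactors = FALSE),
  data.frame(pattern = "integrals", word = TRUE, count = 1,
             stringsAsFactors = FALSE)
)

res <- vapply(texts, matches_search, logical(1),
              search = search, ignore.case = TRUE)
print(res)  # A: TRUE, B: FALSE, C: TRUE
```

In this sketch, text B fails the first data frame because "day" is missing, and text C passes only through the second list element, since the word search does not treat "Fisher" as "fish".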

Value

A `textmeta` object if `object` is specified, otherwise only the filtered text. If a `textmeta` object is returned, its meta data are by default (`filtermeta = TRUE`) filtered to those texts that remain in the corpus.
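
How the output types and the meta filtering relate can be sketched in base R. This is only an analogy under stated assumptions, not the package code: the match counts are computed with a plain substring search, and the meta data are assumed to be a data frame whose `id` column matches the names of the text list. The names `hit_counts`, `keep`, `filtered_text` and `filtered_meta` are hypothetical.

```r
# Illustrative sketch only; NOT the tosca implementation.
texts <- list(
  A = "Teach a man to fish",
  B = "Thanks for all the fish",
  C = "multiple integrals"
)
# Assumption: meta data carry an 'id' column matching names(texts).
meta <- data.frame(
  id = c("A", "B", "C"),
  title = c("Proverb", "Novel", "Review"),
  stringsAsFactors = FALSE
)

# Analogue of out = "count": number of matches per text.
hit_counts <- vapply(
  texts,
  function(x) sum(gregexpr("fish", x, fixed = TRUE)[[1]] > 0),
  integer(1)
)

# Analogue of out = "bin": one logical value per text.
keep <- hit_counts >= 1

# Analogue of out = "text": only the texts that matched.
filtered_text <- texts[keep]

# Analogue of filtermeta = TRUE: keep meta rows of the remaining texts only.
filtered_meta <- meta[meta$id %in% names(filtered_text), , drop = FALSE]

print(names(filtered_text))
print(filtered_meta)
```

With `filtermeta = FALSE`, the analogue would be to leave `meta` untouched while still reducing the text list.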

Examples

texts <- list(A="Give a Man a Fish, and You Feed Him for a Day.
Teach a Man To Fish, and You Feed Him for a Lifetime",
B="So Long, and Thanks for All the Fish",
C="A very able manipulative mathematician, Fisher enjoys a real mastery
in evaluating complicated multiple integrals.")

# search for pattern "fish"
filterWord(text=texts, search="fish", ignore.case=TRUE)

# search for word "fish"
filterWord(text=texts,
  search=data.frame(pattern="fish", word="word", count=1),
  ignore.case=TRUE)

# pattern must appear at least two times
filterWord(text=texts,
  search=data.frame(pattern="fish", word="pattern", count=2),
  ignore.case=TRUE)

# search for "fish" AND "day"
filterWord(text=texts,
  search=data.frame(pattern=c("fish", "day"), word="word", count=1),
  ignore.case=TRUE)

# search for "Thanks" OR "integrals"
filterWord(text=texts,
  search=list(
    data.frame(pattern="Thanks", word="word", count=1),
    data.frame(pattern="integrals", word="word", count=1)))

[Package tosca version 0.3-2 Index]