trim {polmineR}    R Documentation

Trim an object.

Description

Method to trim and adjust objects by applying thresholds, minimum frequencies etc. It can be applied to context, features, partition and partition_bundle objects.

Usage

trim(.Object, ...)

## S4 method for signature 'TermDocumentMatrix'
trim(
  .Object,
  terms_to_drop,
  docs_to_keep,
  min_count,
  min_doc_length,
  verbose = TRUE,
  ...
)

## S4 method for signature 'DocumentTermMatrix'
trim(
  .Object,
  terms_to_drop,
  docs_to_keep,
  min_count,
  min_doc_length,
  verbose = TRUE,
  ...
)

punctuation

Arguments

.Object

The object to be trimmed.

...

Further arguments.

terms_to_drop

A character vector of terms to exclude from the matrix (terms used as stopwords).

docs_to_keep

A character vector of documents to keep.

min_count

A numeric value, the minimum total frequency of a term across documents. Terms below this value (rare terms) are excluded from the matrix.

min_doc_length

A numeric value, the minimum document length, measured as the summed-up occurrences of tokens in a document. Documents below this value (short documents) are excluded. Note that the min_doc_length filter is applied before filtering for min_count and terms_to_drop, and that these filters reduce document lengths, so documents in the result may be shorter than min_doc_length.

verbose

A logical value, whether to output progress messages.
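The note on min_doc_length describes an order of filters that is easy to overlook. The following is a minimal base R sketch of that order on a toy matrix; it is not the polmineR implementation. The helper trim_sketch, the toy counts, the order of the two term filters relative to each other, and the choice to compute term frequencies on the remaining documents are all assumptions made for illustration.

```r
# Toy document-term matrix: rows are documents, columns are terms.
# All names and counts are made up for illustration.
dtm <- matrix(
  c(5, 4, 1, 0,
    1, 1, 0, 1,
    6, 0, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("doc_a", "doc_b", "doc_c"),
    c("the", "oil", "rare", "price")
  )
)

# Hypothetical helper mimicking the documented order of filters.
# It is NOT the polmineR method, only a sketch of the idea.
trim_sketch <- function(m, terms_to_drop = character(),
                        min_count = 0, min_doc_length = 0) {
  # 1. Drop short documents first (row sums are document lengths).
  m <- m[rowSums(m) >= min_doc_length, , drop = FALSE]
  # 2. Drop rare terms (column sums are total term frequencies;
  #    assumption: computed on the documents that remain).
  m <- m[, colSums(m) >= min_count, drop = FALSE]
  # 3. Drop stopwords.
  m <- m[, !colnames(m) %in% terms_to_drop, drop = FALSE]
  m
}

trimmed <- trim_sketch(
  dtm,
  terms_to_drop = "the",
  min_count = 2,
  min_doc_length = 5
)

# Document lengths after all filters have been applied.
rowSums(trimmed)
```

In this sketch doc_b is removed by the length filter, while doc_a and doc_c pass it but end up with fewer than five tokens once the term filters have run. If a strict minimum length matters for the final matrix, the length can be checked again on the result.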

Format

An object of class character of length 13.

Author(s)

Andreas Blaette

Examples

use("RcppCWB", corpus = "REUTERS")
dtm <- corpus("REUTERS") %>%
  split(s_attribute = "id") %>%
  as.DocumentTermMatrix(p_attribute = "word", verbose = FALSE)
trim(dtm, min_doc_length = 100)

[Package polmineR version 0.8.9 Index]