tCorpus$feature_subset {corpustools} R Documentation

## Filter features

### Description

Similar to tCorpus$subset, but instead of deleting rows it sets the specified feature to NA in the rows that do not match. This is convenient because an analysis (e.g. a topic model) can use only a selection of features while the context of the full article is maintained, so that results can be viewed in that context (e.g. in a topic browser).

As in subset, objects and functions can be used in the filter expression, including the special functions for term frequency statistics (see the documentation for tCorpus$subset).

### Usage

R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).

feature_subset(column, new_column, subset)

### Arguments

| Argument | Description |
| --- | --- |
| column | the column containing the feature to be used as the input |
| subset | logical expression indicating rows to keep in the tokens data, i.e. rows for which the logical expression is FALSE will be set to NA |
| new_column | the column in which to save the filtered feature. Can be a new column or overwrite an existing one |
| min_freq | an integer, specifying the minimum token frequency |
| min_docfreq | an integer, specifying the minimum document frequency |
| max_freq | an integer, specifying the maximum token frequency |
| max_docfreq | an integer, specifying the maximum document frequency |
| min_char | an integer, specifying the minimum number of characters in a token |
| max_char | an integer, specifying the maximum number of characters in a token |
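The difference between token frequency, document frequency and character length can be sketched in base R. This is only an illustration of the semantics on a hypothetical toy table; it does not call corpustools and does not claim to reflect how the package computes these statistics internally. The thresholds are assumed to be inclusive.

```r
# Toy tokens table: two documents, one row per token (hypothetical data).
tokens <- data.frame(
  doc_id = c("d1", "d1", "d1", "d2", "d2"),
  token  = c("the", "cat", "the", "the", "dog"),
  stringsAsFactors = FALSE
)

# Token frequency: how often each token occurs in the whole corpus.
tok_freq <- as.vector(table(tokens$token)[tokens$token])

# Document frequency: in how many distinct documents each token occurs.
doc_counts <- tapply(tokens$doc_id, tokens$token,
                     function(d) length(unique(d)))
doc_freq <- as.vector(doc_counts[tokens$token])

# Keep tokens that occur in at least 2 documents and have at most 3 characters.
keep <- doc_freq >= 2 & nchar(tokens$token) <= 3

# Rows failing the condition are set to NA; no rows are removed.
tokens$token_filtered <- ifelse(keep, tokens$token, NA_character_)
print(tokens)
```

The table keeps all five rows, so the position of every token within its document stays available even where the filtered column is NA.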

### Examples

tc = create_tcorpus('a a a a b b b c c')

tc$feature_subset('token', 'tokens_subset1', subset = token_id < 5)
tc$feature_subset('token', 'tokens_subset2', subset = freq_filter(token, min = 3))

tc$tokens
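For readers without the package installed, the effect of the two filters in this example can be approximated in base R. The sketch below mimics the result on a plain data frame and is not the corpustools implementation; it assumes that token_id counts positions from 1 and that the min argument of freq_filter is inclusive.

```r
# Plain data frame standing in for the tokens data (an assumption for illustration).
tokens <- data.frame(
  token_id = 1:9,
  token    = c("a", "a", "a", "a", "b", "b", "b", "c", "c"),
  stringsAsFactors = FALSE
)

# First filter: keep the feature only where token_id < 5.
tokens$tokens_subset1 <- ifelse(tokens$token_id < 5,
                                tokens$token, NA_character_)

# Second filter: keep the feature only for tokens occurring at least 3 times.
freq <- as.vector(table(tokens$token)[tokens$token])
tokens$tokens_subset2 <- ifelse(freq >= 3, tokens$token, NA_character_)

print(tokens)
```

Both new columns have nine entries, matching the nine tokens: the first retains only the four leading "a" tokens, and the second retains "a" and "b" while the two "c" tokens become NA.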


[Package corpustools version 0.4.10 Index]