tCorpus_data {corpustools}    R Documentation

Methods and functions for viewing, modifying and subsetting tCorpus data

Description

(back to overview)

Details

Get data

$get() Get the token data (by default as a deep copy), optionally selecting columns and subsetting rows. To avoid copying, you can also access the token data directly with tc$tokens
$get_meta() Get the document meta data, optionally selecting columns and subsetting rows. As with the tokens, you can also access the meta data directly with tc$meta
get_dtm() Create a document term matrix
get_dfm() Create a document term matrix in the quanteda dfm format
$context() Get a context vector. Currently supports documents or globally unique sentences.
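
As a rough illustration of the accessors listed above, the following sketch builds a tiny tCorpus and reads data from it. The toy data and its column names (text, medium, doc_id) are assumptions made for this example only.

```r
library(corpustools)

# Toy input (hypothetical): one row per document
d <- data.frame(
  text   = c("Text one first sentence. Text one second sentence", "Text two"),
  medium = c("A", "B"),
  doc_id = c("D1", "D2"),
  stringsAsFactors = FALSE
)
tc <- create_tcorpus(d, split_sentences = TRUE)

# Copies of the data: safe to change without touching the tCorpus
tokens_copy <- tc$get()                      # full token data
doc_token   <- tc$get(c("doc_id", "token"))  # selected columns only
meta_copy   <- tc$get_meta()                 # document meta data

# Direct access to the underlying data.tables (no copy is made)
tc$tokens
tc$meta

# Document term matrix built from the 'token' column
dtm <- get_dtm(tc, feature = "token")

# Context vector at the sentence level, one value per token
ctx <- tc$context("sentence")
```

Because tc$tokens and tc$meta are not copies, $get() and $get_meta() are probably the safer choice when the result will be modified afterwards.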

Modify

The token and meta data can be modified with the set* and delete* methods. All modifications are performed by reference.

$set() Modify the token data by setting the values of one (existing or new) column.
$set_meta() The set method for the document meta data
$set_levels() Change the levels of factor columns.
$set_meta_levels() Change the levels of factor columns in the meta data
$set_name() Modify column names of token data.
$set_meta_name() Modify column names of the meta data
$delete_columns() Delete columns.
$delete_meta_columns() Delete columns in the meta data

Modifying is restricted in certain ways to ensure that the data always meets the assumptions required by the tCorpus methods. tCorpus automatically tests whether these assumptions are violated, so you don't have to think about this yourself. The most important limitations are that the set* methods cannot be used to subset or append the data. For subsetting, you can use the tCorpus$subset method, and to add data to a tCorpus you can use the merge_tcorpora function.
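
A minimal sketch of the set* and delete* methods, assuming a toy corpus; the new column names (token_upper, first_token, upper, medium_lower, source) are made up for illustration. Since all changes happen by reference, the results are not assigned back to tc.

```r
library(corpustools)

# Toy input (hypothetical)
d <- data.frame(
  text   = c("A first text", "A second text"),
  medium = c("A", "B"),
  doc_id = c("d1", "d2"),
  stringsAsFactors = FALSE
)
tc <- create_tcorpus(d)

# New token column computed from an existing column
tc$set("token_upper", toupper(token))

# New token column filled only where the subset matches (NA elsewhere)
tc$set("first_token", token, subset = token_id == 1)

# Rename one token column, then delete another
tc$set_name(oldname = "token_upper", newname = "upper")
tc$delete_columns("first_token")

# The same operations on the document meta data
tc$set_meta("medium_lower", tolower(medium))
tc$set_meta_name(oldname = "medium_lower", newname = "source")
tc$delete_meta_columns("medium")

tc$names       # token columns after the changes
tc$names_meta  # meta columns after the changes
```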

Subsetting, merging/adding

subset() Subset the token and/or meta data using the subset function. A subset expression can be specified for both the token data (subset) and the document meta data (subset_meta).
subset_query() Subset the tCorpus based on a query, as used in search_contexts
$subset() Like subset, but as an R6 method that changes the tCorpus by reference
$subset_query() Like subset_query, but as an R6 method that changes the tCorpus by reference
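
The difference between the function forms and the R6 method forms can be sketched as follows. The toy corpus and the query term are assumptions for this example; merge_tcorpora is included only to show how data might be added back together.

```r
library(corpustools)

# Toy input (hypothetical)
d <- data.frame(
  text   = c("A first text", "A second text"),
  medium = c("A", "B"),
  doc_id = c("d1", "d2"),
  stringsAsFactors = FALSE
)
tc <- create_tcorpus(d)
n_before <- tc$n

# Function forms: return a new tCorpus and leave tc as it was
tc_a <- subset(tc, subset_meta = medium == "A")
tc_b <- subset(tc, subset_meta = medium == "B")
tc_q <- subset_query(tc, "second")  # documents matching the query

# Adding data: combine two tCorpus objects into one
merged <- merge_tcorpora(tc_a, tc_b)

# R6 method form: changes tc itself by reference
tc$subset(subset = token_id <= 2)
```

After the last call tc holds fewer tokens than before, while tc_a, tc_b, tc_q and merged are separate objects that are not affected by it.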

Fields

For the sake of convenience, the number of rows and the column names of the token and meta data.tables can be accessed directly.

$n The number of tokens (i.e. rows in the data)
$n_meta The number of documents (i.e. rows in the document meta data)
$names The names of the token data columns
$names_meta The names of the document meta data columns
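
A short sketch of these fields on a toy corpus (the input data is hypothetical); each field should agree with the corresponding property of tc$tokens or tc$meta.

```r
library(corpustools)

# Toy input (hypothetical)
d <- data.frame(
  text   = c("A first text", "A second text"),
  doc_id = c("d1", "d2"),
  stringsAsFactors = FALSE
)
tc <- create_tcorpus(d)

tc$n           # number of tokens, same as nrow(tc$tokens)
tc$n_meta      # number of documents, same as nrow(tc$meta)
tc$names       # same as colnames(tc$tokens)
tc$names_meta  # same as colnames(tc$meta)
```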

[Package corpustools version 0.4.10 Index]