tCorpus$merge {corpustools}    R Documentation

Merge the token and meta data.tables of a tCorpus with another data.frame

Description

Add columns to the tokens or meta data by merging with a data.frame df. This is only possible for unique matches, i.e. each combination of values in the columns specified in by must occur at most once in df.
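The uniqueness requirement can be checked before merging. The following is a minimal base-R sketch; has_unique_keys is a hypothetical helper introduced here for illustration and is not part of corpustools.

```r
# Hypothetical helper (not part of corpustools): returns TRUE if every
# combination of values in the `by` columns occurs at most once in df.
has_unique_keys <- function(df, by) {
  # as.data.frame() so that data.table and tibble input is indexed the same way
  keys <- as.data.frame(df)[, by, drop = FALSE]
  anyDuplicated(keys) == 0
}

ok  <- data.frame(doc_id = c('a', 'b'), test = c('A', 'B'))
bad <- data.frame(doc_id = c('a', 'a'), test = c('A', 'B'))

has_unique_keys(ok, 'doc_id')   # TRUE: each doc_id occurs once
has_unique_keys(bad, 'doc_id')  # FALSE: doc_id 'a' occurs twice
```

If the check fails, df would need to be deduplicated or aggregated on the by columns before it is passed to merge or merge_meta.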

Arguments

df

A data.frame (can be regular, data.table or tibble)

by

The columns to match on. These must exist in both tokens/meta and df. If the columns in tokens/meta and df have different names, use by.x and by.y instead.

by.x

The names of the columns used in tokens/meta

by.y

The names of the columns used in df

columns

Optionally, the names of the columns from df to merge into tokens.
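The roles of by.x, by.y and columns can be illustrated with a rough base-R analogue. This sketch uses base merge() on plain data.frames, not the tCorpus method itself, and assumes the method behaves like a left join that keeps every token row, which this page does not state explicitly. The tokens data.frame and the keep vector are made up for illustration.

```r
# Stand-in for the token data of a tCorpus (illustrative data only)
tokens <- data.frame(doc_id = c('a', 'a', 'b', 'c'),
                     token  = c('This', 'is', 'oh', 'so'))

# The key column is called 'id' here, not 'doc_id', and there is an
# extra column that should not be merged
df <- data.frame(id = c('a', 'b'), test = c('A', 'B'), extra = c(1, 2))

# Analogue of `columns`: keep only the key and the columns to add
keep <- c('id', 'test')

# Analogue of by.x / by.y: match tokens$doc_id against df$id.
# all.x = TRUE keeps tokens without a match (document 'c') and fills NA.
merged <- merge(tokens, df[, keep], by.x = 'doc_id', by.y = 'id', all.x = TRUE)
merged
```

In this analogue, the rows for document 'c' remain in the result with NA in the test column, and the extra column is not carried over.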

Details

Usage:

## R6 method for class tCorpus. Use as tc$method (where tc is a tCorpus object).

merge(df, by, by.x, by.y)
merge_meta(df, by, by.x, by.y)

Examples

library(corpustools)

# Create a small tCorpus with three documents, split into sentences
d = data.frame(text = c('This is an example. Best example ever.', 'oh my god', 'so good'),
               id = c('a','b','c'),
               source = c('aa','bb','cc'))
tc = create_tcorpus(d, doc_col='id', split_sentences = TRUE)

# Merge at the document level: match on doc_id
df = data.frame(doc_id=c('a','b'), test=c('A','B'))
tc$merge(df, by='doc_id')
tc$tokens

# Merge at the sentence level: match on doc_id and sentence
df = data.frame(doc_id=c('a','b'), sentence=1, test2=c('A','B'))
tc$merge(df, by=c('doc_id', 'sentence'))
tc$tokens

# Merge at the token level: match on doc_id, sentence and token_id
df = data.frame(doc_id=c('a','b'), sentence=1, token_id=c(3,4), test3=c('A','B'))
tc$merge(df, by=c('doc_id', 'sentence', 'token_id'))
tc$tokens

# Merge with the meta data instead of the tokens
meta = data.frame(doc_id=c('a','b'), test=c('A','B'))
tc$merge_meta(meta, by='doc_id')
tc$meta

# Merge with the meta data on a column other than doc_id
meta = data.frame(source=c('aa'), test2=c('A'))
tc$merge_meta(meta, by='source')
tc$meta

[Package corpustools version 0.5.1 Index]