Managing, Querying and Analyzing Tokenized Text

Documentation for package ‘corpustools’ version 0.5.1

Help Pages

-- A --

add_multitoken_label	Choose and add multitoken strings based on multitoken categories
aggregate_rsyntax	Aggregate rsyntax annotations
agg_label	Helper function for aggregate_rsyntax
agg_tcorpus	Aggregate the tokens data
annotate_rsyntax	Annotate tokens based on rsyntax queries
as.tcorpus	Force an object to be of class tCorpus
as.tcorpus.default	Force an object to be of class tCorpus
as.tcorpus.tCorpus	Force an object to be of class tCorpus

-- B --

backbone_filter	Extract the backbone of a network
browse_hits	View hits in a browser
browse_texts	Create and view a full text browser

-- C --

calc_chi2	Vectorized computation of the chi^2 statistic for a 2x2 crosstab containing the values [a, b] [c, d]
code_dictionary	Dictionary lookup
code_features	Code features in a tCorpus based on a search string
compare_corpus	Compare tCorpus vocabulary to that of another (reference) tCorpus
compare_documents	Calculate the similarity of documents
compare_subset	Compare vocabulary of a subset of a tCorpus to the rest of the tCorpus
context	Get a context vector
corenlp_tokens	coreNLP example sentences
count_tcorpus	Count results of search hits, or of a given feature in tokens
create_tcorpus	Create a tCorpus
create_tcorpus.character	Create a tCorpus
create_tcorpus.corpus	Create a tCorpus
create_tcorpus.data.frame	Create a tCorpus
create_tcorpus.factor	Create a tCorpus

-- D --

deduplicate	Deduplicate documents
delete_columns	Delete columns from the data and meta data
delete_meta_columns	Delete columns from the data and meta data
docfreq_filter	Support function for the subset method
dtm_compare	Compare two document-term matrices
dtm_wordcloud	Plot a word cloud from a dtm

-- E --

ego_semnet	Create an ego network
export_span_annotations	Export span annotations

-- F --

feats_to_columms	Cast the "feats" column in UDpipe tokens to columns
feature_associations	Get common nearby features given a query or query hits
feature_stats	Feature statistics
feature_subset	Filter features
fold_rsyntax	Fold rsyntax annotations
freq_filter	Support function for the subset method

-- G --

get	Access the data from a tCorpus
get_dfm	Create a document-term matrix
get_dtm	Create a document-term matrix
get_global_i	Compute global feature positions
get_kwic	Get keyword-in-context (KWIC) strings
get_meta	Access the data from a tCorpus
get_stopwords	Get a character vector of stopwords

-- L --

laplace	Laplace (i.e. add constant) smoothing
lda_fit	Estimate an LDA topic model

-- M --

melt_quanteda_dict	Convert a quanteda dictionary to a long data.table format
merge	Merge the token and meta data.tables of a tCorpus with another data.frame
merge_meta	Merge the token and meta data.tables of a tCorpus with another data.frame
merge_tcorpora	Merge tCorpus objects

-- P --

plot.contextHits	S3 plot for contextHits class
plot.featureAssociations	Visualize feature associations
plot.featureHits	S3 plot for featureHits class
plot.vocabularyComparison	Visualize vocabularyComparison
plot_semnet	Visualize a semnet network
plot_words	Plot a wordcloud with words ordered and coloured according to a dimension (x)
preprocess	Preprocess feature
preprocess_tokens	Preprocess tokens in a character vector
print.contextHits	S3 print for contextHits class
print.featureHits	S3 print for featureHits class
print.tCorpus	S3 print for tCorpus class

-- R --

refresh_tcorpus	Refresh a tCorpus object using the current version of corpustools
replace_dictionary	Replace tokens with dictionary match
require_package	Check if a package with the given version exists

-- S --

search_contexts	Search for documents or sentences using Boolean queries
search_dictionary	Dictionary lookup
search_features	Find tokens using a Lucene-like search query
search_recode	Recode features in a tCorpus based on a search string
semnet	Create a semantic network based on the co-occurrence of tokens in documents
semnet_window	Create a semantic network based on the co-occurrence of tokens in token windows
set	Modify the token and meta data.tables of a tCorpus
set_levels	Change levels of factor columns
set_meta	Modify the token and meta data.tables of a tCorpus
set_meta_levels	Change levels of factor columns
set_meta_name	Change column names of the data and meta data
set_name	Change column names of the data and meta data
set_network_attributes	Set some default network attributes for pretty plotting
sgt	Simple Good Turing smoothing
show_udpipe_models	Show the names of udpipe models
sotu_texts	State of the Union addresses
stopwords_list	Basic stopword lists
subset	Subset a tCorpus
subset.tCorpus	S3 subset for tCorpus class
subset_meta	Subset a tCorpus
subset_query	Subset tCorpus token data using a query
summary.contextHits	S3 summary for contextHits class
summary.featureHits	S3 summary for featureHits class
summary.tCorpus	Summary of a tCorpus object

-- T --

tCorpus	tCorpus: a corpus class for tokenized texts
tcorpus	tCorpus: a corpus class for tokenized texts
tCorpus$annotate_rsyntax	Annotate tokens based on rsyntax queries
tCorpus$code_dictionary	Dictionary lookup
tCorpus$code_features	Code features in a tCorpus based on a search string
tCorpus$context	Get a context vector
tCorpus$deduplicate	Deduplicate documents
tCorpus$delete_columns	Delete columns from the data and meta data
tCorpus$delete_meta_columns	Delete columns from the data and meta data
tCorpus$feats_to_columns	Cast the "feats" column in UDpipe tokens to columns
tCorpus$feature_subset	Filter features
tCorpus$fold_rsyntax	Fold rsyntax annotations
tCorpus$get	Access the data from a tCorpus
tCorpus$get_meta	Access the data from a tCorpus
tCorpus$lda_fit	Estimate an LDA topic model
tCorpus$merge	Merge the token and meta data.tables of a tCorpus with another data.frame
tCorpus$preprocess	Preprocess feature
tCorpus$replace_dictionary	Replace tokens with dictionary match
tCorpus$search_recode	Recode features in a tCorpus based on a search string
tCorpus$set	Modify the token and meta data.tables of a tCorpus
tCorpus$set_levels	Change levels of factor columns
tCorpus$set_meta	Modify the token and meta data.tables of a tCorpus
tCorpus$set_meta_levels	Change levels of factor columns
tCorpus$set_meta_name	Change column names of the data and meta data
tCorpus$set_name	Change column names of the data and meta data
tCorpus$subset	Subset a tCorpus
tCorpus$subset_meta	Subset a tCorpus
tCorpus$subset_query	Subset tCorpus token data using a query
tCorpus$udpipe_clauses	Add columns indicating who did what
tCorpus$udpipe_quotes	Add columns indicating who said what
tCorpus_compare	Corpus comparison
tCorpus_create	Creating a tCorpus
tCorpus_data	Methods and functions for viewing, modifying and subsetting tCorpus data
tCorpus_docsim	Document similarity
tCorpus_features	Preprocessing, subsetting and analyzing features
tCorpus_modify_by_reference	Modify tCorpus by reference
tCorpus_querying	Use Boolean queries to analyze the tCorpus
tCorpus_semnet	Feature co-occurrence based semantic network analysis
tCorpus_topmod	Topic modeling
tc_plot_tree	Visualize a dependency tree
tc_sotu_udpipe	A tCorpus with a small sample of sotu paragraphs parsed with udpipe
tokens_to_tcorpus	Create a tCorpus based on tokens (i.e. preprocessed texts)
tokenWindowOccurence	Gives the window in which a term occurred, as a matrix
top_features	Show top features
transform_rsyntax	Apply rsyntax transformations

-- U --

udpipe_clauses	Add columns indicating who did what
udpipe_clause_tqueries	Get a list of tqueries for extracting who did what
udpipe_quotes	Add columns indicating who said what
udpipe_quote_tqueries	Get a list of tqueries for extracting quotes
udpipe_simplify	Simplify tokenIndex created with the udpipe parser
udpipe_spanquote_tqueries	Get a list of tqueries for finding candidates for span quotes
udpipe_tcorpus	Create a tCorpus using udpipe
udpipe_tcorpus.character	Create a tCorpus using udpipe
udpipe_tcorpus.corpus	Create a tCorpus using udpipe
udpipe_tcorpus.data.frame	Create a tCorpus using udpipe
udpipe_tcorpus.factor	Create a tCorpus using udpipe
untokenize	Reconstruct original texts