Managing, Querying and Analyzing Tokenized Text



Documentation for package ‘corpustools’ version 0.5.1

Help Pages


-- A --

add_multitoken_label Choose and add multitoken strings based on multitoken categories
aggregate_rsyntax Aggregate rsyntax annotations
agg_label Helper function for aggregate_rsyntax
agg_tcorpus Aggregate the tokens data
annotate_rsyntax Annotate tokens based on rsyntax queries
as.tcorpus Force an object to be a tCorpus class
as.tcorpus.default Force an object to be a tCorpus class
as.tcorpus.tCorpus Force an object to be a tCorpus class

-- B --

backbone_filter Extract the backbone of a network
browse_hits View hits in a browser
browse_texts Create and view a full text browser

-- C --

calc_chi2 Vectorized computation of chi^2 statistic for a 2x2 crosstab containing the values [a, b] [c, d]
code_dictionary Dictionary lookup
code_features Code features in a tCorpus based on a search string
compare_corpus Compare tCorpus vocabulary to that of another (reference) tCorpus
compare_documents Calculate the similarity of documents
compare_subset Compare vocabulary of a subset of a tCorpus to the rest of the tCorpus
context Get a context vector
corenlp_tokens coreNLP example sentences
count_tcorpus Count results of search hits, or of a given feature in tokens
create_tcorpus Create a tCorpus
create_tcorpus.character Create a tCorpus
create_tcorpus.corpus Create a tCorpus
create_tcorpus.data.frame Create a tCorpus
create_tcorpus.factor Create a tCorpus

-- D --

deduplicate Deduplicate documents
delete_columns Delete column from the data and meta data
delete_meta_columns Delete column from the data and meta data
docfreq_filter Support function for subset method
dtm_compare Compare two document term matrices
dtm_wordcloud Plot a word cloud from a dtm

-- E --

ego_semnet Create an ego network
export_span_annotations Export span annotations

-- F --

feats_to_columms Cast the "feats" column in UDpipe tokens to columns
feature_associations Get common nearby features given a query or query hits
feature_stats Feature statistics
feature_subset Filter features
fold_rsyntax Fold rsyntax annotations
freq_filter Support function for subset method

-- G --

get Access the data from a tCorpus
get_dfm Create a document term matrix
get_dtm Create a document term matrix
get_global_i Compute global feature positions
get_kwic Get keyword-in-context (KWIC) strings
get_meta Access the data from a tCorpus
get_stopwords Get a character vector of stopwords

-- L --

laplace Laplace (i.e. add constant) smoothing
lda_fit Estimate an LDA topic model

-- M --

melt_quanteda_dict Convert a quanteda dictionary to a long data.table format
merge Merge the token and meta data.tables of a tCorpus with another data.frame
merge_meta Merge the token and meta data.tables of a tCorpus with another data.frame
merge_tcorpora Merge tCorpus objects

-- P --

plot.contextHits S3 plot for contextHits class
plot.featureAssociations Visualize feature associations
plot.featureHits S3 plot for featureHits class
plot.vocabularyComparison Visualize a vocabularyComparison
plot_semnet Visualize a semnet network
plot_words Plot a wordcloud with words ordered and coloured according to a dimension (x)
preprocess Preprocess feature
preprocess_tokens Preprocess tokens in a character vector
print.contextHits S3 print for contextHits class
print.featureHits S3 print for featureHits class
print.tCorpus S3 print for tCorpus class

-- R --

refresh_tcorpus Refresh a tCorpus object using the current version of corpustools
replace_dictionary Replace tokens with dictionary match
require_package Check if package with given version exists

-- S --

search_contexts Search for documents or sentences using Boolean queries
search_dictionary Dictionary lookup
search_features Find tokens using a Lucene-like search query
search_recode Recode features in a tCorpus based on a search string
semnet Create a semantic network based on the co-occurrence of tokens in documents
semnet_window Create a semantic network based on the co-occurrence of tokens in token windows
set Modify the token and meta data.tables of a tCorpus
set_levels Change levels of factor columns
set_meta Modify the token and meta data.tables of a tCorpus
set_meta_levels Change levels of factor columns
set_meta_name Change column names of data and meta data
set_name Change column names of data and meta data
set_network_attributes Set some default network attributes for pretty plotting
sgt Simple Good Turing smoothing
show_udpipe_models Show the names of udpipe models
sotu_texts State of the Union addresses
stopwords_list Basic stopword lists
subset Subset a tCorpus
subset.tCorpus S3 subset for tCorpus class
subset_meta Subset a tCorpus
subset_query Subset tCorpus token data using a query
summary.contextHits S3 summary for contextHits class
summary.featureHits S3 summary for featureHits class
summary.tCorpus Summary of a tCorpus object

-- T --

tCorpus tCorpus: a corpus class for tokenized texts
tcorpus tCorpus: a corpus class for tokenized texts
tCorpus$annotate_rsyntax Annotate tokens based on rsyntax queries
tCorpus$code_dictionary Dictionary lookup
tCorpus$code_features Code features in a tCorpus based on a search string
tCorpus$context Get a context vector
tCorpus$deduplicate Deduplicate documents
tCorpus$delete_columns Delete column from the data and meta data
tCorpus$delete_meta_columns Delete column from the data and meta data
tCorpus$feats_to_columns Cast the "feats" column in UDpipe tokens to columns
tCorpus$feature_subset Filter features
tCorpus$fold_rsyntax Fold rsyntax annotations
tCorpus$get Access the data from a tCorpus
tCorpus$get_meta Access the data from a tCorpus
tCorpus$lda_fit Estimate an LDA topic model
tCorpus$merge Merge the token and meta data.tables of a tCorpus with another data.frame
tCorpus$preprocess Preprocess feature
tCorpus$replace_dictionary Replace tokens with dictionary match
tCorpus$search_recode Recode features in a tCorpus based on a search string
tCorpus$set Modify the token and meta data.tables of a tCorpus
tCorpus$set_levels Change levels of factor columns
tCorpus$set_meta Modify the token and meta data.tables of a tCorpus
tCorpus$set_meta_levels Change levels of factor columns
tCorpus$set_meta_name Change column names of data and meta data
tCorpus$set_name Change column names of data and meta data
tCorpus$subset Subset a tCorpus
tCorpus$subset_meta Subset a tCorpus
tCorpus$subset_query Subset tCorpus token data using a query
tCorpus$udpipe_clauses Add columns indicating who did what
tCorpus$udpipe_quotes Add columns indicating who said what
tCorpus_compare Corpus comparison
tCorpus_create Creating a tCorpus
tCorpus_data Methods and functions for viewing, modifying and subsetting tCorpus data
tCorpus_docsim Document similarity
tCorpus_features Preprocessing, subsetting and analyzing features
tCorpus_modify_by_reference Modify tCorpus by reference
tCorpus_querying Use Boolean queries to analyze the tCorpus
tCorpus_semnet Feature co-occurrence based semantic network analysis
tCorpus_topmod Topic modeling
tc_plot_tree Visualize a dependency tree
tc_sotu_udpipe A tCorpus with a small sample of sotu paragraphs parsed with udpipe
tokens_to_tcorpus Create a tCorpus based on tokens (i.e. preprocessed texts)
tokenWindowOccurence Gives the window in which a term occurred in a matrix
top_features Show top features
transform_rsyntax Apply rsyntax transformations

-- U --

udpipe_clauses Add columns indicating who did what
udpipe_clause_tqueries Get a list of tqueries for extracting who did what
udpipe_quotes Add columns indicating who said what
udpipe_quote_tqueries Get a list of tqueries for extracting quotes
udpipe_simplify Simplify tokenIndex created with the udpipe parser
udpipe_spanquote_tqueries Get a list of tqueries for finding candidates for span quotes
udpipe_tcorpus Create a tCorpus using udpipe
udpipe_tcorpus.character Create a tCorpus using udpipe
udpipe_tcorpus.corpus Create a tCorpus using udpipe
udpipe_tcorpus.data.frame Create a tCorpus using udpipe
udpipe_tcorpus.factor Create a tCorpus using udpipe
untokenize Reconstruct original texts