&-method |
Experimental and undocumented querying of syntax relationships |
as.data.frame.udpipe_connlu |
Convert the result of udpipe_annotate to a tidy data frame |
as.matrix.cooccurrence |
Convert the result of cooccurrence to a sparse matrix |
as_conllu |
Convert a data.frame to CONLL-U format |
as_cooccurrence |
Convert a matrix to a co-occurrence data.frame |
as_fasttext |
Combine labels and text as used in fasttext |
as_phrasemachine |
Convert Parts of Speech tags to one-letter tags which can be used to identify phrases based on regular expressions |
as_word2vec |
Convert a matrix of word vectors to word2vec format |
brussels_listings |
Brussels AirBnB address locations available at www.insideairbnb.com |
brussels_reviews |
Reviews of AirBnB customers on Brussels address locations available at www.insideairbnb.com |
brussels_reviews_anno |
Reviews of the AirBnB customers which are tokenised, POS tagged and lemmatised |
brussels_reviews_w2v_embeddings_lemma_nl |
An example matrix of word embeddings |
cbind_dependencies |
Add the dependency parsing information to an annotated dataset |
cbind_morphological |
Add morphological features to an annotated dataset |
collocation |
Extract collocations - a sequence of terms which follow each other |
cooccurrence |
Create a cooccurrence data.frame |
cooccurrence.character |
Create a cooccurrence data.frame |
cooccurrence.cooccurrence |
Create a cooccurrence data.frame |
cooccurrence.data.frame |
Create a cooccurrence data.frame |
document_term_frequencies |
Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document |
document_term_frequencies.character |
Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document |
document_term_frequencies.data.frame |
Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document |
document_term_frequencies_statistics |
Add Term Frequency, Inverse Document Frequency and Okapi BM25 statistics to the output of document_term_frequencies |
document_term_matrix |
Create a document/term matrix |
document_term_matrix.data.frame |
Create a document/term matrix |
document_term_matrix.default |
Create a document/term matrix |
document_term_matrix.DocumentTermMatrix |
Create a document/term matrix |
document_term_matrix.integer |
Create a document/term matrix |
document_term_matrix.matrix |
Create a document/term matrix |
document_term_matrix.numeric |
Create a document/term matrix |
document_term_matrix.simple_triplet_matrix |
Create a document/term matrix |
document_term_matrix.TermDocumentMatrix |
Create a document/term matrix |
dtm_align |
Reorder a Document-Term-Matrix alongside a vector or data.frame |
dtm_bind |
Combine 2 document term matrices either by rows or by columns |
dtm_cbind |
Combine 2 document term matrices either by rows or by columns |
dtm_chisq |
Compare term usage across 2 document groups using the Chi-square Test for Count Data |
dtm_colsums |
Column sums and Row sums for document term matrices |
dtm_conform |
Make sure a document term matrix has exactly the specified rows and columns |
dtm_cor |
Pearson Correlation for Sparse Matrices |
dtm_rbind |
Combine 2 document term matrices either by rows or by columns |
dtm_remove_lowfreq |
Remove terms occurring with low frequency from a Document-Term-Matrix and documents with no terms |
dtm_remove_sparseterms |
Remove terms with high sparsity from a Document-Term-Matrix |
dtm_remove_terms |
Remove terms from a Document-Term-Matrix and keep only documents which have at least some terms |
dtm_remove_tfidf |
Remove terms from a Document-Term-Matrix and documents with no terms based on the term frequency inverse document frequency |
dtm_reverse |
Inverse operation of the document_term_matrix function |
dtm_rowsums |
Column sums and Row sums for document term matrices |
dtm_sample |
Random samples and permutations from a Document-Term-Matrix |
dtm_svd_similarity |
Semantic Similarity to a Singular Value Decomposition |
dtm_tfidf |
Term Frequency - Inverse Document Frequency calculation |
keywords_collocation |
Extract collocations - a sequence of terms which follow each other |
keywords_phrases |
Extract phrases - a sequence of terms which follow each other based on a sequence of Parts of Speech tags |
keywords_rake |
Keyword identification using Rapid Automatic Keyword Extraction (RAKE) |
paste.data.frame |
Concatenate text of each group of data together |
phrases |
Extract phrases - a sequence of terms which follow each other based on a sequence of Parts of Speech tags |
predict.LDA |
Predict method for an object of class LDA_VEM or class LDA_Gibbs |
predict.LDA_Gibbs |
Predict method for an object of class LDA_VEM or class LDA_Gibbs |
predict.LDA_VEM |
Predict method for an object of class LDA_VEM or class LDA_Gibbs |
strsplit.data.frame |
Obtain a tokenised data frame by splitting text on a regular expression |
syntaxpatterns |
Experimental and undocumented querying of syntax patterns |
syntaxpatterns-class |
Experimental and undocumented querying of syntax patterns |
syntaxrelation |
Experimental and undocumented querying of syntax relationships |
syntaxrelation-class |
Experimental and undocumented querying of syntax relationships |
txt_collapse |
Collapse a character vector while removing missing data |
txt_contains |
Check if text contains a certain pattern |
txt_context |
Based on a vector with a word sequence, get n-grams (looking forward + backward) |
txt_count |
Count the number of times a pattern occurs in text |
txt_freq |
Frequency statistics of elements in a vector |
txt_grepl |
Look up multiple patterns and indicate their presence in text |
txt_highlight |
Highlight words in a character vector |
txt_next |
Get the n-th next element of a vector |
txt_nextgram |
Based on a vector with a word sequence, get n-grams (looking forward) |
txt_overlap |
Get the overlap between 2 vectors |
txt_paste |
Concatenate strings with options how to handle missing data |
txt_previous |
Get the n-th previous element of a vector |
txt_previousgram |
Based on a vector with a word sequence, get n-grams (looking backward) |
txt_recode |
Recode text to other categories |
txt_recode_ngram |
Recode words with compound multi-word expressions |
txt_sample |
Boilerplate function to sample one element from a vector |
txt_sentiment |
Perform dictionary-based sentiment analysis on a tokenised data frame |
txt_show |
Boilerplate function to cat only 1 element of a character vector |
txt_tagsequence |
Identify a contiguous sequence of tags as being 1 entity |
udpipe |
Tokenising, Lemmatising, Tagging and Dependency Parsing of raw text in TIF format |
udpipe_accuracy |
Evaluate the accuracy of your UDPipe model on holdout data |
udpipe_annotate |
Tokenising, Lemmatising, Tagging and Dependency Parsing Annotation of raw text |
udpipe_annotation_params |
List with training options set by the UDPipe community when building models based on the Universal Dependencies data |
udpipe_download_model |
Download a UDPipe model provided by the UDPipe community for a specific language of choice |
udpipe_load_model |
Load a UDPipe model |
udpipe_read_conllu |
Read in a CONLL-U file as a data.frame |
udpipe_train |
Train a UDPipe model |
unique_identifier |
Create a unique identifier for each combination of fields in a data frame |
unlist_tokens |
Create a data.frame from a list of tokens |
|-method |
Experimental and undocumented querying of syntax relationships |