Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit


Documentation for package ‘udpipe’ version 0.8.11

Help Pages

&-method Experimental and undocumented querying of syntax relationships
as.data.frame.udpipe_connlu Convert the result of udpipe_annotate to a tidy data frame
as.matrix.cooccurrence Convert the result of cooccurrence to a sparse matrix
as_conllu Convert a data.frame to CONLL-U format
as_cooccurrence Convert a matrix to a co-occurrence data.frame
as_fasttext Combine labels and text as used in fasttext
as_phrasemachine Convert Parts of Speech tags to one-letter tags which can be used to identify phrases based on regular expressions
as_word2vec Convert a matrix of word vectors to word2vec format
brussels_listings Brussels AirBnB address locations available at www.insideairbnb.com
brussels_reviews Reviews of AirBnB customers on Brussels address locations available at www.insideairbnb.com
brussels_reviews_anno Reviews of the AirBnB customers which are tokenised, POS tagged and lemmatised
brussels_reviews_w2v_embeddings_lemma_nl An example matrix of word embeddings
cbind_dependencies Add the dependency parsing information to an annotated dataset
cbind_morphological Add morphological features to an annotated dataset
collocation Extract collocations - a sequence of terms which follow each other
cooccurrence Create a cooccurrence data.frame
cooccurrence.character Create a cooccurrence data.frame
cooccurrence.cooccurrence Create a cooccurrence data.frame
cooccurrence.data.frame Create a cooccurrence data.frame
document_term_frequencies Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document
document_term_frequencies.character Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document
document_term_frequencies.data.frame Aggregate a data.frame to the document/term level by calculating how many times a term occurs per document
document_term_frequencies_statistics Add Term Frequency, Inverse Document Frequency and Okapi BM25 statistics to the output of document_term_frequencies
document_term_matrix Create a document/term matrix
document_term_matrix.data.frame Create a document/term matrix
document_term_matrix.default Create a document/term matrix
document_term_matrix.DocumentTermMatrix Create a document/term matrix
document_term_matrix.integer Create a document/term matrix
document_term_matrix.matrix Create a document/term matrix
document_term_matrix.numeric Create a document/term matrix
document_term_matrix.simple_triplet_matrix Create a document/term matrix
document_term_matrix.TermDocumentMatrix Create a document/term matrix
dtm_align Reorder a Document-Term-Matrix alongside a vector or data.frame
dtm_bind Combine 2 document term matrices either by rows or by columns
dtm_cbind Combine 2 document term matrices either by rows or by columns
dtm_chisq Compare term usage across 2 document groups using the Chi-square Test for Count Data
dtm_colsums Column sums and Row sums for document term matrices
dtm_conform Make sure a document term matrix has exactly the specified rows and columns
dtm_cor Pearson Correlation for Sparse Matrices
dtm_rbind Combine 2 document term matrices either by rows or by columns
dtm_remove_lowfreq Remove terms occurring with low frequency from a Document-Term-Matrix and documents with no terms
dtm_remove_sparseterms Remove terms with high sparsity from a Document-Term-Matrix
dtm_remove_terms Remove terms from a Document-Term-Matrix and keep only documents which have at least some terms
dtm_remove_tfidf Remove terms from a Document-Term-Matrix and documents with no terms based on the term frequency inverse document frequency
dtm_reverse Inverse operation of the document_term_matrix function
dtm_rowsums Column sums and Row sums for document term matrices
dtm_sample Random samples and permutations from a Document-Term-Matrix
dtm_svd_similarity Semantic Similarity to a Singular Value Decomposition
dtm_tfidf Term Frequency - Inverse Document Frequency calculation
keywords_collocation Extract collocations - a sequence of terms which follow each other
keywords_phrases Extract phrases - a sequence of terms which follow each other based on a sequence of Parts of Speech tags
keywords_rake Keyword identification using Rapid Automatic Keyword Extraction (RAKE)
paste.data.frame Concatenate text of each group of data together
phrases Extract phrases - a sequence of terms which follow each other based on a sequence of Parts of Speech tags
predict.LDA Predict method for an object of class LDA_VEM or class LDA_Gibbs
predict.LDA_Gibbs Predict method for an object of class LDA_VEM or class LDA_Gibbs
predict.LDA_VEM Predict method for an object of class LDA_VEM or class LDA_Gibbs
strsplit.data.frame Obtain a tokenised data frame by splitting text on a regular expression
syntaxpatterns Experimental and undocumented querying of syntax patterns
syntaxpatterns-class Experimental and undocumented querying of syntax patterns
syntaxrelation Experimental and undocumented querying of syntax relationships
syntaxrelation-class Experimental and undocumented querying of syntax relationships
txt_collapse Collapse a character vector while removing missing data
txt_contains Check if text contains a certain pattern
txt_context Based on a vector with a word sequence, get n-grams (looking forward + backward)
txt_count Count the number of times a pattern occurs in text
txt_freq Frequency statistics of elements in a vector
txt_grepl Look up multiple patterns and indicate their presence in text
txt_highlight Highlight words in a character vector
txt_next Get the n-th next element of a vector
txt_nextgram Based on a vector with a word sequence, get n-grams (looking forward)
txt_overlap Get the overlap between 2 vectors
txt_paste Concatenate strings with options how to handle missing data
txt_previous Get the n-th previous element of a vector
txt_previousgram Based on a vector with a word sequence, get n-grams (looking backward)
txt_recode Recode text to other categories
txt_recode_ngram Recode words with compound multi-word expressions
txt_sample Boilerplate function to sample one element from a vector
txt_sentiment Perform dictionary-based sentiment analysis on a tokenised data frame
txt_show Boilerplate function to cat only 1 element of a character vector
txt_tagsequence Identify a contiguous sequence of tags as being 1 entity
udpipe Tokenising, Lemmatising, Tagging and Dependency Parsing of raw text in TIF format
udpipe_accuracy Evaluate the accuracy of your UDPipe model on holdout data
udpipe_annotate Tokenising, Lemmatising, Tagging and Dependency Parsing Annotation of raw text
udpipe_annotation_params List with training options set by the UDPipe community when building models based on the Universal Dependencies data
udpipe_download_model Download a UDPipe model provided by the UDPipe community for a specific language of choice
udpipe_load_model Load a UDPipe model
udpipe_read_conllu Read in a CONLL-U file as a data.frame
udpipe_train Train a UDPipe model
unique_identifier Create a unique identifier for each combination of fields in a data frame
unlist_tokens Create a data.frame from a list of tokens
|-method Experimental and undocumented querying of syntax relationships