Detect Text Reuse and Document Similarity



Documentation for package ‘textreuse’ version 0.1.5

Help Pages

textreuse-package                textreuse: Detect Text Reuse and Document Similarity
align_local                      Local alignment of natural language texts
as.matrix.textreuse_candidates   Convert candidates data frames to other formats
filenames                        Filenames from paths
hashes                           Accessors for TextReuse objects
hashes<-                         Accessors for TextReuse objects
hash_string                      Hash a string to an integer
has_content                      TextReuseTextDocument
has_hashes                       TextReuseTextDocument
has_minhashes                    TextReuseTextDocument
has_tokens                       TextReuseTextDocument
is.TextReuseCorpus               TextReuseCorpus
is.TextReuseTextDocument         TextReuseTextDocument
jaccard_bag_similarity           Measure similarity/dissimilarity in documents
jaccard_dissimilarity            Measure similarity/dissimilarity in documents
jaccard_similarity               Measure similarity/dissimilarity in documents
lsh                              Locality sensitive hashing for minhash
lsh_candidates                   Candidate pairs from LSH comparisons
lsh_compare                      Compare candidates identified by LSH
lsh_probability                  Probability that a candidate pair will be detected with LSH
lsh_query                        Query an LSH cache for matches to a single document
lsh_subset                       List of all candidates in a corpus
lsh_threshold                    Probability that a candidate pair will be detected with LSH
minhashes                        Accessors for TextReuse objects
minhashes<-                      Accessors for TextReuse objects
minhash_generator                Generate a minhash function
pairwise_candidates              Candidate pairs from pairwise comparisons
pairwise_compare                 Pairwise comparisons among documents in a corpus
ratio_of_matches                 Measure similarity/dissimilarity in documents
rehash                           Recompute the hashes for a document or corpus
similarity-functions             Measure similarity/dissimilarity in documents
skipped                          TextReuseCorpus
textreuse                        textreuse: Detect Text Reuse and Document Similarity
TextReuseCorpus                  TextReuseCorpus
TextReuseTextDocument            TextReuseTextDocument
TextReuseTextDocument-accessors  Accessors for TextReuse objects
tokenize                         Recompute the tokens for a document or corpus
tokenizers                       Split texts into tokens
tokenize_ngrams                  Split texts into tokens
tokenize_sentences               Split texts into tokens
tokenize_skip_ngrams             Split texts into tokens
tokenize_words                   Split texts into tokens
tokens                           Accessors for TextReuse objects
tokens<-                         Accessors for TextReuse objects
wordcount                        Count words
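
Examples

The functions above fall into a few groups: constructors (TextReuseTextDocument, TextReuseCorpus), tokenizers (tokenize_ngrams and friends), similarity measures (jaccard_similarity and friends), minhash/LSH helpers (minhash_generator, lsh, lsh_candidates, lsh_compare), and local alignment (align_local). The sketch below strings these together into a typical candidate-detection workflow. It is illustrative only: the toy texts, the seed, and the parameter choices (5-word shingles, 200 minhashes, 50 LSH bands) are assumptions made for this example, not values prescribed by the package.

    library(textreuse)

    # A minhash function with 200 hashes; fixing the seed makes the
    # minhash signatures reproducible across sessions.
    minhash <- minhash_generator(n = 200, seed = 3552)

    # Build a corpus from in-memory texts (file paths or a directory work
    # as well), shingling each document into 5-word n-grams and
    # minhashing it as it is loaded. The texts here are toy examples.
    corpus <- TextReuseCorpus(
      text = c(a = "The quick brown fox jumps over the lazy dog every day",
               b = "The quick brown fox jumps over the lazy dog each night",
               c = "An entirely different sentence about measuring documents"),
      tokenizer = tokenize_ngrams, n = 5,
      minhash_func = minhash, keep_tokens = TRUE
    )

    # 200 minhashes split into 50 bands of 4 rows each; lsh_threshold()
    # reports the approximate Jaccard similarity above which a pair is
    # likely to be flagged as a candidate.
    lsh_threshold(h = 200, b = 50)

    buckets    <- lsh(corpus, bands = 50)
    candidates <- lsh_candidates(buckets)

    # Score only the candidate pairs, using Jaccard similarity of tokens.
    lsh_compare(candidates, corpus, jaccard_similarity)

For a corpus this small, the exhaustive pairwise functions and the local aligner are simpler than LSH; the inputs are again illustrative.

    # Compare every pair of documents directly, then reshape the
    # comparison matrix into a data frame of scored pairs.
    m <- pairwise_compare(corpus, jaccard_similarity)
    pairwise_candidates(m)

    # Smith-Waterman-style local alignment of two passages; words that
    # do not match are masked with the edit_mark character.
    align_local("The quick brown fox jumps over the lazy dog",
                "A history of the quick brown fox and the lazy dog")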