textreuse-package | textreuse: Detect Text Reuse and Document Similarity
align_local | Local alignment of natural language texts
as.matrix.textreuse_candidates | Convert candidates data frames to other formats
filenames | Filenames from paths
hashes | Accessors for TextReuse objects
hashes<- | Accessors for TextReuse objects
hash_string | Hash a string to an integer
has_content | TextReuseTextDocument
has_hashes | TextReuseTextDocument
has_minhashes | TextReuseTextDocument
has_tokens | TextReuseTextDocument
is.TextReuseCorpus | TextReuseCorpus
is.TextReuseTextDocument | TextReuseTextDocument
jaccard_bag_similarity | Measure similarity/dissimilarity in documents
jaccard_dissimilarity | Measure similarity/dissimilarity in documents
jaccard_similarity | Measure similarity/dissimilarity in documents
lsh | Locality sensitive hashing for minhash
lsh_candidates | Candidate pairs from LSH comparisons
lsh_compare | Compare candidates identified by LSH
lsh_probability | Probability that a candidate pair will be detected with LSH
lsh_query | Query a LSH cache for matches to a single document
lsh_subset | List of all candidates in a corpus
lsh_threshold | Probability that a candidate pair will be detected with LSH
minhashes | Accessors for TextReuse objects
minhashes<- | Accessors for TextReuse objects
minhash_generator | Generate a minhash function
pairwise_candidates | Candidate pairs from pairwise comparisons
pairwise_compare | Pairwise comparisons among documents in a corpus
ratio_of_matches | Measure similarity/dissimilarity in documents
rehash | Recompute the hashes for a document or corpus
similarity-functions | Measure similarity/dissimilarity in documents
skipped | TextReuseCorpus
textreuse | textreuse: Detect Text Reuse and Document Similarity
TextReuseCorpus | TextReuseCorpus
TextReuseTextDocument | TextReuseTextDocument
TextReuseTextDocument-accessors | Accessors for TextReuse objects
tokenize | Recompute the tokens for a document or corpus
tokenizers | Split texts into tokens
tokenize_ngrams | Split texts into tokens
tokenize_sentences | Split texts into tokens
tokenize_skip_ngrams | Split texts into tokens
tokenize_words | Split texts into tokens
tokens | Accessors for TextReuse objects
tokens<- | Accessors for TextReuse objects
wordcount | Count words
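
The functions indexed above combine into a minhash/LSH workflow for finding candidate pairs of similar documents in a corpus. The sketch below follows that workflow; the directory path is an illustrative placeholder, and the parameter values (240 hashes, 5-grams, 80 bands) are examples rather than recommendations.

```r
library(textreuse)

# Reusable minhash function; a fixed seed keeps the signatures reproducible
minhash <- minhash_generator(n = 240, seed = 3552)

# Build a corpus from a directory of plain-text files (path is illustrative),
# tokenizing into 5-grams and storing a minhash signature for each document
corpus <- TextReuseCorpus(dir = "path/to/texts",
                          tokenizer = tokenize_ngrams, n = 5,
                          minhash_func = minhash)

# Bucket the signatures with locality-sensitive hashing, extract the
# candidate pairs, and score each pair with the Jaccard similarity
buckets    <- lsh(corpus, bands = 80)
candidates <- lsh_candidates(buckets)
scores     <- lsh_compare(candidates, corpus, jaccard_similarity)
```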
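Before running the comparison, lsh_threshold() and lsh_probability() help judge a choice of hashes and bands. The values below are illustrative.

```r
# Approximate Jaccard similarity above which pairs are likely to be detected
# with 240 minhashes split into 80 bands
lsh_threshold(h = 240, b = 80)

# Probability that a pair with true Jaccard similarity 0.5 becomes a candidate
lsh_probability(h = 240, b = 80, s = 0.5)
```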
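For corpora small enough to compare exhaustively, the pairwise functions apply a similarity measure to every pair of documents. The document names used for indexing below are placeholders.

```r
# Compare every pair of documents directly and tidy the result
comparisons <- pairwise_compare(corpus, jaccard_similarity)
pairwise_candidates(comparisons)

# Similarity measures can also be applied to two documents directly
jaccard_similarity(corpus[["doc-a"]], corpus[["doc-b"]])
ratio_of_matches(corpus[["doc-a"]], corpus[["doc-b"]])
```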
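The tokenizers and align_local() also accept plain character vectors, which is convenient for inspecting matches once candidate pairs are found. The sentences below are placeholders.

```r
# Split a string into n-grams or skip n-grams
tokenize_ngrams("The quick brown fox jumps over the lazy dog", n = 3)
tokenize_skip_ngrams("The quick brown fox jumps over the lazy dog", n = 3, k = 2)

# Local alignment of the best matching passage shared by two texts
align_local("The brown dog ran to the park",
            "That brown dog ran to the store")

# Count the words in a string or document
wordcount("The quick brown fox")
```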