textreuse-package | textreuse: Detect Text Reuse and Document Similarity
align_local | Local alignment of natural language texts |
as.matrix.textreuse_candidates | Convert candidates data frames to other formats |
filenames | Filenames from paths |
hashes | Accessors for TextReuse objects |
hashes<- | Accessors for TextReuse objects |
hash_string | Hash a string to an integer |
has_content | TextReuseTextDocument |
has_hashes | TextReuseTextDocument |
has_minhashes | TextReuseTextDocument |
has_tokens | TextReuseTextDocument |
is.TextReuseCorpus | TextReuseCorpus |
is.TextReuseTextDocument | TextReuseTextDocument |
jaccard_bag_similarity | Measure similarity/dissimilarity in documents |
jaccard_dissimilarity | Measure similarity/dissimilarity in documents |
jaccard_similarity | Measure similarity/dissimilarity in documents |
lsh | Locality sensitive hashing for minhash |
lsh_candidates | Candidate pairs from LSH comparisons |
lsh_compare | Compare candidates identified by LSH |
lsh_probability | Probability that a candidate pair will be detected with LSH |
lsh_query | Query an LSH cache for matches to a single document
lsh_subset | List of all candidates in a corpus |
lsh_threshold | Probability that a candidate pair will be detected with LSH |
minhashes | Accessors for TextReuse objects |
minhashes<- | Accessors for TextReuse objects |
minhash_generator | Generate a minhash function |
pairwise_candidates | Candidate pairs from pairwise comparisons |
pairwise_compare | Pairwise comparisons among documents in a corpus |
ratio_of_matches | Measure similarity/dissimilarity in documents |
rehash | Recompute the hashes for a document or corpus |
similarity-functions | Measure similarity/dissimilarity in documents |
skipped | TextReuseCorpus |
textreuse | textreuse: Detect Text Reuse and Document Similarity |
TextReuseCorpus | TextReuseCorpus |
TextReuseTextDocument | TextReuseTextDocument |
TextReuseTextDocument-accessors | Accessors for TextReuse objects |
tokenize | Recompute the tokens for a document or corpus |
tokenizers | Split texts into tokens |
tokenize_ngrams | Split texts into tokens |
tokenize_sentences | Split texts into tokens |
tokenize_skip_ngrams | Split texts into tokens |
tokenize_words | Split texts into tokens |
tokens | Accessors for TextReuse objects |
tokens<- | Accessors for TextReuse objects |
wordcount | Count words |
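
The topics above cover the package's core workflow: tokenizing documents, minhashing their tokens, and using locality-sensitive hashing to find candidate pairs. The sketch below ties those pieces together, modeled loosely on the package's vignettes; the sample data directory (extdata/ats) and the parameter choices (240 minhashes, 80 bands, 5-grams) are illustrative assumptions, not requirements.

    library(textreuse)

    # Sample plain-text documents shipped with the package (assumed location).
    dir <- system.file("extdata/ats", package = "textreuse")

    # A reusable minhash function; the seed makes the signatures reproducible.
    minhash <- minhash_generator(n = 240, seed = 3552)

    # Build a corpus: tokenize into 5-grams and store a minhash signature
    # for each document.
    corpus <- TextReuseCorpus(dir = dir,
                              tokenizer = tokenize_ngrams, n = 5,
                              minhash_func = minhash)

    # Locality-sensitive hashing: band the signatures, extract candidate
    # pairs, then score each pair with the Jaccard similarity.
    buckets    <- lsh(corpus, bands = 80)
    candidates <- lsh_candidates(buckets)
    scores     <- lsh_compare(candidates, corpus, jaccard_similarity)
    scores

For small corpora, pairwise_compare() and pairwise_candidates() compute the same similarity scores exhaustively, without the minhash/LSH approximation.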