bind_lexrank_ {lexRankr} | R Documentation |
Bind lexrank scores to a dataframe of text
Description
Bind lexrank scores to a dataframe of sentences or to a dataframe of tokens with sentence ids. bind_lexrank takes bare column names; bind_lexrank_ takes column names as strings.
Usage
bind_lexrank_(tbl, text, doc_id, sent_id = NULL, level = c("sentences",
  "tokens"), threshold = 0.2, usePageRank = TRUE, damping = 0.85,
  continuous = FALSE, ...)

bind_lexrank(tbl, text, doc_id, sent_id = NULL, level = c("sentences",
  "tokens"), threshold = 0.2, usePageRank = TRUE, damping = 0.85,
  continuous = FALSE, ...)
Arguments
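tbl |
dataframe containing a column of sentences (or tokens) to be lexranked |
text |
name of the column containing the sentences or tokens to be lexranked |
doc_id |
name of the column containing the document ids corresponding to text |
sent_id |
Only needed if level is "tokens". Name of the column containing the sentence ids corresponding to text |
level |
the parsed level of the text column to be lexranked, i.e. is text a column of "sentences" or "tokens"? The "tokens" level lets users supply their own tokenization. Even when level is "tokens", lexrank scores are assigned at the sentence level. |
threshold |
The minimum similarity value a sentence pair must have to be represented in the graph on which lexRank is calculated. |
usePageRank |
TRUE or FALSE indicating whether to use the PageRank algorithm for ranking sentences. If FALSE, a sentence's unweighted centrality is used as its rank. Defaults to TRUE. |
damping |
The damping factor to be passed to the PageRank algorithm. Ignored if usePageRank is FALSE. |
continuous |
TRUE or FALSE indicating whether to use continuous LexRank. Only applies if usePageRank is TRUE. If TRUE, threshold is ignored and lexRank is computed on a weighted graph of the sentences. Defaults to FALSE. |
... |
tokenizing options to be passed to lexRankr::tokenize. Ignored if level is "tokens". |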
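The roles of threshold, usePageRank, damping and continuous can be sketched in base R. This is a minimal illustration on a made-up similarity matrix, not the package's implementation: the pagerank helper, the toy values, the use of >= for the threshold test, and the use of a plain neighbour count as the unweighted centrality are all assumptions made for the example.

```r
# Toy similarity matrix for 4 sentences (values are invented for illustration).
sim <- matrix(c(1.0, 0.5, 0.1, 0.0,
                0.5, 1.0, 0.3, 0.1,
                0.1, 0.3, 1.0, 0.4,
                0.0, 0.1, 0.4, 1.0),
              nrow = 4, byrow = TRUE)
diag(sim) <- 0  # ignore self-similarity

threshold <- 0.2
damping   <- 0.85

# Thresholded, unweighted graph: keep a pair only if it is similar enough.
adj <- (sim >= threshold) * 1

# Without PageRank: rank by number of neighbours in the thresholded graph.
degree <- rowSums(adj)

# Hypothetical helper: PageRank by power iteration on a row-normalised matrix.
pagerank <- function(w, d = 0.85, iters = 100) {
  n  <- nrow(w)
  rs <- rowSums(w)
  p  <- w / ifelse(rs == 0, 1, rs)  # divide each row by its sum
  p[rs == 0, ] <- 1 / n             # isolated sentences jump uniformly
  r  <- rep(1 / n, n)
  for (i in seq_len(iters)) {
    r <- (1 - d) / n + d * as.vector(t(p) %*% r)
  }
  r
}

# With PageRank on the thresholded graph (continuous = FALSE in spirit).
scores_thresholded <- pagerank(adj, damping)

# Continuous variant: similarities act as edge weights, threshold is unused.
scores_continuous <- pagerank(sim, damping)
```

In this toy graph the two middle sentences have more neighbours than the two outer ones, so they receive the larger scores under both the neighbour count and the thresholded PageRank.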
Value
A dataframe with an additional column of lexrank scores (the column is named lexrank).
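Because the scores arrive as an ordinary column, the result can be filtered and sorted like any dataframe. The sketch below picks the highest-scoring sentence per document in base R. The scored dataframe is a hand-written stand-in for a returned value, and its lexrank values are placeholders, not actual output of the function.

```r
# Stand-in for a returned dataframe; lexrank values are placeholders only.
scored <- data.frame(doc_id  = c(1, 1, 2, 3),
                     sents   = c("Testing the system.",
                                 "Second sentence for you.",
                                 "System testing the tidy documents df.",
                                 "Documents will be parsed and lexranked."),
                     lexrank = c(0.4, 0.1, 0.3, 0.2),
                     stringsAsFactors = FALSE)

# Sort by descending score, then keep the first row seen for each document.
ranked <- scored[order(-scored$lexrank), ]
top    <- ranked[!duplicated(ranked$doc_id), ]

# Restore document order for display.
top <- top[order(top$doc_id), ]
```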
Examples
# One row per document; sentences are split out below
df <- data.frame(doc_id = 1:3,
                 text = c("Testing the system. Second sentence for you.",
                          "System testing the tidy documents df.",
                          "Documents will be parsed and lexranked."),
                 stringsAsFactors = FALSE)

## Not run:
library(magrittr)

# Bare column names
df %>%
  unnest_sentences(sents, text) %>%
  bind_lexrank(sents, doc_id, level = "sentences")

# Same call with quoted column names
df %>%
  unnest_sentences(sents, text) %>%
  bind_lexrank_("sents", "doc_id", level = "sentences")

# Pre-tokenized input: one row per token, with document and sentence ids
df <- data.frame(doc_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
                            2, 2, 2, 3, 3, 3, 3, 3, 3),
                 sent_id = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 1,
                             1, 1, 1, 1, 1, 1, 1, 1, 1),
                 tokens = c("testing", "the", "system", "second",
                            "sentence", "for", "you", "system",
                            "testing", "the", "tidy", "documents",
                            "df", "documents", "will", "be", "parsed",
                            "and", "lexranked"),
                 stringsAsFactors = FALSE)

df %>%
  bind_lexrank(tokens, doc_id, sent_id, level = 'tokens')
## End(Not run)
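The non-default ranking options can be tried on the same kind of input without the pipe. This is a sketch that assumes lexRankr is installed (the guard skips the calls otherwise); the names docs, sent_df, scored_cont and scored_deg are chosen here for illustration, and no particular scores are implied.

```r
# Runs only if lexRankr is available; otherwise the block is skipped.
if (requireNamespace("lexRankr", quietly = TRUE)) {
  library(lexRankr)

  docs <- data.frame(doc_id = 1:3,
                     text = c("Testing the system. Second sentence for you.",
                              "System testing the tidy documents df.",
                              "Documents will be parsed and lexranked."),
                     stringsAsFactors = FALSE)

  # Split each document into one row per sentence (column named sents).
  sent_df <- unnest_sentences(docs, sents, text)

  # Continuous LexRank: threshold is not used.
  scored_cont <- bind_lexrank(sent_df, sents, doc_id, level = "sentences",
                              continuous = TRUE)

  # No PageRank: damping is not used.
  scored_deg <- bind_lexrank(sent_df, sents, doc_id, level = "sentences",
                             usePageRank = FALSE)
}
```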