R: Bind lexrank scores to a dataframe of text

bind_lexrank_ {lexRankr}

R Documentation

Bind lexrank scores to a dataframe of text

Description

Bind lexrank scores to a dataframe of sentences or to a dataframe of tokens with sentence ids

Usage

bind_lexrank_(tbl, text, doc_id, sent_id = NULL, level = c("sentences",
  "tokens"), threshold = 0.2, usePageRank = TRUE, damping = 0.85,
  continuous = FALSE, ...)

bind_lexrank(tbl, text, doc_id, sent_id = NULL, level = c("sentences",
  "tokens"), threshold = 0.2, usePageRank = TRUE, damping = 0.85,
  continuous = FALSE, ...)

Arguments

`tbl`	dataframe containing a column of sentences (or tokens, if `level` is "tokens") to be lexranked
`text`	name of column containing sentences or tokens to be lexranked
`doc_id`	name of column containing document ids corresponding to `text`
`sent_id`	Only needed if `level` is "tokens". Name of column containing sentence ids corresponding to `text`
`level`	The parsed level of the text column to be lexranked, i.e. whether `text` is a column of "sentences" or "tokens". The "tokens" level is provided to allow users to implement custom tokenization. Note: even if the input `level` is "tokens", lexrank scores are assigned at the sentence level.
`threshold`	The minimum similarity value a sentence pair must have to be represented in the graph where lexRank is calculated.
`usePageRank`	`TRUE` or `FALSE` indicating whether or not to use the PageRank algorithm for ranking sentences. If `FALSE`, a sentence's unweighted centrality will be used as the rank. Defaults to `TRUE`.
`damping`	The damping factor to be passed to the PageRank algorithm. Ignored if `usePageRank` is `FALSE`.
`continuous`	`TRUE` or `FALSE` indicating whether or not to use continuous LexRank. Only applies if `usePageRank` is `TRUE`. If `TRUE`, `threshold` will be ignored and lexRank will be computed using a weighted graph representation of the sentences. Defaults to `FALSE`.
`...`	Tokenizing options to be passed to `lexRankr::tokenize`. Ignored if `level` is "sentences".
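
To make the roles of `threshold` and `usePageRank = FALSE` more concrete, the following base R sketch builds a thresholded sentence graph from a small similarity matrix and counts each sentence's edges. It is an illustration of the idea only, not the package's internal code: the similarity values are invented, the names `simil`, `adjacency` and `degree` are hypothetical, and treating a similarity exactly equal to the threshold as an edge is an assumption.

```r
# Hypothetical pairwise similarities for three sentences (made-up values).
simil <- matrix(c(0.00, 0.35, 0.10,
                  0.35, 0.00, 0.25,
                  0.10, 0.25, 0.00),
                nrow = 3, byrow = TRUE,
                dimnames = list(paste0("s", 1:3), paste0("s", 1:3)))

threshold <- 0.2

# Keep an edge only when the pair's similarity reaches the threshold
# (assumption: a value equal to the threshold counts as an edge).
adjacency <- (simil >= threshold) * 1
diag(adjacency) <- 0

# Unweighted centrality: the number of edges touching each sentence.
degree <- rowSums(adjacency)
degree
```

Here the pair (s1, s3) falls below the threshold and is dropped, so s2 is the only sentence connected to both others. With `continuous = TRUE` no such cut is made; the similarities themselves serve as edge weights.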

Value

A dataframe with an additional column of lexrank scores (the column is named `lexrank`).

Examples


df <- data.frame(doc_id = 1:3, 
                 text = c("Testing the system. Second sentence for you.", 
                          "System testing the tidy documents df.", 
                          "Documents will be parsed and lexranked."),
                 stringsAsFactors = FALSE)

## Not run: 
library(lexRankr)  # provides unnest_sentences() and bind_lexrank()
library(magrittr)  # provides the %>% pipe

df %>% 
  unnest_sentences(sents, text) %>% 
  bind_lexrank(sents, doc_id, level = "sentences")

df %>% 
  unnest_sentences(sents, text) %>% 
  bind_lexrank_("sents", "doc_id", level = "sentences")

df <- data.frame(doc_id  = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2,
                             2, 2, 2, 3, 3, 3, 3, 3, 3), 
                 sent_id = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 
                             1, 1, 1, 1, 1, 1, 1, 1, 1), 
                 tokens = c("testing", "the", "system", "second", 
                            "sentence", "for", "you", "system", 
                            "testing", "the", "tidy", "documents", 
                            "df", "documents", "will", "be", "parsed", 
                            "and", "lexranked"),
                 stringsAsFactors = FALSE)

df %>% 
  bind_lexrank(tokens, doc_id, sent_id, level = 'tokens')

## End(Not run)
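
For token-level input, scores are still assigned per sentence, so every token sharing a `doc_id` and `sent_id` pair belongs to one scored unit. The base R sketch below shows that grouping on a reduced version of the token data above. It is an illustration only and does not reproduce how the package groups tokens internally; the names `tok`, `key` and `sentences` are hypothetical.

```r
# A reduced token table: two sentences in document 1, one in document 2.
tok <- data.frame(doc_id  = c(1, 1, 1, 1, 1, 1, 1, 2, 2),
                  sent_id = c(1, 1, 1, 2, 2, 2, 2, 1, 1),
                  tokens  = c("testing", "the", "system", "second",
                              "sentence", "for", "you", "system", "testing"),
                  stringsAsFactors = FALSE)

# One key per (document, sentence) pair.
key <- paste(tok$doc_id, tok$sent_id, sep = "_")

# Collapse the tokens of each pair back into a single string.
sentences <- vapply(split(tok$tokens, key), paste, character(1),
                    collapse = " ")
sentences
```

Each element of `sentences` corresponds to one unit that would receive a single lexrank score, which is why `sent_id` is required when `level` is "tokens".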

[Package lexRankr version 0.5.2 Index]