R: Compute distance between sentences

sentenceSimil {lexRankr}

R Documentation

Compute distance between sentences

Description

Compute pairwise distances between sentences using the idf-modified cosine measure from "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization" (Erkan and Radev, 2004). The output can be used as input to lexRankFromSimil.
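
As a rough illustration of the measure named above, the following base R sketch computes the idf-modified cosine between two tokenized sentences as it is defined in the LexRank paper. It is not the package's implementation: `idf_table` and `idf_modified_cosine` are hypothetical helper names, the idf definition `log(N / n_w)` is taken from the paper and may differ from what lexRankr uses internally, and the three sentences are made up. Note that the paper's measure is a similarity, so larger values mean closer sentences.

```r
# Hypothetical helpers illustrating the idf-modified cosine; not lexRankr functions.

# idf_w = log(N / n_w): N is the number of units (documents or sentences),
# n_w the number of units that contain word w at least once.
idf_table <- function(units) {
  n_units <- length(units)
  unit_freq <- table(unlist(lapply(units, unique)))
  idf <- log(n_units / as.numeric(unit_freq))
  names(idf) <- names(unit_freq)
  idf
}

# Cosine of the tf * idf vectors of two token vectors x and y.
idf_modified_cosine <- function(x, y, idf) {
  tf_x <- table(x)
  tf_y <- table(y)
  shared <- intersect(names(tf_x), names(tf_y))
  numerator <- sum(as.numeric(tf_x[shared]) * as.numeric(tf_y[shared]) * idf[shared]^2)
  norm_x <- sqrt(sum((as.numeric(tf_x) * idf[names(tf_x)])^2))
  norm_y <- sqrt(sum((as.numeric(tf_y) * idf[names(tf_y)])^2))
  if (norm_x == 0 || norm_y == 0) return(0)
  numerator / (norm_x * norm_y)
}

# Made-up sentences, each treated as its own unit for the idf counts.
sentences <- list(
  s1 = c("the", "cat", "sat"),
  s2 = c("the", "cat", "ran"),
  s3 = c("the", "dog", "barked")
)
idf <- idf_table(sentences)
sim_12 <- idf_modified_cosine(sentences$s1, sentences$s2, idf)
sim_13 <- idf_modified_cosine(sentences$s1, sentences$s3, idf)
```

In this toy input, "the" occurs in every sentence, so its idf is zero and it contributes nothing. `s1` and `s3` share only that word and therefore score 0, while `s1` and `s2` also share "cat" and score above 0.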

Usage

sentenceSimil(sentenceId, token, docId = NULL, sentencesAsDocs = FALSE)

Arguments

`sentenceId`	A character vector of sentence IDs, parallel to the `docId` and `token` arguments (one entry per token).
`token`	A character vector of tokens, parallel to the `docId` and `sentenceId` arguments.
`docId`	A character vector of document IDs, parallel to the `sentenceId` and `token` arguments. Can be `NULL` if `sentencesAsDocs` is `TRUE`.
`sentencesAsDocs`	`TRUE` or `FALSE`, indicating whether to treat sentences as documents when calculating tf-idf scores. If `TRUE`, inverse document frequency is calculated as inverse sentence frequency (useful for single-document extractive summarization).
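
The three vector arguments describe one long table with a row per token. The sketch below shows one way such vectors could be built in base R from raw sentence strings. The input `docs`, the naive tokenization, and the `<docId>_<n>` sentence ID scheme are assumptions made for illustration, not requirements of the package.

```r
# Hypothetical input: one character string per sentence, grouped by document.
docs <- list(
  d1 = c("I ran.", "Jane ran home."),
  d2 = c("Jane walked.")
)

rows <- list()
for (doc in names(docs)) {
  for (i in seq_along(docs[[doc]])) {
    # Naive tokenization: lowercase, drop punctuation, split on whitespace.
    cleaned <- gsub("[[:punct:]]", "", tolower(docs[[doc]][i]))
    tokens <- strsplit(cleaned, "[[:space:]]+")[[1]]
    # One row per token; docId and sentenceId are recycled across the tokens.
    rows[[length(rows) + 1]] <- data.frame(
      docId = doc,
      sentenceId = paste(doc, i, sep = "_"),
      token = tokens,
      stringsAsFactors = FALSE
    )
  }
}
token_df <- do.call(rbind, rows)
```

The columns of `token_df` then line up with the `docId`, `sentenceId`, and `token` arguments, each holding one entry per token.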

Value

A three-column data frame of pairwise distances between sentences. Columns: sent1 (sentence ID), sent2 (sentence ID), and dist (distance between sent1 and sent2).
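
If a square matrix is more convenient than the long pairwise layout, a result of this shape could be reshaped in base R along the following lines. The values in `pairs` are made up, the column names follow the description above, and the sketch assumes each unordered sentence pair appears in exactly one row.

```r
# Made-up pairwise values in the documented three-column layout.
pairs <- data.frame(
  sent1 = c("d1_1", "d1_1", "d1_2"),
  sent2 = c("d1_2", "d2_1", "d2_1"),
  dist  = c(0.40, 0.10, 0.25),
  stringsAsFactors = FALSE
)

# Square matrix indexed by sentence ID; the diagonal is left as NA.
ids <- sort(unique(c(pairs$sent1, pairs$sent2)))
m <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))

# Fill both triangles so that m[a, b] equals m[b, a].
m[cbind(pairs$sent1, pairs$sent2)] <- pairs$dist
m[cbind(pairs$sent2, pairs$sent1)] <- pairs$dist
```

Indexing with a two-column character matrix looks up cells by row and column name, so the order of rows in `pairs` does not matter.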

References

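Erkan, G. and Radev, D. R. (2004). "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization". Journal of Artificial Intelligence Research, 22, 457-479. http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html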

Examples

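library(lexRankr)

# Two one-sentence documents, passed as parallel vectors with one entry per token
sentenceSimil(docId = c("d1", "d1", "d2", "d2"),
              sentenceId = c("d1_1", "d1_1", "d2_1", "d2_1"),
              token = c("i", "ran", "jane", "ran"))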

[Package lexRankr version 0.5.2 Index]