bm_25 {superml} | R Documentation |
BM25 Matching
Description
BM25 stands for Best Matching 25. It is widely used for ranking documents and is often preferred over TF*IDF scores. Given a new document, it finds the most similar documents in a corpus. It is popular in information retrieval systems. This implementation is built on C++ functions, so it is reasonably fast.
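To make the ranking idea concrete, here is a minimal base R sketch of the common Okapi BM25 scoring formula. It is not the superml C++ implementation and may differ from it in tokenisation, parameters and IDF variant. The function name `bm25_sketch`, the whitespace tokeniser, the defaults `k1 = 1.2` and `b = 0.75`, and the "+1" IDF form are all assumptions made for illustration.

```r
# Illustrative Okapi BM25 scorer in base R (hypothetical helper, not superml code).
# Assumptions: lowercase whitespace tokenisation, k1 = 1.2, b = 0.75,
# and the non-negative "+1" IDF variant.
bm25_sketch <- function(query, corpus, k1 = 1.2, b = 0.75) {
  tokenize <- function(x) {
    toks <- strsplit(tolower(x), "\\s+")[[1]]
    toks[nzchar(toks)]
  }
  docs    <- lapply(corpus, tokenize)
  n_docs  <- length(docs)
  doc_len <- lengths(docs)
  avg_len <- mean(doc_len)
  terms   <- unique(tokenize(query))

  scores <- numeric(n_docs)
  for (term in terms) {
    # Term frequency of this query term in every corpus document
    tf <- vapply(docs, function(d) as.numeric(sum(d == term)), numeric(1))
    # Number of documents that contain the term
    df <- sum(tf > 0)
    # Rare terms get a larger weight than common ones
    idf <- log((n_docs - df + 0.5) / (df + 0.5) + 1)
    # Saturating term frequency, normalised by document length
    scores <- scores + idf * tf * (k1 + 1) /
      (tf + k1 * (1 - b + b * doc_len / avg_len))
  }
  names(scores) <- corpus
  sort(scores, decreasing = TRUE)
}

docs <- c("chimpanzees are found in jungle",
          "chimps are jungle animals",
          "Mercedes automobiles are best",
          "merc is made in germany",
          "chimps are intelligent animals")
ranked <- bm25_sketch("automobiles are", docs)
print(head(ranked, 2))
```

In this sketch the Mercedes sentence ranks first because it is the only document containing both query terms, while the sentence sharing no terms with the query scores zero. The rare term "automobiles" contributes far more than the common term "are", which is the behaviour that distinguishes BM25-style weighting from raw term counts.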
Usage
bm_25(document, corpus, top_n)
Arguments
document |
a string for which to find similar documents |
corpus |
a character vector of strings against which the document is matched |
top_n |
the number of most similar documents to return |
Value
a vector containing similar documents and their scores
Examples
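library(superml)
# A small corpus to search
docs <- c("chimpanzees are found in jungle",
          "chimps are jungle animals",
          "Mercedes automobiles are best",
          "merc is made in germany",
          "chimps are intelligent animals")
# The query document to match against the corpus
sentence <- "automobiles are"
# Find the two corpus entries most similar to the query
s <- bm_25(document = sentence, corpus = docs, top_n = 2)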
[Package superml version 0.5.7 Index]