R: Find similar stories to a given story

get_similar_stories {stoRy}

R Documentation

Find similar stories to a given story

Description

get_similar_stories calculates the top n most thematically similar stories to a given story.

Usage

get_similar_stories(
  query_story,
  background_collection = NULL,
  top_n = 10,
  weights = list(choice = 3, major = 2, minor = 1),
  explicit = TRUE,
  min_freq = 1,
  blacklist = NULL,
  metric = c("hgt", "tfidf")
)

Arguments

`query_story`	A `Story()` class object defining a story of interest. Thematically similar stories to this one will be returned.
`background_collection`	A `Collection()` class object containing the stories to search for stories similar to `query_story`. If `NULL`, the collection of all stories in the actively loaded LTO version is used.
`top_n`	Maximum number of similar stories to report. The default is `top_n=10`. If `Inf`, all stories in the background collection are reported.
`weights`	A list assigning nonnegative weights to choice, major, and minor theme levels. The default weighting `list(choice = 3, major = 2, minor = 1)` counts each choice theme usage three times, each major theme usage twice, and each minor theme usage once. Use the uniform weighting `list(choice = 1, major = 1, minor = 1)` to weight theme usages equally regardless of level. At least one weight must be positive.
`explicit`	Set to `FALSE` to include ancestor themes of the explicit thematic annotations.
`min_freq`	Drop themes occurring fewer than this number of times from the analysis. With the default `min_freq=1`, no themes are discarded.
`blacklist`	A `Themeset()` class object containing themes to be dropped from the analysis. If `NULL`, no themes are filtered.
`metric`	A character vector specifying the weighting used in the cosine similarity measure of story thematic similarity. Use `metric = "hgt"` for hypergeometric test P-value weights and `metric = "tfidf"` for TF-IDF weights. The default specification of `metric = c("hgt", "tfidf")` results in hypergeometric test P-values being used as weights.
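
The default handling of `metric` is consistent with base R's `match.arg()` idiom, in which leaving the full vector of choices in place selects its first element. The sketch below illustrates that idiom with a hypothetical helper, `pick_metric()`, which is not part of stoRy; whether stoRy resolves `metric` this way internally is an assumption.

```r
# Hypothetical helper (not a stoRy function): resolves a `metric`-style
# argument with base R's match.arg().
pick_metric <- function(metric = c("hgt", "tfidf")) {
  # With the untouched default vector, match.arg() returns the first
  # choice; otherwise it validates the single value supplied by the caller.
  match.arg(metric)
}

default_metric <- pick_metric()         # falls back to the first choice
tfidf_metric   <- pick_metric("tfidf")  # explicit selection
```

Under this idiom, passing a value outside the listed choices raises an error rather than silently falling back to the default.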

Value

Returns a tibble with up to `top_n` rows (stories) and 5 columns:

`story_id`:	`n`-th most thematically similar story to the query story
`title`:	Reference story title
`description`:	Reference story description
`score`:	Cosine similarity score with hypergeometric test weights (if `metric = "hgt"`) or TF-IDF weights (if `metric = "tfidf"`).
`common_themes`:	List of themes common to both the query and reference story
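
To make the `score` and `common_themes` columns more concrete, here is a minimal base R sketch of a cosine similarity between two stories. It is a simplification under stated assumptions: the theme names (`theme_a` to `theme_d`) and the helpers `to_vector()`, `cosine_similarity()`, and `score_pair()` are hypothetical, and the vectors hold raw level weights rather than the hypergeometric test or TF-IDF weights that `get_similar_stories` applies.

```r
# Simplified illustration only: raw level weights stand in for the
# hypergeometric-test or TF-IDF weights used by the real function.

# Hypothetical annotations: theme name -> level at which the story uses it.
query_levels <- c(theme_a = "choice", theme_b = "major", theme_c = "minor")
ref_levels   <- c(theme_b = "choice", theme_c = "minor", theme_d = "major")

# Turn a story's annotations into a numeric vector over a shared vocabulary.
to_vector <- function(levels, vocabulary, weights) {
  v <- setNames(numeric(length(vocabulary)), vocabulary)
  v[names(levels)] <- unlist(weights)[levels]
  v
}

# Cosine of the angle between two numeric vectors.
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Score the query/reference pair under a given level weighting.
score_pair <- function(weights) {
  vocabulary <- union(names(query_levels), names(ref_levels))
  q <- to_vector(query_levels, vocabulary, weights)
  r <- to_vector(ref_levels, vocabulary, weights)
  cosine_similarity(q, r)
}

default_score  <- score_pair(list(choice = 3, major = 2, minor = 1))  # 0.5
no_minor_score <- score_pair(list(choice = 3, major = 2, minor = 0))
common_themes  <- intersect(names(query_levels), names(ref_levels))
```

Setting `minor = 0` removes `theme_c` from both vectors, so the score changes even though the set of shared themes stays the same.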

References

Paul Sheridan, Mikael Onsjö, Claudia Becerra, Sergio Jimenez, Georg Dueñas (2019). An Ontology-Based Recommender System with an Application to the Star Trek Television Franchise. Future Internet, 11(9), 182. doi:10.3390/fi11090182

Examples

## Not run: 
# Retrieve the top 10 most similar stories to the classic "The Twilight
# Zone" series episode "Nightmare at 20,000 Feet" (1959):
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(query_story)
result_tbl

# Retrieve the top 10 most similar stories to the classic "The Twilight 
# Zone" series episode "Nightmare at 20,000 Feet" (1959) without taking
# minor themes into account:
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(
  query_story,
  weights = list(choice = 3, major = 2, minor = 0)
)
result_tbl

# Retrieve the top 10 most similar stories to the classic "The Twilight 
# Zone" series episode "Nightmare at 20,000 Feet" (1959) when implicitly
# featured themes are included in the similarity calculation:
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(query_story, explicit = FALSE)
result_tbl

## End(Not run)

[Package stoRy version 0.2.2 Index]