get_similar_stories {stoRy}    R Documentation

Find similar stories to a given story

Description

[Maturing]

get_similar_stories finds the top_n stories most thematically similar to a given story.

Usage

get_similar_stories(
  query_story,
  background_collection = NULL,
  top_n = 10,
  weights = list(choice = 3, major = 2, minor = 1),
  explicit = TRUE,
  min_freq = 1,
  blacklist = NULL,
  metric = c("hgt", "tfidf")
)

Arguments

query_story

A Story() class object defining a story of interest. Thematically similar stories to this one will be returned.

background_collection

A Collection() class object containing the stories to search for stories similar to query_story.

If NULL, the collection of all stories in the actively loaded LTO version is used.

top_n

Maximum number of similar stories to report. The default is top_n = 10.

If Inf, all stories in the background collection are reported.

weights

A list assigning nonnegative weights to choice, major, and minor theme levels. The default weighting list(choice = 3, major = 2, minor = 1) counts each choice theme usage three times, each major theme usage twice, and each minor theme usage once. The uniform weighting list(choice = 1, major = 1, minor = 1) weights theme usages equally regardless of level. At least one weight must be positive.
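The effect of the level weights can be illustrated with a minimal base R sketch. The annotations data frame, the theme names, and the helper weighted_theme_counts() are hypothetical and not part of stoRy; the package's internal computation may differ.

```r
# Hypothetical annotations for one story: one row per theme usage.
annotations <- data.frame(
  theme = c("theme_a", "theme_b", "theme_c"),
  level = c("choice", "major", "minor"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the weight of each usage's level,
# then sum the weights per theme.
weighted_theme_counts <- function(annotations, weights) {
  w <- unlist(weights)[annotations$level]
  tapply(w, annotations$theme, sum)
}

default_counts <- weighted_theme_counts(
  annotations, list(choice = 3, major = 2, minor = 1)
)
uniform_counts <- weighted_theme_counts(
  annotations, list(choice = 1, major = 1, minor = 1)
)
```

Under the default weighting the choice theme dominates the counts, while under the uniform weighting every usage contributes equally.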

explicit

Set to FALSE to include ancestor themes of the explicit thematic annotations.
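As a rough illustration of what including ancestor themes means, the sketch below expands a set of explicit themes with their ancestors. The parent_of map and the ancestors() helper are hypothetical, and the sketch assumes each theme has at most one parent; it is not how stoRy stores or traverses the theme hierarchy.

```r
# Hypothetical child -> parent map for a tiny theme hierarchy.
parent_of <- c(theme_c = "theme_b", theme_b = "theme_a")

# Hypothetical helper: walk up the hierarchy and collect all ancestors.
ancestors <- function(theme, parent_of) {
  out <- character(0)
  while (theme %in% names(parent_of)) {
    theme <- parent_of[[theme]]
    out <- c(out, theme)
  }
  out
}

# Themes annotated directly on the story.
explicit_themes <- c("theme_c")

# Explicit themes plus every ancestor, without duplicates.
expanded_themes <- unique(c(
  explicit_themes,
  unlist(lapply(explicit_themes, ancestors, parent_of = parent_of))
))
```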

min_freq

Drop themes occurring fewer than this number of times from the analysis. The default min_freq = 1 results in no themes being discarded.

blacklist

A Themeset() class object containing themes to be dropped from the analysis.

If NULL, no themes are filtered.
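The min_freq and blacklist filters can be pictured with a small base R sketch. The theme usages are made up, and a plain character vector stands in for the themes held by a Themeset() object; this is an illustration under those assumptions, not the package's implementation.

```r
# Hypothetical theme usages pooled over a small background collection.
theme_usages <- c(
  "theme_a", "theme_a", "theme_b", "theme_c", "theme_c", "theme_c"
)
min_freq <- 2
blacklist <- c("theme_c")  # stand-in for the themes in a Themeset

# Count how often each theme occurs.
theme_freq <- table(theme_usages)

# Keep themes occurring at least min_freq times...
frequent <- names(theme_freq)[theme_freq >= min_freq]

# ...and then drop any blacklisted themes.
kept_themes <- setdiff(frequent, blacklist)
```

With min_freq set to 1, the frequency filter keeps every theme, which matches the documented default behavior.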

metric

A character vector specifying the weighting to use in the cosine similarity measure of thematic similarity between stories. Use metric = "hgt" for hypergeometric test P-value weights and metric = "tfidf" for TF-IDF weights.

The default specification of metric = c("hgt", "tfidf") results in hypergeometric test P-values being used as weights.
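To make the cosine similarity idea concrete, here is a minimal base R sketch using a textbook TF-IDF weighting. The count matrix and the helper cosine_similarity() are hypothetical, and the TF-IDF formula shown is the generic one, which may not match the exact weighting stoRy applies. The hypergeometric variant is not sketched here.

```r
# Hypothetical story-by-theme count matrix (rows: stories, columns: themes).
counts <- matrix(
  c(2, 1, 0,
    2, 1, 0,
    0, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("query", "story_1", "story_2"),
    c("theme_a", "theme_b", "theme_c")
  )
)

# Textbook TF-IDF: theme count times log inverse document frequency.
n_stories <- nrow(counts)
doc_freq <- colSums(counts > 0)
idf <- log(n_stories / doc_freq)
tfidf <- sweep(counts, 2, idf, `*`)

# Cosine similarity between two weight vectors.
cosine_similarity <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

score_1 <- cosine_similarity(tfidf["query", ], tfidf["story_1", ])
score_2 <- cosine_similarity(tfidf["query", ], tfidf["story_2", ])
```

Here story_1 uses the same themes as the query in the same proportions and so scores highest, while story_2 shares no themes with the query and scores zero.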

Value

Returns a tibble with at most top_n rows (stories) and 5 columns:

story_id: n-th most thematically similar story to the query story
title: Reference story title
description: Reference story description
score: Cosine similarity score with hypergeometric test weights (if metric = "hgt") or TF-IDF weights (if metric = "tfidf")
common_themes: List of themes common to both the query and reference story
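A result of this shape can be post-processed with ordinary data frame operations. The sketch below uses a plain data frame with made-up values as a stand-in for the returned tibble (the title and description columns are omitted for brevity), so it runs without stoRy installed.

```r
# Hypothetical stand-in for a returned result (a data frame, not a tibble).
result <- data.frame(
  story_id = c("story_1", "story_2", "story_3"),
  score = c(0.9, 0.4, 0.1),
  stringsAsFactors = FALSE
)

# common_themes is a list column: one character vector per story.
result$common_themes <- list(
  c("theme_a", "theme_b"), "theme_a", character(0)
)

# Keep stories above a score cutoff and count shared themes per story.
strong <- result[result$score >= 0.3, ]
strong$n_common <- lengths(strong$common_themes)
```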

References

Paul Sheridan, Mikael Onsjö, Claudia Becerra, Sergio Jimenez, Georg Dueñas (2019). An Ontology-Based Recommender System with an Application to the Star Trek Television Franchise. Future Internet, 11(9), 182. doi:10.3390/fi11090182

Examples

## Not run: 
# Retrieve the top 10 most similar stories to the classic "The Twilight
# Zone" (1959) series episode "Nightmare at 20,000 Feet":
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(query_story)
result_tbl

# Retrieve the top 10 most similar stories to the classic "The Twilight
# Zone" (1959) series episode "Nightmare at 20,000 Feet" without taking
# minor themes into account:
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(
  query_story,
  weights = list(choice = 3, major = 2, minor = 0)
)
result_tbl

# Retrieve the top 10 most similar stories to the classic "The Twilight
# Zone" (1959) series episode "Nightmare at 20,000 Feet" when implicitly
# featured themes are included in the similarity calculation:
set_lto("demo")
query_story <- Story$new(story_id = "tz1959e5x03")
result_tbl <- get_similar_stories(query_story, explicit = FALSE)
result_tbl

## End(Not run)

[Package stoRy version 0.2.2 Index]