get_story_clusters {stoRy} | R Documentation |
Find clusters of similar stories
Description
get_story_clusters classifies the stories in a collection according to
thematic similarity.
Usage
get_story_clusters(
collection = NULL,
weights = list(choice = 3, major = 2, minor = 1),
explicit = TRUE,
min_freq = 1,
min_size = 3,
blacklist = NULL
)
Arguments
collection |
A If |
weights |
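A list assigning nonnegative weights to choice, major, and
minor theme levels. The default weighting
list(choice = 3, major = 2, minor = 1) counts each choice theme
three times, each major theme twice, and each minor theme once. |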
explicit |
Set to |
min_freq |
Drop themes occurring fewer than this number of times from
the analysis. The default is 1. |
min_size |
Minimum cluster size. The default is 3. |
blacklist |
A If |
Details
The input collection of n stories, S[1], ..., S[n], is
represented as a weighted bag-of-words, where each choice theme in
story S[j] (j = 1, ..., n) is counted weights$choice times,
each major theme weights$major times, and each minor
theme weights$minor times.
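The weighted counting described above can be sketched in base R. This is only an illustration, not the package's internal code: the annotations data frame, its column names, and the toy stories and themes are all hypothetical.

```r
# Hypothetical toy data: one row per (story, theme) annotation.
annotations <- data.frame(
  story = c("s1", "s1", "s1", "s2", "s2", "s3"),
  theme = c("time travel", "regret", "nostalgia",
            "time travel", "mass panic", "mass panic"),
  level = c("choice", "major", "minor", "major", "choice", "minor"),
  stringsAsFactors = FALSE
)

# Same shape as the default `weights` argument.
weights <- list(choice = 3, major = 2, minor = 1)

# Look up the weight of each annotation by its theme level.
annotations$weight <- unname(unlist(weights[annotations$level]))

# Sum the weights into a story-by-theme matrix
# (rows: stories, columns: themes).
story_theme <- xtabs(weight ~ story + theme, data = annotations)
story_theme
```

Each cell holds the weight a theme carries in a story, and zero where the theme is absent. A matrix of this shape is the kind of input a biclustering method works on.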
The function classifies the stories according to thematic similarity
using the Iterative Signature Algorithm (ISA) biclustering algorithm as
implemented in the isa2 R package. The clusters are "soft", meaning
that a story can appear in multiple clusters.
Install the isa2 package by running the command
install.packages("isa2") before calling this function.
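One way to check for the dependency before clustering is a small guard like the one below. It is a sketch using base R's requireNamespace(); the has_isa2 variable name is made up for illustration, and the function itself may report a missing package differently.

```r
# requireNamespace() returns TRUE or FALSE without attaching the package.
has_isa2 <- requireNamespace("isa2", quietly = TRUE)

if (!has_isa2) {
  # Tell the user how to install the missing dependency.
  message('Package "isa2" is not installed; run install.packages("isa2") first.')
}
```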
Value
Returns a tibble with r rows (story clusters) and 4 columns:
cluster_id : | Story cluster integer ID |
stories : | A tibble of stories comprising the cluster |
themes : | A tibble of themes common to the clustered stories |
size : | Number of stories in the cluster |
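The sketch below shows one way to work with a result of this shape. It uses a hand-built stand-in rather than real output: the story_id and theme column names and all values are hypothetical, and a plain data frame with list columns takes the place of the tibble.

```r
# Hypothetical stand-in for the returned tibble; real output comes from
# get_story_clusters() and its nested columns may be laid out differently.
result_tbl <- data.frame(cluster_id = c(1L, 2L), size = c(3L, 4L))
result_tbl$stories <- list(
  data.frame(story_id = c("a", "b", "c"), stringsAsFactors = FALSE),
  data.frame(story_id = c("b", "d", "e", "f"), stringsAsFactors = FALSE)
)
result_tbl$themes <- list(
  data.frame(theme = "time travel", stringsAsFactors = FALSE),
  data.frame(theme = c("mass panic", "paranoia"), stringsAsFactors = FALSE)
)

# Select a cluster by its ID value rather than by row position.
idx <- which(result_tbl$cluster_id == 2L)
cluster_stories <- result_tbl$stories[[idx]]
cluster_themes <- result_tbl$themes[[idx]]

# The size column should equal the number of stories in the cluster.
size_matches <- nrow(cluster_stories) == result_tbl$size[idx]

# Clusters are soft: list the stories that appear in more than one cluster.
all_ids <- unlist(lapply(result_tbl$stories, function(s) s$story_id))
shared <- names(which(table(all_ids) > 1))
```

Looking a cluster up by its cluster_id value avoids relying on the rows being ordered by ID.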
References
Gábor Csárdi, Zoltán Kutalik, Sven Bergmann (2010). Modular analysis of gene expression data with R. Bioinformatics, 26, 1376-7.
Sven Bergmann, Jan Ihmels, Naama Barkai (2003). Iterative signature algorithm for the analysis of large-scale gene expression data. Physical Review E, 67, 031902.
Gábor Csárdi (2017). isa2: The Iterative Signature Algorithm. R package version 0.3.5. https://cran.r-project.org/package=isa2
Examples
## Not run:
# Cluster "The Twilight Zone" franchise stories according to thematic
# similarity:
library(stoRy)  # provides set_lto() and get_story_clusters()
library(dplyr)  # provides pull()
set_lto("demo")
set.seed(123)
result_tbl <- get_story_clusters()
result_tbl
# Explore a cluster of stories related to traveling back in time:
cluster_id <- 3
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
# Explore a cluster of stories related to mass panics:
cluster_id <- 5
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
# Explore a cluster of stories related to executions:
cluster_id <- 7
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
# Explore a cluster of stories related to space aliens:
cluster_id <- 10
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
# Explore a cluster of stories related to old people wanting to be young:
cluster_id <- 11
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
# Explore a cluster of stories related to wish making:
cluster_id <- 13
pull(result_tbl, stories)[[cluster_id]]
pull(result_tbl, themes)[[cluster_id]]
## End(Not run)