as.speeches {polmineR} | R Documentation |
Split corpus or partition into speeches.
Description
Split an entire corpus or a partition into speeches. The heuristic is to first
split the corpus/partition into partitions on a day-to-day basis, using the
s-attribute provided by s_attribute_date. These subcorpora are then split into
speeches by speaker name, using the s-attribute provided by s_attribute_name.
If there is a gap larger than the number of tokens supplied by argument gap,
contributions of a speaker are assumed to be two separate speeches.
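The gap rule can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper split_by_gap() and the token offsets are hypothetical, and the sketch assumes that the regions of one speaker on one day are given as left and right token positions in corpus order.

```r
# Hypothetical helper, not part of polmineR: number the speeches of one
# speaker on one day, given the token offsets of the speaker's regions.
split_by_gap <- function(left, right, gap = 500) {
  # Tokens lying between the end of one region and the start of the next
  distance <- left[-1] - right[-length(right)] - 1
  # A new speech starts whenever that distance is larger than the gap
  cumsum(c(TRUE, distance > gap))
}

# Invented offsets: the second region follows a short interruption
# (20 tokens), the third follows a long one (800 tokens).
left  <- c(0, 121, 1101)
right <- c(100, 300, 1500)
speech_no <- split_by_gap(left, right, gap = 500)
```

Under these assumptions the first two regions end up in the same speech and the third starts a new one. A smaller gap splits more aggressively, a larger one merges contributions that are interrupted by longer interjections.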
Usage
as.speeches(.Object, ...)
## S4 method for signature 'partition'
as.speeches(
  .Object,
  s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
  s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
  gap = 500,
  mc = FALSE,
  verbose = TRUE,
  progress = TRUE
)
## S4 method for signature 'subcorpus'
as.speeches(
  .Object,
  s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
  s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
  gap = 500,
  mc = FALSE,
  verbose = TRUE,
  progress = TRUE
)
## S4 method for signature 'corpus'
as.speeches(
  .Object,
  s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
  s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
  gap = 500,
  subset,
  mc = FALSE,
  verbose = TRUE,
  progress = TRUE
)
## S4 method for signature 'character'
as.speeches(
  .Object,
  s_attribute_date = grep("date", s_attributes(.Object), value = TRUE),
  s_attribute_name = grep("name", s_attributes(.Object), value = TRUE),
  gap = 500,
  mc = FALSE,
  verbose = TRUE,
  progress = TRUE
)
Arguments
.Object |
A partition, subcorpus or corpus object, or a length-one character vector naming a corpus. |
... |
Further arguments. |
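s_attribute_date |
A length-one character vector, the s-attribute that provides the date. |
s_attribute_name |
A length-one character vector, the s-attribute that provides the speaker name. |
gap |
An integer value, the number of tokens above which contributions of a speaker are assumed to be separate speeches. |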
mc |
Whether to use multicore, defaults to FALSE. |
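verbose |
A logical value, whether to print messages. Defaults to TRUE. |
progress |
A logical value, whether to show a progress bar. Defaults to TRUE. |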
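subset |
A logical expression used to subset the result, referring to s-attributes such as date and speaker (corpus method only). |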
Value
A partition_bundle. The names of the objects in the bundle are the speaker
name, the date of the speech and an index for the number of the speech on a
given day, concatenated by underscores.
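As a rough illustration of this naming scheme, the following base R sketch builds such names from invented values. The speaker labels and dates are made up, and the sketch assumes that the index counts a speaker's speeches within one day.

```r
# Invented example values, not taken from any corpus
speaker <- c("Speaker A", "Speaker A", "Speaker B")
date    <- c("2009-11-10", "2009-11-10", "2009-11-10")

# Running index per speaker and day (assumption: the index restarts for
# every combination of speaker and date)
index <- ave(seq_along(speaker), speaker, date, FUN = seq_along)

# Concatenate speaker name, date and index with underscores
speech_names <- paste(speaker, date, index, sep = "_")
```

Names built this way can be taken apart again at the underscores, provided the speaker names themselves contain none.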
Examples
## Not run:
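# Make the sample corpora shipped with polmineR available
use("polmineR")

# Split a whole corpus, given by its name, into speeches
speeches <- as.speeches(
  "GERMAPARLMINI",
  s_attribute_date = "date", s_attribute_name = "speaker"
)

# Count tokens per speech and turn the counts into a term-document matrix
speeches_count <- count(speeches, p_attribute = "word")
tdm <- as.TermDocumentMatrix(speeches_count, col = "count")

# Split a partition covering a single day into speeches
bt <- partition("GERMAPARLMINI", date = "2009-10-27")
speeches <- as.speeches(
  bt,
  s_attribute_name = "speaker",
  s_attribute_date = "date"
)
summary(speeches)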
## End(Not run)
## Not run:
sp <- corpus("GERMAPARLMINI") %>%
  as.speeches(s_attribute_name = "speaker", s_attribute_date = "date")
sp <- corpus("GERMAPARLMINI") %>%
  as.speeches(
    s_attribute_name = "speaker",
    s_attribute_date = "date",
    subset = {date == as.Date("2009-11-11")},
    progress = FALSE
  )
sp <- corpus("GERMAPARLMINI") %>%
  as.speeches(
    s_attribute_name = "speaker",
    s_attribute_date = "date",
    subset = {date == "2009-11-10" & grepl("Merkel", speaker)},
    progress = FALSE
  )
## End(Not run)