subset-method {polmineR} | R Documentation |
Subsetting corpora and subcorpora
Description
The structural attributes of a corpus (s-attributes) can be used to
generate subcorpora (i.e. subcorpus class objects) by applying the
subset-method. To obtain a subcorpus, the subset-method can be applied
to a corpus represented by a corpus object, to a length-one character
vector (as a shortcut), or to a subcorpus object.
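The following is a minimal sketch of the three entry points named above, assuming the GERMAPARLMINI sample corpus shipped with polmineR is available and that the s-attributes and values used here (speaker, interjection == "speech") exist in the installed version, as the Examples section suggests. The variable names are illustrative only.

```r
library(polmineR)
use("polmineR")  # make the sample corpus shipped with the package available

# 1. corpus object -> subcorpus
gparl <- corpus("GERMAPARLMINI")
speeches <- subset(gparl, interjection == "speech")

# 2. subcorpus -> narrower subcorpus (subsetting an existing subcorpus)
merkel <- subset(speeches, speaker == "Angela Dorothea Merkel")

# 3. length-one character vector as a shortcut for corpus("GERMAPARLMINI")
merkel_chr <- subset("GERMAPARLMINI", speaker == "Angela Dorothea Merkel")

# A narrower subcorpus cannot hold more tokens than the one it was drawn from
size(merkel) <= size(speeches)
```

Chaining subset() calls this way keeps each filtering step separate, which can be easier to read than one long expression joined with &.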
Usage
## S4 method for signature 'corpus'
subset(x, subset, regex = FALSE, verbose = FALSE, ...)
## S4 method for signature 'character'
subset(x, ...)
## S4 method for signature 'subcorpus'
subset(x, subset, verbose = FALSE, ...)
## S4 method for signature 'remote_corpus'
subset(x, subset)
## S4 method for signature 'subcorpus_bundle'
subset(x, ..., iterate = FALSE, verbose = TRUE, progress = FALSE, mc = NULL)
Arguments
x |
A |
subset |
A |
regex |
A |
verbose |
A |
... |
An expression that will be used to create a subcorpus from s-attributes. |
iterate |
A |
progress |
A |
mc |
An |
Details
The default approach for subsetting a subcorpus_bundle is to temporarily
merge its objects into a single subcorpus, perform subset(), and restore
the subcorpus_bundle by splitting on the s-attribute of the input
subcorpus_bundle. This approach may have unintended results if x has
been generated using complex criteria, for instance if x resulted from
as.speeches(). In this scenario, set argument iterate to TRUE to iterate
over the objects in the bundle one by one.
Value
A subcorpus object. If the expression provided by argument subset
includes undefined s-attributes, a warning is issued and the return
value is NULL.
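Because the return value can be NULL, code that calls subset() programmatically may want to check the result before using it. The sketch below assumes the behaviour described above (a warning plus NULL); no_such_attribute is a deliberately undefined, hypothetical s-attribute name, and the exact behaviour may differ between polmineR versions.

```r
library(polmineR)
use("polmineR")  # make the GERMAPARLMINI sample corpus available

gparl <- corpus("GERMAPARLMINI")

# "no_such_attribute" is hypothetical and not defined for the corpus.
# suppressWarnings() silences the warning that subset() is documented to issue.
res <- suppressWarnings(subset(gparl, no_such_attribute == "x"))

if (is.null(res)) {
  message("No subcorpus created: expression refers to an undefined s-attribute.")
} else {
  message("Subcorpus created with ", size(res), " tokens.")
}
```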
See Also
The methods applicable to the subcorpus object resulting from
subsetting a corpus or subcorpus are described in the documentation of
the subcorpus-class. Note that the subset-method can also be applied to
textstat-class objects (and objects inheriting from this class).
Examples
use("polmineR")
# examples for standard and non-standard evaluation
a <- corpus("GERMAPARLMINI")
# subsetting a corpus object using non-standard evaluation
sc <- subset(a, speaker == "Angela Dorothea Merkel")
sc <- subset(a, speaker == "Angela Dorothea Merkel" & date == "2009-10-28")
sc <- subset(a, grepl("Merkel", speaker))
sc <- subset(a, grepl("Merkel", speaker) & date == "2009-10-28")
# subsetting corpus specified by character vector
sc <- subset("GERMAPARLMINI", grepl("Merkel", speaker))
sc <- subset("GERMAPARLMINI", speaker == "Angela Dorothea Merkel")
sc <- subset("GERMAPARLMINI", speaker == "Angela Dorothea Merkel" & date == "2009-10-28")
sc <- subset("GERMAPARLMINI", grepl("Merkel", speaker) & date == "2009-10-28")
# subsetting a corpus using the (old) logic of the partition-method
sc <- subset(a, speaker = "Angela Dorothea Merkel")
sc <- subset(a, speaker = "Angela Dorothea Merkel", date = "2009-10-28")
sc <- subset(a, speaker = "Merkel", regex = TRUE)
sc <- subset(a, speaker = c("Merkel", "Kauder"), regex = TRUE)
sc <- subset(a, speaker = "Merkel", date = "2009-10-28", regex = TRUE)
# providing the value for s-attribute as a variable
who <- "Volker Kauder"
sc <- subset(a, quote(speaker == !!who))
# quoting and quosures necessary when programming against subset
# note how variable who needs to be handled
gparl <- corpus("GERMAPARLMINI")
subcorpora <- lapply(
  c("Angela Dorothea Merkel", "Volker Kauder", "Ronald Pofalla"),
  function(who) subset(gparl, speaker == !!who)
)
# subset a subcorpus_bundle
merkel <- corpus("GERMAPARLMINI") %>%
  split(s_attribute = "protocol_date") %>%
  subset(speaker == "Angela Dorothea Merkel")
# iterate over objects in bundle one by one
sp <- corpus("GERMAPARLMINI") %>%
  as.speeches(
    s_attribute_name = "speaker",
    s_attribute_date = "protocol_date",
    progress = FALSE
  ) %>%
  subset(interjection == "speech", iterate = TRUE, progress = FALSE)