| subcorpus {polmineR} | R Documentation |
The S4 subcorpus class.
Description
Class to manage subcorpora derived from a CWB corpus.
Usage
## S4 method for signature 'subcorpus'
summary(object)
## S4 replacement method for signature 'subcorpus'
name(x) <- value
## S4 method for signature 'subcorpus'
get_corpus(x)
## S4 method for signature 'subcorpus'
size(x, s_attribute = NULL, ...)
Arguments
object |
A subcorpus object. |
x |
A subcorpus object. |
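value |
A name to assign to the subcorpus object. |
s_attribute |
A structural attribute (s-attribute) of the corpus; defaults to NULL. |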
... |
Arguments passed into |
Methods (by generic)
- summary(subcorpus): Get a named list with basic information for a subcorpus object.
- name(subcorpus) <- value: Assign a name to a subcorpus object.
- get_corpus(subcorpus): Get the corpus ID from the subcorpus object.
- size(subcorpus): Get the size of a subcorpus object from the respective slot of the object.
Slots
- s_attributes: A named list with the structural attributes defining the subcorpus.
- cpos: A matrix with left and right corpus positions defining regions (two-column matrix with integer values).
- annotations: Object of class list.
- size: Total size (number of tokens) of the subcorpus object (a length-one integer vector). The value is accessible by calling the size-method on the subcorpus object (see examples).
- metadata: Object of class data.frame, metadata information.
- strucs: Object of class integer, the strucs defining the subcorpus.
- xml: Object of class character, whether the xml is "flat" or "nested".
- s_attribute_strucs: Object of class character, the base node.
- user: If the corpus on the server requires authentication, the username.
- password: If the corpus on the server requires authentication, the password.
See Also
Most commonly, a subcorpus is derived from a corpus or
a subcorpus using the subset method. See
size for detailed documentation on how to use the
size-method. The subcorpus class shares many features with
the partition class, but it is more parsimonious and does not
include information on statistical properties of the subcorpus (i.e. a
count table). In line with this logic, the subcorpus class inherits
from the corpus class, whereas the partition class inherits
from the textstat class.
Other classes to manage corpora:
corpus-class,
phrases-class,
ranges-class,
regions
Examples
use("polmineR")
# basic example
r <- corpus("REUTERS")
k <- subset(r, grepl("kuwait", places))
name(k) <- "kuwait"
y <- summary(k)
s <- size(k)
# the same with a magrittr pipe
corpus("REUTERS") %>%
  subset(grepl("kuwait", places)) %>%
  summary()
# subsetting a subcorpus in a pipe
stone <- corpus("GERMAPARLMINI") %>%
  subset(date == "2009-11-10") %>%
  subset(speaker == "Frank-Walter Steinmeier")
# perform count for subcorpus
n <- corpus("REUTERS") %>% subset(grep("kuwait", places)) %>% count(p_attribute = "word")
n <- corpus("REUTERS") %>% subset(grep("saudi-arabia", places)) %>% count('"Saudi" "Arabia"')
# keyword-in-context analysis (kwic)
k <- corpus("REUTERS") %>% subset(grep("kuwait", places)) %>% kwic("oil")
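# inspecting slots and inheritance: a minimal sketch, not part of the
# original examples. It assumes the REUTERS sample corpus has been
# activated with use("polmineR") as above; the object name `kuwait` is
# chosen for illustration only.
kuwait <- subset(corpus("REUTERS"), grepl("kuwait", places))
is(kuwait, "corpus") # the subcorpus class inherits from the corpus class
get_corpus(kuwait) # corpus ID the subcorpus was derived from
head(kuwait@cpos) # two-column matrix: left and right corpus positions
# the size-method reads the size slot, so both should agree
size(kuwait) == kuwait@size
# assuming the regions do not overlap, the token count implied by the
# cpos matrix should also match the size slot
sum(kuwait@cpos[, 2] - kuwait@cpos[, 1] + 1L) == kuwait@size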