size {polmineR} | R Documentation |
Get Number of Tokens.
Description
The method will get the number of tokens in a corpus, partition or subcorpus, split up by an s-attribute if provided.
Usage
size(x, ...)
## S4 method for signature 'corpus'
size(x, s_attribute = NULL, verbose = TRUE, ...)
## S4 method for signature 'character'
size(x, s_attribute = NULL, verbose = TRUE, ...)
## S4 method for signature 'partition'
size(x, s_attribute = NULL, ...)
## S4 method for signature 'partition_bundle'
size(x)
## S4 method for signature 'DocumentTermMatrix'
size(x)
## S4 method for signature 'TermDocumentMatrix'
size(x)
## S4 method for signature 'features'
size(x)
## S4 method for signature 'remote_corpus'
size(x)
## S4 method for signature 'remote_partition'
size(x)
Arguments
x |
An object to get size(s) for. |
... |
Further arguments (used only for backwards compatibility). |
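s_attribute |
A character vector naming one or more s-attributes by which to split up the token counts. |
verbose |
A logical value, whether to show messages. |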
Details
One or more s-attributes can be provided to get the dispersion of tokens across one or more dimensions. If more than one s_attribute is provided and the structure of s-attributes is nested, ordering attributes according to the ascending tree structure is advised for performance reasons.
The size()-method for features objects will return a named list with the size of the corpus of interest ("coi"), i.e. the number of tokens in the window, and the reference corpus ("ref"), i.e. the number of tokens that are not matched by the query and that are outside the window.
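As a rough illustration of the features case, the sketch below calls size() on the result of cooccurrences(). It assumes that a cooccurrences object inherits from the features class, so that the features method is the one dispatched. The REUTERS corpus and the query "oil" are chosen only for illustration.

```r
library(polmineR)
use("polmineR")
use(pkg = "RcppCWB", corpus = "REUTERS")

# Assumption: the object returned by cooccurrences() extends the features
# class, so size() dispatches to the features method.
co <- cooccurrences("REUTERS", query = "oil")

sizes <- size(co)
names(sizes)    # the names of the list elements, "coi" and "ref" per the text
sizes[["coi"]]  # number of tokens in the window
sizes[["ref"]]  # number of tokens in the reference corpus
```

Accessing the elements by name rather than by position keeps the code readable and does not depend on the order of the list.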
Value
If x is a corpus (a corpus object or specified by corpus id), an integer vector if argument s_attribute is NULL, a two-column data.table otherwise (first column is the s-attribute, second column: "size"). If x is a subcorpus_bundle or a partition_bundle, a data.table (with columns "name" and "size").
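The return types for a corpus can be checked with a short sketch. It uses the GERMAPARLMINI sample corpus from the Examples section and assumes that the data.table package (a dependency of polmineR) is installed.

```r
library(polmineR)
use("polmineR")

# Corpus specified by id, no s-attribute: a single token count
n <- size("GERMAPARLMINI")
is.integer(n)

# With an s-attribute: a data.table, one row per value of the s-attribute
by_date <- size("GERMAPARLMINI", s_attribute = "date")
data.table::is.data.table(by_date)
colnames(by_date)

# If every token falls within a "date" region (an assumption about this
# corpus), the per-date sizes add up to the corpus total.
sum(by_date[["size"]]) == n
```

The final comparison is only a sanity check. It can be FALSE for corpora in which some tokens lie outside the regions of the chosen s-attribute.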
See Also
See dispersion-method for counts of hits. The hits method calls the size-method to get sizes of subcorpora.
Examples
library(polmineR) # attach the package so the examples run outside the help system
use("polmineR")
use(pkg = "RcppCWB", corpus = "REUTERS")
# for corpus object
corpus("REUTERS") %>% size()
corpus("REUTERS") %>% size(s_attribute = "id")
corpus("GERMAPARLMINI") %>% size(s_attribute = c("date", "party"))
# for corpus specified by ID
size("GERMAPARLMINI")
size("GERMAPARLMINI", s_attribute = "date")
size("GERMAPARLMINI", s_attribute = c("date", "party"))
# for partition object
P <- partition("GERMAPARLMINI", date = "2009-11-11")
size(P, s_attribute = "speaker")
size(P, s_attribute = "party")
size(P, s_attribute = c("speaker", "party"))
# for subcorpus
sc <- corpus("GERMAPARLMINI") %>% subset(date == "2009-11-11")
size(sc, s_attribute = "speaker")
size(sc, s_attribute = "party")
size(sc, s_attribute = c("speaker", "party"))
# for subcorpus_bundle
subcorpora <- corpus("GERMAPARLMINI") %>% split(s_attribute = "date")
size(subcorpora)