textstat-class {polmineR} | R Documentation |
S4 textstat superclass.
Description
The textstat S4 class is the superclass for the classes features, context,
and partition. Usually, these subclasses, which are designed to serve a
specific analytical purpose, will be used. Common standard generic methods
such as head, tail, dim, nrow, colnames are defined for the textstat class
and are available for subclasses by inheritance. The core of textstat and its
child classes is a data.table in the slot stat for keeping data on text
statistics of a corpus, or a partition. The textstat class inherits from the
corpus class, keeping information on the corpus available.
Usage
## S4 method for signature 'textstat'
name(x)
## S4 method for signature 'character'
name(x)
## S4 replacement method for signature 'textstat'
name(x) <- value
## S4 method for signature 'textstat'
round(x, digits = 2L)
## S4 method for signature 'textstat'
sort(x, by, decreasing = TRUE)
as.bundle(object, ...)
## S4 method for signature 'textstat,textstat'
e1 + e2
## S4 method for signature 'textstat'
subset(x, subset)
## S3 method for class 'textstat'
as.data.table(x, ...)
## S4 method for signature 'textstat'
show(object)
## S4 method for signature 'textstat'
p_attributes(.Object)
## S4 method for signature 'textstat'
knit_print(x, options = knitr::opts_chunk, ...)
## S4 method for signature 'textstat'
get_corpus(x)
## S4 method for signature 'textstat'
format(x, digits = 2L)
restore(file)
cp(x)
## S4 method for signature 'textstat'
view(.Object)
Arguments
x |
An object ( |
value |
A |
digits |
Number of digits. |
by |
Column that will serve as the key for sorting. |
decreasing |
Logical, whether to return decreasing order. |
object |
a textstat object |
... |
Argument that will be passed into a call of the |
e1 |
A |
e2 |
Another |
subset |
A logical expression indicating elements or rows to keep. |
.Object |
A |
options |
Chunk options. |
file |
An rds file to restore (filename). |
Details
A head-method will return the first rows of the data.table in the stat-slot.
Use argument n to specify the number of rows.
A tail-method will return the last rows of the data.table in the stat-slot.
Use argument n to specify the number of rows.
The methods dim, nrow and ncol will return information on the dimensions,
the number of rows, or the number of columns of the data.table in the
stat-slot, respectively.
Objects derived from the textstat class can be indexed with simple square
brackets ("[") to get rows specified by a numeric/integer vector, and with
double square brackets ("[[") to get specific columns from the data.table in
the slot stat.
The colnames-method will return the column names of the data.table in the
slot stat.
The methods as.data.table and as.data.frame will extract the data.table in
the slot stat as a data.table or data.frame, respectively.
textstat objects can have a name, which can be retrieved and set using the
name-method and name<-, respectively.
The round()-method looks up all numeric columns in the data.table in the
stat-slot of the textstat object and rounds values of these columns to the
number of decimal places specified by argument digits.
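The rounding behaviour described above can be sketched with a plain data.table. This is an illustration under stated assumptions, not polmineR's implementation: the table stat_dt, its columns and the variable n_digits are made up, and only double-precision columns are treated as numeric here.

```r
library(data.table)

# Hypothetical stand-in for the data.table in the stat slot
stat_dt <- data.table(
  word = c("oil", "gas"),
  count = c(10L, 4L),
  ll = c(12.34567, 3.14159)
)

# Look up the columns holding decimal numbers (assumption: doubles only)
num_cols <- names(stat_dt)[vapply(stat_dt, is.double, logical(1))]

# Round these columns in place to the requested number of decimal places
n_digits <- 2L
stat_dt[, (num_cols) := lapply(.SD, round, digits = n_digits), .SDcols = num_cols]
```

Because the assignment uses :=, the table is modified by reference and no second copy is created; the character and integer columns are left untouched.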
The knit_print method will be called by knitr to render textstat objects or
objects inheriting from the textstat class as a DataTable htmlwidget when
rendering an R Markdown document as HTML. It will usually be necessary to
explicitly state "render = knit_print" in the chunk options. The option
polmineR.pagelength controls the number of lines displayed in the resulting
htmlwidget. Note that including htmlwidgets in HTML documents requires that
pandoc is installed. To avoid an error, a formatted data.table is returned by
knit_print if pandoc is not available.
The format()-method returns a pretty-printed and minimized version of the
data.table in the stat-slot of the textstat object: It will round all numeric
columns to the number of decimal places specified by digits, and drop all
columns with token ids. The return value is a data.table.
Using the reference semantics of data.table objects (i.e. in-place
modification) has great advantages for memory efficiency. But there may be
unexpected behavior when reloading an S4 textstat object (including classes
inheriting from textstat) with a data.table in the stat slot. Use restore to
copy the data.table once to have a restored object that works for in-place
operations after saving / reloading it.
It is not possible to add columns to the data.table in the stat slot of a
textstat object when the object has been saved and loaded using
save()/load(). This scenario applies, for instance, when the objects of an
interactive R session are saved, and loaded when starting the next
interactive R session. The cp() function will create a copy of the object,
including an explicit copy of the data.table in the stat slot. In-place
modifications of the new object are possible. The function can also be used
to avoid unwanted side effects of modifying an object.
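The reference semantics mentioned above, and the effect of an explicit copy, can be illustrated with data.table alone. This is a minimal sketch with a made-up table; it uses data.table's copy() and does not show the internals of cp() or restore().

```r
library(data.table)

# Hypothetical table standing in for the stat slot
dt <- data.table(word = c("oil", "gas"), count = c(10L, 4L))

# Plain assignment does not copy: both names refer to the same table
alias <- dt
alias[, share := count / sum(count)]
seen_in_dt <- "share" %in% colnames(dt)      # TRUE: dt changed as a side effect

# An explicit copy is independent of the original
independent <- copy(dt)
independent[, rank := 1:2]
leaked_to_dt <- "rank" %in% colnames(dt)     # FALSE: dt is unaffected
```

Working on an explicit copy trades some memory for protection against this kind of side effect, which is the second use of cp() named above.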
Slots
p_attribute
Object of class character, p-attribute of the query.
corpus
A corpus specified by a length-one character vector.
stat
A data.table with statistical information.
name
The name of the object.
annotation_cols
A character vector, column names of data.table in slot stat that are annotations.
encoding
A length-one character vector, the encoding of the corpus.
Examples
use(pkg = "polmineR", corpus = "GERMAPARLMINI")
use(pkg = "RcppCWB", corpus = "REUTERS")
P <- partition("GERMAPARLMINI", date = ".*", p_attribute = "word", regex = TRUE)
y <- cooccurrences(P, query = "Arbeit")
# generics defined in the polmineR package
x <- count("REUTERS", p_attribute = "word")
name(x) <- "count_reuters"
name(x)
get_corpus(x)
# Standard generic methods known from data.frames work for objects inheriting
# from the textstat class
head(y)
tail(y)
nrow(y)
ncol(y)
dim(y)
colnames(y)
# Use brackets for indexing
## Not run:
y[1:25]
y[,c("word", "ll")]
y[1:25, "word"]
y[1:25][["word"]]
y[which(y[["word"]] %in% c("Arbeit", "Sozial"))]
y[ y[["word"]] %in% c("Arbeit", "Sozial") ]
## End(Not run)
sc <- partition("GERMAPARLMINI", speaker = "Angela Dorothea Merkel")
cnt <- count(sc, p_attribute = c("word", "pos"))
cnt_min <- subset(cnt, pos %in% c("NN", "ADJA"))
cnt_min <- subset(cnt, pos == "NE")
use(pkg = "RcppCWB", corpus = "REUTERS")
# Get statistics in textstat object as data.table
count_dt <- corpus("REUTERS") %>%
  subset(grep("saudi-arabia", places)) %>%
  count(p_attribute = "word") %>%
  as.data.table()
# textstat objects stored as *.rds files should be loaded using restore().
# Before moving to examples, here is a brief technical note on why this is
# recommended: If we load the *.rds file with readRDS(), the data.table in
# the slot 'stat' will have the pointer '0x0', and the data.table cannot be
# augmented without having been copied previously.
k <- kwic("REUTERS", query = "oil")
kwicfile <- tempfile(fileext = ".rds")
saveRDS(k, file = kwicfile)
problemprone <- readRDS(file = kwicfile)
problemprone@stat[, "newcol" := TRUE]
"newcol" %in% colnames(problemprone@stat) # is FALSE!
attr(problemprone@stat, ".internal.selfref")
identical(attr(problemprone@stat, ".internal.selfref"), new("externalptr"))
# Restore stored S4 object with copy of data.table in 'stat' slot
k <- kwic("REUTERS", query = "oil")
kwicfile <- tempfile(fileext = ".rds")
saveRDS(k, file = kwicfile)
k2 <- restore(kwicfile)
enrich(k2, s_attribute = "id")
"id" %in% colnames(k2) # is TRUE
k <- kwic("REUTERS", query = "oil")
rdata_file <- tempfile(fileext = ".RData")
save(k, file = rdata_file)
rm(k)
load(rdata_file)
k <- cp(k) # now it is possible to add columns by reference
enrich(k, s_attribute = "id")
"id" %in% colnames(k)