subcorpus {polmineR}    R Documentation

The S4 subcorpus class.

Description

Class to manage subcorpora derived from a CWB corpus.

Usage

## S4 method for signature 'subcorpus'
summary(object)

## S4 replacement method for signature 'subcorpus'
name(x) <- value

## S4 method for signature 'subcorpus'
get_corpus(x)

## S4 method for signature 'subcorpus'
size(x, s_attribute = NULL, ...)

Arguments

object

A subcorpus object.

x

A subcorpus object.

value

A character vector to assign to the name slot of a subcorpus object.

s_attribute

A character vector with s-attributes (one or more).

...

Further arguments passed to the size-method; used only to maintain backwards compatibility.

Methods (by generic)

Slots

s_attributes

A named list with the structural attributes defining the subcorpus.

cpos

A two-column integer matrix with the left and right corpus positions that define the regions of the subcorpus.

annotations

Object of class list.

size

Total size (number of tokens) of the subcorpus object (a length-one integer vector). The value is accessible by calling the size-method on the subcorpus-object (see examples).

metadata

Object of class data.frame, metadata information.

strucs

Object of class integer, the strucs defining the subcorpus.

xml

Object of class character, indicating whether the XML structure is "flat" or "nested".

s_attribute_strucs

Object of class character, the base node.

user

If the corpus on the server requires authentication, the username.

password

If the corpus on the server requires authentication, the password.
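
The relation between the cpos and size slots can be sketched in base R, without polmineR. This is only an illustration: the matrix values and the column names cpos_left and cpos_right are invented, and the sketch assumes that both region boundaries are inclusive and that size is the number of tokens covered by the regions, as the slot descriptions suggest.

```r
# Hypothetical regions: one row per region, given as left and right
# corpus positions (both inclusive). All values are invented.
cpos <- matrix(
  c(0L, 9L,
    25L, 30L,
    100L, 100L),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("cpos_left", "cpos_right"))
)

# Tokens per region: right minus left, plus one because both ends count.
region_sizes <- cpos[, "cpos_right"] - cpos[, "cpos_left"] + 1L

# Total number of tokens covered by all regions.
total_size <- sum(region_sizes)
```

For an actual subcorpus object, the size-method returns the stored value directly, so this computation is not needed in practice.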

See Also

Most commonly, a subcorpus is derived from a corpus or a subcorpus using the subset method. See size for detailed documentation on how to use the size-method. The subcorpus class shares many features with the partition class, but it is more parsimonious and does not include information on statistical properties of the subcorpus (i.e. a count table). In line with this logic, the subcorpus class inherits from the corpus class, whereas the partition class inherits from the textstat class.

Other classes to manage corpora: corpus-class, phrases-class, ranges-class, regions

Examples

library(polmineR)
use("polmineR")

# basic example 
r <- corpus("REUTERS")
k <- subset(r, grepl("kuwait", places))
name(k) <- "kuwait"
y <- summary(k)
s <- size(k)

# the same with a magrittr pipe
corpus("REUTERS") %>%
  subset(grepl("kuwait", places)) %>%
  summary()
  
# subsetting a subcorpus in a pipe
stone <- corpus("GERMAPARLMINI") %>%
  subset(date == "2009-11-10") %>%
  subset(speaker == "Frank-Walter Steinmeier")

# perform count for subcorpus
n <- corpus("REUTERS") %>%
  subset(grep("kuwait", places)) %>%
  count(p_attribute = "word")
n <- corpus("REUTERS") %>%
  subset(grep("saudi-arabia", places)) %>%
  count('"Saudi" "Arabia"')

# keyword-in-context analysis (kwic)
k <- corpus("REUTERS") %>%
  subset(grep("kuwait", places)) %>%
  kwic("oil")


[Package polmineR version 0.8.9 Index]