plotFreq {tosca} R Documentation

Plotting Counts of specified Wordgroups over Time (relative to Corpus)

Description

Creates a plot of the counts or proportions of the given word groups (wordlist) in the subcorpus. Counts and proportions can be calculated on document or word level, with the words of a group linked by 'and' or 'or', and can additionally be normalised by a subcorpus, which can be specified by id.

Usage

plotFreq(
  object,
  id = names(object$text),
  type = c("docs", "words"),
  wordlist,
  link = c("and", "or"),
  wnames,
  ignore.case = FALSE,
  rel = FALSE,
  mark = TRUE,
  unit = "month",
  curves = c("exact", "smooth", "both"),
  smooth = 0.05,
  both.lwd,
  both.lty,
  main,
  xlab,
  ylab,
  ylim,
  col,
  legend = "topright",
  natozero = TRUE,
  file,
  ...
)

Arguments

object

textmeta object with strictly tokenized text component (character vectors) - like a result of cleanTexts

id

character vector (default: names(object$text)) of IDs that specify the subcorpus

type

character (default: "docs") should counts/proportions of documents ("docs") or of words ("words") be plotted

wordlist

list of character vectors. Every list element is an 'or' link; the character strings within a vector are linked as specified by the argument link. If wordlist is only a character vector, it is coerced to a list of the same length as the vector (see as.list), so that the argument link has no effect. Each character vector in the list represents one curve in the resulting plot

link

character (default: "and") should the character strings within each list element of wordlist be linked by "and" or by "or"

wnames

character vector of the same length as wordlist - labels for the word groups, one per list element

ignore.case

logical (default: FALSE) the ignore.case option of grepl: should case be ignored when matching words

rel

logical (default: FALSE) should counts (FALSE) or proportions (TRUE) be plotted

mark

logical (default: TRUE) should years be marked by vertical lines

unit

character (default: "month") unit to which dates are floored. Other possible units are "bimonth", "quarter", "season", "halfyear" and "year"; for more units see round_date

curves

character (default: "exact") should the "exact" curve, the "smooth" curve or "both" be plotted

smooth

numeric (default: 0.05) smoothing parameter which is handed over to lowess as f

both.lwd

graphical parameter for smoothed values if curves = "both"

both.lty

graphical parameter for smoothed values if curves = "both"

main

character graphical parameter

xlab

character graphical parameter

ylab

character graphical parameter

ylim

(default if rel = TRUE: c(0, 1)) graphical parameter

col

graphical parameter, can be a vector. If curves = "both", the function plots the exact curve first and then the smoothed curve for every word group - keep this in mind when ordering col.

legend

character (default: "topright") value(s) to specify the legend coordinates. If "none" no legend is plotted.

natozero

logical (default: TRUE) should NAs be coerced to zeros. Only has effect if rel = TRUE.

file

character file path if a PDF file should be created

...

additional graphical parameters
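
To make the wordlist and link arguments above more concrete, the following base R sketch shows how as.list turns a plain character vector into single-word groups, and how a list keeps several words together in one group. The helper match_group and the toy document are hypothetical and made up for illustration; the helper only mimics the documented 'and'/'or' linking using grepl on each token, which is an assumption and not necessarily how the package matches words internally.

```r
# A character vector becomes one single-word group per element (see as.list),
# so there is nothing left for 'link' to combine.
as_vector <- as.list(c("obama", "bush"))
length(as_vector)

# A list keeps words together: two groups, hence two curves.
as_groups <- list(c("obama", "president"), c("bush", "president"))

# Hypothetical helper (not part of tosca): does one tokenized document match
# a word group under the given link? Matching via grepl is an assumption.
match_group <- function(tokens, words, link = c("and", "or"),
                        ignore.case = FALSE) {
  link <- match.arg(link)
  # for every word in the group: does it occur in at least one token?
  hits <- vapply(
    words,
    function(w) any(grepl(w, tokens, ignore.case = ignore.case)),
    logical(1)
  )
  if (link == "and") all(hits) else any(hits)
}

doc <- c("obama", "visits", "berlin")  # toy document, made up for illustration
match_group(doc, as_groups[[1]], link = "and")  # FALSE: "president" is missing
match_group(doc, as_groups[[1]], link = "or")   # TRUE: "obama" is present
```

With link = "and" a document only counts for a group if all of the group's words occur; with link = "or" a single word suffices.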

Value

A plot. Invisible: A data frame with columns date and wnames, plus additional columns wnames_rel if rel = TRUE, containing the counts (and proportions) of the given word groups.
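
Because the data frame is returned invisibly, it has to be assigned to be inspected. The following is a minimal sketch, assuming the tosca package and its politics example data are installed; the object names poliClean and res are arbitrary.

```r
library(tosca)  # assumes the tosca package is installed

data(politics)
poliClean <- cleanTexts(politics)

# rel = TRUE requests proportions in addition to counts
res <- plotFreq(poliClean, wordlist = c("obama", "bush"), rel = TRUE)

# inspect the invisibly returned data frame
str(res)
head(res)
```

The plot is drawn as a side effect either way; the assignment only keeps the underlying numbers for further use.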

Examples

## Not run: 
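library(tosca)
data(politics)
# clean and tokenize the raw texts; plotFreq expects a tokenized text component
poliClean <- cleanTexts(politics)
# one curve per word: counts of documents per month containing the word
plotFreq(poliClean, wordlist=c("obama", "bush"))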

## End(Not run)
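
A further sketch, using only arguments from the Usage section, shows word groups passed as a list. The grouping of words, the labels and the parameter values are illustrative choices and not taken from the package documentation; whether every word actually occurs in the politics data is not checked here.

```r
library(tosca)  # assumes the tosca package is installed

data(politics)
poliClean <- cleanTexts(politics)

# two word groups, hence two curves; the grouping is illustrative only
groups <- list(c("obama", "barack"), c("bush", "george"))

out <- plotFreq(
  poliClean,
  type = "docs",               # count documents rather than words
  wordlist = groups,
  link = "or",                 # a document counts if any word of the group occurs
  wnames = c("Obama", "Bush"), # one label per group
  unit = "quarter",            # floor dates to quarters instead of months
  curves = "both",             # exact and smoothed curve for each group
  smooth = 0.1                 # handed over to lowess as f
)
```

Switching link to "and" would count only documents that contain every word of a group.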

[Package tosca version 0.3-2 Index]