
plotTopicWord {tosca}

R Documentation

Plotting Counts of Topics-Words-Combination over Time (Relative to Words)

Description

Creates a plot of the counts or proportions of specified combinations of topics and words. Keep in mind that the baseline for proportions is the sum of the words, not the sum of the topics. See also plotWordpt. All curves can be drawn in one plot, or one plot can be created for every curve (see pages). In addition, the plots can be written to a PDF by setting file.
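
The difference between the two baselines can be illustrated with a toy calculation in base R. This is only a sketch of the idea, not the package's internal code; the matrix `counts` and all numbers in it are made up for illustration.

```r
# Hypothetical assignment counts for one time unit:
# rows are topics, columns are words (numbers are invented).
counts <- matrix(c(8, 32,
                   2, 18),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(topic = c("T1", "T2"),
                                 word = c("bush", "tax")))

# Word baseline: share of all "bush" tokens that fall into topic T1.
rel_to_word <- counts["T1", "bush"] / sum(counts[, "bush"])

# Topic baseline: share of all tokens in topic T1 that are "bush".
rel_to_topic <- counts["T1", "bush"] / sum(counts["T1", ])

print(c(word_baseline = rel_to_word, topic_baseline = rel_to_topic))
```

The same count (8) yields very different proportions depending on the denominator, which is why the choice of baseline matters when `rel = TRUE`.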

Usage

plotTopicWord(
  object,
  docs,
  ldaresult,
  ldaID,
  wordlist = lda::top.topic.words(ldaresult$topics, 1),
  link = c("and", "or"),
  select = 1:nrow(ldaresult$document_sums),
  tnames,
  wnames,
  rel = FALSE,
  mark = TRUE,
  unit = "month",
  curves = c("exact", "smooth", "both"),
  smooth = 0.05,
  legend = ifelse(pages, "onlyLast:topright", "topright"),
  pages = FALSE,
  natozero = TRUE,
  file,
  main,
  xlab,
  ylab,
  ylim,
  both.lwd,
  both.lty,
  col,
  ...
)

Arguments

`object`	`textmeta` object with strictly tokenized `text` component (Character vectors) - such as a result of `cleanTexts`
`docs`	Object as a result of `LDAprep` which was handed over to `LDAgen`
`ldaresult`	The result of a function call `LDAgen` with `docs` as argument
`ldaID`	Character vector of IDs of the documents in `ldaresult`
`wordlist`	List of character vectors. Every list element is an 'or' link, every character string in a vector is linked by the argument `link`. If `wordlist` is only a character vector, it is coerced to a list of the same length as the vector (see `as.list`), so that the argument `link` has no effect. Each character vector as a list element represents one curve in the resulting plot.
`link`	Character: Should the (inner) character vectors of each list element be linked by an `"and"` or an `"or"` (default: `"and"`)?
`select`	List of integer vectors: Which topics - linked by an "or" every time - should be taken into account for plotting the word counts/proportion (default: all topics as simple integer vector)?
`tnames`	Character vector of same length as `select` - labels for the topics (default are the first returned words of `top.topic.words` from the `lda` package for each topic)
`wnames`	Character vector of same length as `wordlist` - labels for every group of 'and' linked words
`rel`	Logical: Should counts (`FALSE`) or proportion (`TRUE`) be plotted (default: `FALSE`)?
`mark`	Logical: Should years be marked by vertical lines (default: `TRUE`)?
`unit`	Character: To which unit should dates be floored (default: `"month"`)? Other possible units are `"bimonth"`, `"quarter"`, `"season"`, `"halfyear"`, `"year"`, for more units see `round_date`
`curves`	Character: Should `"exact"`, `"smooth"` curve or `"both"` be plotted (default: `"exact"`)?
`smooth`	Numeric: Smoothing parameter which is handed over to `lowess` as `f` (default: `0.05`)
`legend`	Character: Value(s) to specify the legend coordinates (default: `"topright"`, or `"onlyLast:topright"` for `pages = TRUE`). If `"none"`, no legend is plotted.
`pages`	Logical: Should a separate plot be created for every curve (`TRUE`) instead of plotting all curves in a single plot (default: `FALSE`)? In addition, you can set `legend = "onlyLast:<argument>"`, with `<argument>` as a character `legend` argument, to plot a legend only on the last plot of the set.
`natozero`	Logical: Should NAs be coerced to zeros (default: `TRUE`)?
`file`	Character: File path if a pdf should be created
`main`	Character: Graphical parameter
`xlab`	Character: Graphical parameter
`ylab`	Character: Graphical parameter
`ylim`	Graphical parameter
`both.lwd`	Graphical parameter for smoothed values if `curves = "both"`
`both.lty`	Graphical parameter for smoothed values if `curves = "both"`
`col`	Graphical parameter, could be a vector. If `curves = "both"`, the function plots the exact curve first and then the smoothed curve for every word group - keep this order in mind when specifying `col`.
`...`	Additional graphical parameters
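
The coercion described for `wordlist` can be checked in base R without the package. The following is a minimal sketch; the words are arbitrary examples, and it only shows how many curves each form of `wordlist` would describe, not how the package links the words internally.

```r
# A plain character vector is coerced element-wise (see as.list),
# so every word becomes its own list element, i.e. its own curve.
as_vector <- c("bush", "tax")
as_curves <- as.list(as_vector)
print(length(as_curves))   # number of curves

# A list keeps its grouping: the first element holds two words that
# are combined according to `link`, the second holds a single word.
grouped <- list(c("bush", "tax"), "war")
print(length(grouped))     # number of curves
print(lengths(grouped))    # number of words per curve
```

To have several words contribute to one curve, they need to be passed together as one character vector inside a list; a bare character vector always gives one curve per word.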

Value

A plot. In addition, a data frame is returned invisibly, with columns date and tnames: wnames containing the counts/proportion of the selected combination of topics and words.
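
Because the data frame is returned invisibly, it has to be captured by assignment before it can be processed further. The sketch below does not call the package: it builds a made-up data frame of the documented shape (a `date` column plus one column per curve; the column name `T1: bush` and the counts are hypothetical) and smooths it with `stats::lowess` using `f = 0.05`, the default value of `smooth`.

```r
# Stand-in for `res <- plotTopicWord(...)`: invented monthly counts.
set.seed(1)
res <- data.frame(
  date = seq(as.Date("2000-01-01"), by = "month", length.out = 120),
  check.names = FALSE
)
res[["T1: bush"]] <- rpois(nrow(res), lambda = 20)

# Smooth the curve with lowess, passing the smoothing parameter as f.
sm <- lowess(x = as.numeric(res$date), y = res[["T1: bush"]], f = 0.05)

# Draw exact and smoothed values together, similar in spirit to curves = "both".
plot(res$date, res[["T1: bush"]], type = "l", col = "grey50",
     xlab = "date", ylab = "counts")
lines(res$date, sm$y, lwd = 2)
```

Working from the returned data frame in this way allows other smoothers or plotting systems to be applied without recomputing the counts.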

Examples

## Not run: 
data(politics)
poliClean <- cleanTexts(politics)
words10 <- makeWordlist(text=poliClean$text)
words10 <- words10$words[words10$wordtable > 10]
poliLDA <- LDAprep(text=poliClean$text, vocab=words10)
LDAresult <- LDAgen(documents=poliLDA, K=10, vocab=words10)

# plot topwords from each topic
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA))
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA), rel=TRUE)

# plot one word in different topics
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
              select=c(1,3,8), wordlist=c("bush"))

# Differences between plotTopicWord and plotWordpt
par(mfrow=c(2,2))
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
              select=c(1,3,8), wordlist=c("bush"), rel=FALSE)
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
           select=c(1,3,8), wordlist=c("bush"), rel=FALSE)
plotTopicWord(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
              select=c(1,3,8), wordlist=c("bush"), rel=TRUE)
plotWordpt(object=poliClean, docs=poliLDA, ldaresult=LDAresult, ldaID=names(poliLDA),
           select=c(1,3,8), wordlist=c("bush"), rel=TRUE)

## End(Not run)

[Package tosca version 0.3-2 Index]