R: Plot n-gram frequencies

ggram {ngramr}

R Documentation

Plot n-gram frequencies

Description

ggram downloads data from the Google Ngram Viewer website and plots it in ggplot2 style.

Usage

ggram(
  phrases,
  ignore_case = FALSE,
  code_corpus = FALSE,
  geom = "line",
  geom_options = list(),
  lab = NA,
  google_theme = FALSE,
  ...
)

Arguments

`phrases`	vector of phrases. Alternatively, phrases can be an ngram object returned by `ngram` or `ngrami`.
`ignore_case`	logical, indicating whether the frequencies are case insensitive. Default is `FALSE`.
`code_corpus`	logical, indicating whether to use abbreviated corpus codes or longer form descriptions. Default is `FALSE`.
`geom`	the ggplot2 geom used to plot the data. Default is `"line"`.
`geom_options`	list of additional parameters passed to the ggplot2 geom.
`lab`	y-axis label. When left as `NA`, the label defaults to "Frequency".
`google_theme`	logical, indicating whether to use a Google Ngram-style plot theme. Default is `FALSE`.
`...`	additional parameters passed to `ngram`, such as `year_start`, `corpus` and `smoothing` in the examples below.

Details

Google generated two datasets drawn from digitised books in the Google Books collection. One was generated in July 2009, the second in July 2012. Google will update these datasets as book scanning continues.

Examples

library(ggplot2)
ggram(c("hacker", "programmer"), year_start = 1950)

# Changing the geom.
ggram(c("cancer", "fumer", "cigarette"),
      year_start = 1900,
      corpus = "fr-2012",
      smoothing = 0,
      geom = "step")

# Passing more options.
ggram(c("cancer", "smoking", "tobacco"),
      year_start = 1900,
      corpus = "en-fiction-2012",
      geom = "point",
      smoothing = 0,
      geom_options = list(alpha = .5)) +
  stat_smooth(method = "loess", se = FALSE, formula = y ~ x)

# Setting the layers manually.
ggram(c("cancer", "smoking", "tobacco"),
      year_start = 1900,
      corpus = "en-fiction-2012",
      smoothing = 0,
      geom = NULL) +
  stat_smooth(method = "loess", se = FALSE, span = 0.3, formula = y ~ x)

# Setting the legend placement on a long query and using the Google theme.
# Example taken from a post by Ben Zimmer at Language Log.
p <- c("((The United States is + The United States has) / The United States)",
       "((The United States are + The United States have) / The United States)")
ggram(p, year_start = 1800, google_theme = TRUE) +
  theme(legend.direction = "vertical")

# Passing ngram data rather than phrases.
# Here `hacker` is an existing ngram object, so no phrases are supplied.
ggram(hacker) + facet_wrap(~ Corpus)
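
# Relabelling the y-axis and keeping the plot for later use.
# A minimal sketch, assuming ggram returns an ordinary ggplot object
# (the examples above add layers to it with +). It reuses the `hacker`
# ngram object from the previous example, so no download should be needed.
# The label text and the name p_hacker are illustrative only.
library(ngramr)  # only needed when running outside the package examples
p_hacker <- ggram(hacker, lab = "Share of all n-grams") +
  facet_wrap(~ Corpus)
print(p_hacker)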

[Package ngramr version 1.9.3 Index]