R: Compare shared terms associated with a miRNA name

compare_mir_terms_scatter {miRetrieve}

R Documentation

Compare shared terms associated with a miRNA name

Description

Compare shared terms associated with a miRNA name over two topics.

Usage

compare_mir_terms_scatter(
  df,
  mir,
  top = 1000,
  token = "words",
  ...,
  topic = NULL,
  stopwords = stopwords_miretrieve,
  stopwords_ngram = TRUE,
  html = TRUE,
  colour.point = "red",
  colour.term = "black",
  col.mir = miRNA,
  col.abstract = Abstract,
  col.topic = Topic,
  col.pmid = PMID,
  title = NULL
)

Arguments

`df`	Data frame containing miRNA names, abstracts, topics, and PubMed-IDs.
`mir`	String. miRNA name of interest.
`top`	Integer. Number of top terms to plot.
`token`	String. Specifies how abstracts shall be split up. Taken from `unnest_tokens()` in the tidytext package: "Unit for tokenizing, or a custom tokenizing function. Built-in options are "words" (default), "characters", "character_shingles", "ngrams", "skip_ngrams", "sentences", "lines", "paragraphs", "regex", (...), and "ptb" (Penn Treebank). If a function, should take a character vector and return a list of character vectors of the same length."
`...`	Additional arguments for tokenization, if necessary.
`topic`	Character vector. Optional. Specifies which topics to plot. Must have length two. If `topic = NULL`, all topics in `df` are plotted.
`stopwords`	Data frame containing stop words.
`stopwords_ngram`	Boolean. Specifies if stop words shall be removed from abstracts when using ngrams. Only applied when `token = 'ngrams'`.
`html`	Boolean. If `TRUE`, the plot is returned as an interactive HTML widget; if `FALSE`, as a static plot.
`colour.point`	String. Colour of points for scatter plot.
`colour.term`	String. Colour of terms for scatter plot.
`col.mir`	Symbol. Column containing miRNAs.
`col.abstract`	Symbol. Column containing abstracts.
`col.topic`	Symbol. Column containing topic names.
`col.pmid`	Symbol. Column containing PubMed-IDs.
`title`	String. Plot title.

Details

Compare shared terms associated with a miRNA name over two topics. The terms are displayed as a scatter plot, which is either an interactive HTML widget or a static plot, depending on the `html` argument.

miRNA names and topics must be contained in the data frame `df`; terms are taken from the abstracts in `df`. The number of top terms to plot is set by `top`. Terms are evaluated by their raw count and plotted on a log10 scale.

`compare_mir_terms_scatter()` is based on the tools available in the tidytext package. The term plot is greatly inspired by “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” by Silge and Robinson.
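The idea described above (count terms per topic, keep the terms shared by both topics, and plot the raw counts on a log10 scale) can be illustrated with a minimal base R sketch. This is not the package's implementation, which builds on tidytext; the toy abstracts, the topic labels, the stop word vector, and the helper `count_terms()` are all hypothetical and serve only as an illustration.

```r
# Hypothetical toy abstracts for one miRNA in two made-up topics.
abstracts <- data.frame(
  Topic = c("topic_a", "topic_a", "topic_b", "topic_b"),
  Abstract = c(
    "miR-21 promotes tumor growth and tumor invasion",
    "tumor cells show high expression of miR-21",
    "miR-21 expression rises during cardiac fibrosis",
    "fibrosis and tumor markers correlate with expression"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical stop word list (the package uses a stop word data frame instead).
stop_words <- c("and", "of", "with")

# Hypothetical helper: split text into lower-case word tokens,
# drop stop words, and return the raw count of each term.
count_terms <- function(text, stop_words) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z0-9-]+"))
  tokens <- tokens[nzchar(tokens) & !(tokens %in% stop_words)]
  counts <- table(tokens)
  data.frame(term = names(counts), n = as.integer(counts),
             stringsAsFactors = FALSE)
}

counts_a <- count_terms(abstracts$Abstract[abstracts$Topic == "topic_a"], stop_words)
counts_b <- count_terms(abstracts$Abstract[abstracts$Topic == "topic_b"], stop_words)

# An inner join on the term keeps only terms that occur in both topics.
shared <- merge(counts_a, counts_b, by = "term", suffixes = c("_a", "_b"))

# Static scatter plot of the raw counts with both axes on a log10 scale.
plot(shared$n_a, shared$n_b, log = "xy", col = "red", pch = 19,
     xlab = "Count in topic_a", ylab = "Count in topic_b")
text(shared$n_a, shared$n_b, labels = shared$term, pos = 3, col = "black")
```

Terms that occur in only one topic are dropped by the inner join, which is why the plot contains shared terms only.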

Value

Scatter plot comparing shared terms of a miRNA between two topics.
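
A call might look like the following sketch. The data frame is hypothetical toy data that uses the default column names from the usage above (`miRNA`, `Abstract`, `Topic`, `PMID`); real input would hold many PubMed abstracts, and with so few abstracts the resulting plot would be sparse. The call is guarded so that it only runs if miRetrieve is installed, and it has not been verified against a specific package version.

```r
# Hypothetical toy input with the default column names.
df <- data.frame(
  miRNA = rep("miR-21", 4),
  Abstract = c(
    "miR-21 promotes tumor growth and tumor invasion",
    "tumor cells show high expression of miR-21",
    "miR-21 expression rises during cardiac fibrosis",
    "fibrosis and tumor markers correlate with expression"
  ),
  Topic = c("Cancer", "Cancer", "Fibrosis", "Fibrosis"),
  PMID = c(1L, 2L, 3L, 4L),  # placeholder IDs, not real PubMed-IDs
  stringsAsFactors = FALSE
)

# Only attempt the call if the package is available.
if (requireNamespace("miRetrieve", quietly = TRUE)) {
  p <- miRetrieve::compare_mir_terms_scatter(
    df,
    mir = "miR-21",
    topic = c("Cancer", "Fibrosis"),  # must have length two
    top = 100,
    html = FALSE,                     # request a static plot
    title = "miR-21: shared terms"
  )
  print(p)
}
```

Setting `html = TRUE` instead would request the interactive HTML widget.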

References

Silge, Julia, and David Robinson. 2016. “tidytext: Text Mining and Analysis Using Tidy Data Principles in R.” JOSS 1 (3). The Open Journal. https://doi.org/10.21105/joss.00037.