browse_texts {corpustools}    R Documentation

Create and view a full text browser

Description

Creates a static HTML file to view the texts in the tcorpus in full text mode.

Usage

browse_texts(
  tc,
  doc_ids = NULL,
  token_col = "token",
  n = 500,
  select = c("first", "random"),
  header = "",
  subheader = NULL,
  highlight = NULL,
  scale = NULL,
  category = NULL,
  rsyntax = NULL,
  value = NULL,
  meta_cols = NULL,
  seed = NA,
  nav = NULL,
  top_nav = NULL,
  thres_nav = 1,
  view = T,
  highlight_col = "yellow",
  scale_col = c("red", "blue", "green"),
  filename = NULL
)

Arguments

tc

A tCorpus

doc_ids

A vector with document ids to view

token_col

The name of the column in tc$tokens that contains the token text

n

At most n documents are included in the browser (to prevent accidentally creating a huge browser).

select

If n is smaller than the number of documents in tc, select determines how the n documents are selected

header

Optionally, a title presented at the top of the browser

subheader

Optionally, overwrite the subheader. By default the subheader reports the number of documents

highlight

Highlight mode: provide the name of a numeric column in tc$tokens with values between 0 and 1, used to highlight tokens. Can also be a character vector, in which case all non-NA values are highlighted

scale

Scale mode: provide the name of a numeric column in tc$tokens with values between -1 and 1, used to color tokens on a scale (set colors with scale_col)

category

Category mode: provide the name of a character or factor column in tc$tokens. Each unique value will have its own color, and navigation for categories will be added (nav cannot be used with this option)

rsyntax

rsyntax mode: provide the name of an rsyntax annotation column (see annotate_rsyntax)

value

rsyntax mode argument: if rsyntax mode is used, value can be a character vector with values in the rsyntax annotation column. If used, only these values are fully colored, and the other (non-NA) values only have border colors.

meta_cols

A character vector with names of columns in tc$meta, used to only show the selected columns

seed

If select is "random", seed can be used to set a random seed. After sampling the seed is re-initialized with set.seed(NULL).

nav

Optionally, a column in tc$meta to add navigation (only supports simple filtering on unique values). This is not possible if category is used.

top_nav

A number. If navigation based on token annotations is used, filters will only apply to the top_nav values with the highest token occurrence in a document

thres_nav

Like top_nav, but specifying a threshold for the minimum number of tokens.

view

If TRUE (default), view the browser in the Viewer window (turn off if this is not supported)

highlight_col

If highlight is used, the color for highlighting

scale_col

If scale is used, a vector with 2 or more colors used to create a color ramp: -1 maps to the first color and +1 to the last color. If three colors are given, 0 maps to the middle color. Values in between are interpolated.

filename

Optionally, save the browser at a specified location
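
The color ramp described for scale_col can be illustrated in base R. The following is a minimal sketch, assuming linear interpolation in RGB space; scale_to_hex is a hypothetical helper written for this illustration and is not part of corpustools, whose internal interpolation may differ.

```r
# Hypothetical helper (not part of corpustools): map values in [-1, 1]
# to hex colors along a ramp built from the colors in scale_col.
scale_to_hex <- function(x, scale_col = c("red", "blue", "green")) {
  ramp <- grDevices::colorRamp(scale_col)  # function on [0, 1] returning RGB rows
  pos <- (x + 1) / 2                       # rescale [-1, 1] to [0, 1]
  grDevices::rgb(ramp(pos), maxColorValue = 255)
}

# The endpoints and the midpoint land exactly on the three given colors
scale_to_hex(c(-1, 0, 1))
# "#FF0000" "#0000FF" "#00FF00"
```

With two colors, 0 would instead fall halfway between them, since there is no middle color to anchor it.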

Value

The URL of the file location is returned (invisibly)
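
Because the value is returned invisibly, it is not printed when the function is called at the console; assign it to a variable to use it. The sketch below shows this behavior in base R with write_page, a hypothetical stand-in that is not the corpustools implementation.

```r
# Hypothetical stand-in for a function that writes an HTML file and
# returns its location invisibly (not the corpustools implementation).
write_page <- function() {
  path <- tempfile(fileext = ".html")
  writeLines("<html><body>demo</body></html>", path)
  invisible(path)
}

write_page()                      # nothing is printed at the console
url <- write_page()               # assign the result to keep the location
file.exists(url)                  # the file was written to the returned location
res <- withVisible(write_page())  # res$visible reports whether a value would print
```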

Examples


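# Create a tCorpus from the State of the Union example data
tc = create_tcorpus(sotu_texts, doc_column='id')

# Code tokens with two query-based labels, stored in the 'code' column
queries = c('War# war soldier* weapon*',
            'Economy# econom* market* tax*')
tc$code_features(queries)

# Category mode: each code gets its own color and navigation entry
browse_texts(tc, category='code')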
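
A further sketch combines several of the arguments documented above: a seeded random sample of documents, a restricted set of meta columns, and saving to a file without opening the Viewer. It assumes that corpustools is installed and that the sotu_texts data contains 'president' and 'date' columns; the output file name is arbitrary.

```r
# Sketch only: assumes corpustools is installed and that sotu_texts has
# 'president' and 'date' columns (an assumption made for this example).
out_file <- file.path(tempdir(), "sotu_sample.html")  # arbitrary location

if (requireNamespace("corpustools", quietly = TRUE)) {
  library(corpustools)
  tc <- create_tcorpus(sotu_texts, doc_column = "id")

  # Include 5 randomly selected documents; a fixed seed makes the sample repeatable
  browse_texts(tc, n = 5, select = "random", seed = 1,
               meta_cols = c("president", "date"),
               view = FALSE,         # do not open the Viewer window
               filename = out_file)  # save the browser at this location
}
```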


[Package corpustools version 0.4.10 Index]