R: Function for creating a first draft of a vocabulary

bow_pp_create_vocab_draft {aifeducation}

R Documentation

Function for creating a first draft of a vocabulary

Description

Creates a list of tokens that carry the specified universal part-of-speech (UPOS) tags and provides the corresponding lemmas.

Usage

bow_pp_create_vocab_draft(
  path_language_model,
  data,
  upos = c("NOUN", "ADJ", "VERB"),
  label_language_model = NULL,
  language = NULL,
  chunk_size = 100,
  trace = TRUE
)
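
The call above can be wrapped as in the following sketch. It is illustrative only: the helper name `draft_vocab_example` is hypothetical, the label passed to `label_language_model` is simply derived from the model file name, and the sketch assumes that the udpipe package and a release of aifeducation that exports `bow_pp_create_vocab_draft()` are installed.

```r
# Hypothetical helper: downloads an English udpipe model and builds a
# vocabulary draft from the supplied raw texts.
draft_vocab_example <- function(texts, model_dir = tempdir()) {
  # udpipe_download_model() returns a data.frame; the column file_model
  # holds the path to the downloaded model file.
  model_info <- udpipe::udpipe_download_model(
    language = "english",
    model_dir = model_dir
  )

  aifeducation::bow_pp_create_vocab_draft(
    path_language_model = model_info$file_model,
    data = texts,
    upos = c("NOUN", "ADJ", "VERB"),
    # Free-text label; using the file name is a choice made for this sketch.
    label_language_model = basename(model_info$file_model),
    language = "english",
    chunk_size = 100,
    trace = TRUE
  )
}

# Example call (downloads a model, so it needs network access):
# draft <- draft_vocab_example(c("The quick brown fox jumps.", "Dogs sleep."))
# str(draft$vocab)
```

Keeping the download inside the helper means nothing is fetched until the helper is actually called.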

Arguments

`path_language_model`	`string` Path to a udpipe language model that should be used for tagging and lemmatization.
`data`	`vector` containing the raw texts.
`upos`	`vector` containing the universal part-of-speech tags which should be used to build the vocabulary.
`label_language_model`	`string` Label for the udpipe language model used.
`language`	`string` Name of the language (e.g., English, German).
`chunk_size`	`int` Number of raw texts which should be processed at once.
`trace`	`bool` `TRUE` if information about the progress should be printed to console.
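
To illustrate what `chunk_size` controls, the following base R sketch splits a vector of raw texts into batches of at most `chunk_size` elements. How the function batches texts internally is an assumption here; the sketch only shows the general idea of processing a fixed number of texts at once.

```r
texts <- c("First text.", "Second text.", "Third text.",
           "Fourth text.", "Fifth text.")
chunk_size <- 2

# Assign each text to a chunk index: 1, 1, 2, 2, 3
chunk_id <- ceiling(seq_along(texts) / chunk_size)

# split() returns a list with one character vector per chunk.
chunks <- split(texts, chunk_id)

length(chunks)   # 3 chunks
lengths(chunks)  # chunk sizes: 2, 2, 1
```

Smaller chunks lower the amount of text annotated in one step, while larger chunks reduce the number of separate annotation calls.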

Value

A list with the following components:

vocab: `data.frame` containing the tokens, lemmas, tokens in lower case, and lemmas in lower case.
ud_language_model: udpipe language model used for tagging.
label_language_model: Label of the udpipe language model.
language: Language of the raw texts.
upos: Universal part-of-speech tags used.
n_sentence: `int` Estimated number of sentences in the raw texts.
n_token: `int` Estimated number of tokens in the raw texts.
n_document_segments: `int` Estimated number of document segments/raw texts.
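
The structure of `vocab` can be pictured with a small base R sketch. It uses a hand-made annotation table in place of real udpipe output, and the column names `token_tolower` and `lemma_tolower` are hypothetical; the actual column names of the returned `data.frame` may differ.

```r
# Toy annotation table. A udpipe annotation converted to a data.frame
# contains, among others, the columns token, lemma, and upos.
annotation <- data.frame(
  token = c("The", "Dogs", "are", "running", "fast", "dogs"),
  lemma = c("the", "dog", "be", "run", "fast", "dog"),
  upos  = c("DET", "NOUN", "AUX", "VERB", "ADV", "NOUN"),
  stringsAsFactors = FALSE
)
upos <- c("NOUN", "ADJ", "VERB")

# Keep only rows with a requested tag, then drop duplicated token/lemma pairs.
selected <- annotation[annotation$upos %in% upos, c("token", "lemma")]
vocab <- unique(selected)
rownames(vocab) <- NULL

# Lower-case variants (hypothetical column names).
vocab$token_tolower <- tolower(vocab$token)
vocab$lemma_tolower <- tolower(vocab$lemma)

vocab
```

In this toy data, "Dogs" and "dogs" remain separate rows because they differ as tokens, but they share the same lower-case token and the same lemma.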

Note

A list of possible tags can be found here: https://universaldependencies.org/u/pos/index.html.

A large collection of models can be found here: https://ufal.mff.cuni.cz/udpipe/2/models.
