fst_get_top_words {finnsurveytext}    R Documentation

Make Top Words Table

Description

Creates a table of the most frequently occurring words (unigrams) within the data.

Usage

fst_get_top_words(
  data,
  number = 10,
  norm = "number_words",
  pos_filter = NULL,
  strict = TRUE
)

Arguments

data

A dataframe of text in CoNLL-U format.

number

The number of top words to return, default is '10'.

norm

The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses, default), '"number_resp"' (the number of responses), or 'NULL' (raw count returned).

pos_filter

List of UPOS tags to include, default is 'NULL', which means all word types are included.

strict

Whether to cut off strictly at 'number' (ties are ordered alphabetically), default is 'TRUE'.
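
The effect of 'norm' and 'pos_filter' can be illustrated with a small base R sketch. This is not the package's implementation: 'sketch_word_counts' is a hypothetical helper, the toy data frame is invented for illustration, and the column names 'doc_id', 'lemma' and 'upos' are assumed from the usual CoNLL-U layout. Taking the denominator before the POS filter is applied is also an assumption.

```r
# Toy stand-in for CoNLL-U formatted data (hypothetical values).
toy <- data.frame(
  doc_id = c(1, 1, 1, 2, 2, 3),
  lemma  = c("koulu", "olla", "uusi", "koulu", "olla", "koulu"),
  upos   = c("NOUN", "AUX", "ADJ", "NOUN", "AUX", "NOUN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of finnsurveytext.
sketch_word_counts <- function(data, norm = "number_words", pos_filter = NULL) {
  # Choose the denominator from the full data (assumed to happen
  # before any POS filtering).
  denom <- if (is.null(norm)) {
    1                                  # raw counts
  } else if (norm == "number_words") {
    nrow(data)                         # total number of words
  } else if (norm == "number_resp") {
    length(unique(data$doc_id))        # number of responses
  } else {
    stop("unknown 'norm' setting")
  }

  # Keep only the requested UPOS tags, if any were given.
  if (!is.null(pos_filter)) {
    data <- data[data$upos %in% pos_filter, , drop = FALSE]
  }

  # Count each word and divide by the chosen denominator.
  counts <- table(data$lemma)
  out <- data.frame(
    words = names(counts),
    occurrence = as.numeric(counts) / denom,
    stringsAsFactors = FALSE
  )

  # Most frequent first, ties in alphabetical order.
  out[order(-out$occurrence, out$words), , drop = FALSE]
}

sketch_word_counts(toy, norm = NULL)
sketch_word_counts(toy, norm = "number_resp", pos_filter = c("NOUN", "ADJ"))
```

With 'norm = NULL' the sketch returns plain counts, with '"number_words"' it divides by the six words in the toy data, and with '"number_resp"' it divides by the three responses.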

Value

A table of the most frequently occurring words in the data.
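
How 'strict' may change the size of the returned table can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's code: 'sketch_top_n' is a hypothetical helper, the counts are invented, and the non-strict branch assumes that every word tied with the word at position 'number' is kept.

```r
# Hypothetical counts with a tie at the cut-off (illustrative values only).
counts <- data.frame(
  words = c("koulu", "olla", "uusi", "koti"),
  occurrence = c(5, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of finnsurveytext.
sketch_top_n <- function(counts, number, strict = TRUE) {
  # Most frequent first, ties in alphabetical order.
  ranked <- counts[order(-counts$occurrence, counts$words), , drop = FALSE]
  if (nrow(ranked) <= number) {
    return(ranked)
  }
  if (strict) {
    # Hard cut-off: exactly 'number' rows.
    return(head(ranked, number))
  }
  # Assumed non-strict behaviour: keep every word tied with the last one.
  cutoff <- ranked$occurrence[number]
  ranked[ranked$occurrence >= cutoff, , drop = FALSE]
}

sketch_top_n(counts, number = 2, strict = TRUE)$words   # "koulu" "olla"
sketch_top_n(counts, number = 2, strict = FALSE)$words  # "koulu" "olla" "uusi"
```

Under these assumptions, a strict cut-off always yields at most 'number' rows, while the non-strict variant can yield more when several words share the count at the cut-off.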

Examples

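# Top 15 words, without a strict cut-off at 15
fst_get_top_words(conllu_dev_q11_1_nltk, number = 15, strict = FALSE)
# Top 5 nouns, verbs, adjectives and adverbs, normalised by number of responses
cb <- conllu_cb_bullying
pf <- c("NOUN", "VERB", "ADJ", "ADV")
fst_get_top_words(cb, number = 5, norm = "number_resp", pos_filter = pf)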

[Package finnsurveytext version 1.0.0 Index]