fst_ngrams_compare {finnsurveytext}    R Documentation

Compare and plot top n-grams

Description

Find the top n-grams for between 2 and 4 sets of prepared data and highlight the top n-grams that are unique to one set. Results are shown in the plots pane: 2 or 3 plots are placed in a single row, and 4 plots are placed in 2 rows of 2.

Usage

fst_ngrams_compare(
  data1,
  data2,
  data3 = NULL,
  data4 = NULL,
  number = 10,
  ngrams = 1,
  norm = "number_words",
  pos_filter = NULL,
  name1 = "Group 1",
  name2 = "Group 2",
  name3 = "Group 3",
  name4 = "Group 4",
  unique_colour = "indianred",
  strict = TRUE
)

Arguments

data1

A dataframe of text in CoNLL-U format for the first plot.

data2

A dataframe of text in CoNLL-U format for the second plot.

data3

An optional dataframe of text in CoNLL-U format for the third plot, default is 'NULL'.

data4

An optional dataframe of text in CoNLL-U format for the fourth plot, default is 'NULL'.

number

The number of n-grams to return, default is '10'.

ngrams

The type of n-grams to return (for example, '1' for single words or '2' for bigrams), default is '1'.
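
To illustrate what the 'ngrams' setting refers to, the base R sketch below builds n-grams of a chosen size from a short token vector. The 'make_ngrams' helper and the tokens are hypothetical and are not part of the package; the package may construct its n-grams differently.

```r
# Hypothetical helper: joins every run of n consecutive tokens into one n-gram.
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) {
    return(character(0))  # too few tokens to form a single n-gram
  }
  starts <- seq_len(length(tokens) - n + 1)
  vapply(
    starts,
    function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
    character(1)
  )
}

tokens <- c("uusi", "koulu", "on", "iso")  # made-up tokens
make_ngrams(tokens, 1)  # "uusi" "koulu" "on" "iso"
make_ngrams(tokens, 2)  # "uusi koulu" "koulu on" "on iso"
make_ngrams(tokens, 3)  # "uusi koulu on" "koulu on iso"
```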

norm

The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses, default), '"number_resp"' (the number of responses), or 'NULL' (raw count returned).
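
As a rough illustration of the three 'norm' settings, the base R sketch below divides a raw count by the total named in each setting. The 'normalise_count' helper and the numbers are hypothetical, and the exact formula the package applies is an assumption here.

```r
# Hypothetical helper: illustrates the idea behind 'norm', not the package's code.
normalise_count <- function(count, n_words, n_resp, norm = "number_words") {
  if (is.null(norm)) {
    return(count)            # raw count returned
  }
  if (norm == "number_words") {
    return(count / n_words)  # assumed: count divided by total words
  }
  if (norm == "number_resp") {
    return(count / n_resp)   # assumed: count divided by total responses
  }
  stop("norm must be \"number_words\", \"number_resp\", or NULL")
}

# Made-up numbers: an n-gram seen 12 times in 40 responses containing 600 words.
normalise_count(12, n_words = 600, n_resp = 40)                        # 0.02
normalise_count(12, n_words = 600, n_resp = 40, norm = "number_resp")  # 0.3
normalise_count(12, n_words = 600, n_resp = 40, norm = NULL)           # 12
```

Normalising in this way makes groups of different sizes easier to compare than raw counts would.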

pos_filter

List of UPOS tags to include, default is 'NULL', which means all word types are included.
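
The effect of a UPOS filter can be sketched in base R as a row filter on the 'upos' column. The toy dataframe below is a made-up stand-in for CoNLL-U data with only three columns, and passing the tags as a character vector is an assumption about the accepted input.

```r
# Toy stand-in for a CoNLL-U dataframe; real data has more columns.
toy <- data.frame(
  doc_id = c("1", "1", "1", "2", "2"),
  lemma  = c("uusi", "koulu", "olla", "iso", "talo"),
  upos   = c("ADJ", "NOUN", "AUX", "ADJ", "NOUN"),
  stringsAsFactors = FALSE
)

# Keep only rows whose UPOS tag is in the filter (assumed form: character vector).
pos_filter <- c("NOUN", "ADJ")
kept <- toy[toy$upos %in% pos_filter, ]

kept$lemma  # "uusi" "koulu" "iso" "talo"
```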

name1

An optional "name" for the first plot, default is '"Group 1"'.

name2

An optional "name" for the second plot, default is '"Group 2"'.

name3

An optional "name" for the third plot, default is '"Group 3"'.

name4

An optional "name" for the fourth plot, default is '"Group 4"'.

unique_colour

Colour to display unique words, default is '"indianred"'.

strict

Whether to cut off strictly at 'number' (ties are ordered alphabetically), default is 'TRUE'.
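
The difference a strict cut-off makes can be sketched in base R with made-up counts that tie at the cut-off. That 'strict = FALSE' keeps every n-gram tied with the last retained one is an assumption in this sketch, not something this page states.

```r
# Made-up counts with a three-way tie at the cut-off.
counts <- data.frame(
  ngram = c("koulu", "talo", "auto", "kissa", "koira"),
  n     = c(9, 7, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Sort by descending count; ties are ordered alphabetically.
counts <- counts[order(-counts$n, counts$ngram), ]
number <- 3

# strict = TRUE: exactly 'number' rows are kept.
strict_top <- head(counts, number)

# strict = FALSE (assumed behaviour): keep all rows tied with the last kept row.
cutoff <- counts$n[number]
loose_top <- counts[counts$n >= cutoff, ]

strict_top$ngram  # "koulu" "talo" "auto"
loose_top$ngram   # "koulu" "talo" "auto" "kissa" "koira"
```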

Value

Between 2 and 4 plots of top n-grams in the plots pane, with unique n-grams highlighted.
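
One way to read "unique" is that an n-gram appears in only one group's top list. The base R sketch below uses that reading with made-up lists; the package's actual definition is an assumption here, and all names are hypothetical.

```r
# Made-up top lists for two groups.
top_lists <- list(
  f = c("koulu", "talo", "auto"),
  m = c("koulu", "kissa", "auto")
)

# N-grams that occur in exactly one of the lists.
all_ngrams <- unlist(top_lists, use.names = FALSE)
appears_once <- names(which(table(all_ngrams) == 1))

# For each group, the n-grams that would be highlighted under this assumption.
unique_by_group <- lapply(top_lists, function(x) x[x %in% appears_once])

unique_by_group$f  # "talo"
unique_by_group$m  # "kissa"
```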

Examples

f <- conllu_dev_q11_1_f_nltk
m <- conllu_dev_q11_1_m_nltk
na <- conllu_dev_q11_1_na_nltk
all <- conllu_dev_q11_1_nltk
fst_ngrams_compare(f, m, na, all, number = 10, strict = FALSE)
fst_ngrams_compare(f, m, ngrams = 2, number = 10, norm = "number_resp")
fst_ngrams_compare(f, m, ngrams = 2, number = 10, strict = FALSE)
fst_ngrams_compare(f, m, number = 5, ngrams = 3, name1 = "F", name2 = "M")
fst_ngrams_compare(f, m, na, number = 20, unique_colour = "slateblue")

[Package finnsurveytext version 1.0.0 Index]