fst_ngrams_compare {finnsurveytext} | R Documentation |
Compare and plot top n-grams
Description
Find the top n-grams for between 2 and 4 sets of prepared data and identify those that are unique to one set. Results are shown in the plots pane. With 2 or 3 plots, they are arranged in a single row; with 4 plots, they are arranged in 2 rows of 2.
Usage
fst_ngrams_compare(
data1,
data2,
data3 = NULL,
data4 = NULL,
number = 10,
ngrams = 1,
norm = "number_words",
pos_filter = NULL,
name1 = "Group 1",
name2 = "Group 2",
name3 = "Group 3",
name4 = "Group 4",
unique_colour = "indianred",
strict = TRUE
)
Arguments
data1 |
A dataframe of text in CoNLL-U format for the first plot. |
data2 |
A dataframe of text in CoNLL-U format for the second plot. |
data3 |
An optional dataframe of text in CoNLL-U format for the third plot, default is 'NULL'. |
data4 |
An optional dataframe of text in CoNLL-U format for the fourth plot, default is 'NULL'. |
number |
The number of n-grams to return, default is '10'. |
ngrams |
The type of n-grams to return, given as the number of words per n-gram (e.g. '2' for bigrams), default is '1'. |
norm |
The method for normalising the data. Valid settings are '"number_words"' (the number of words in the responses, default), '"number_resp"' (the number of responses), or 'NULL' (raw count returned). |
pos_filter |
List of UPOS tags for inclusion, default is 'NULL', which means all word types are included. |
name1 |
An optional "name" for the first plot, default is '"Group 1"'. |
name2 |
An optional "name" for the second plot, default is '"Group 2"'. |
name3 |
An optional "name" for the third plot, default is '"Group 3"'. |
name4 |
An optional "name" for the fourth plot, default is '"Group 4"'. |
unique_colour |
Colour to display unique words, default is '"indianred"'. |
strict |
Whether to cut off strictly at 'number' (ties are ordered alphabetically), default is 'TRUE'. |
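The effect of 'norm' and 'strict' can be illustrated with a small base R sketch. This is not the package's implementation: the counts, the totals, and the assumptions that normalisation is a simple division and that 'strict = FALSE' keeps every n-gram tied with the last retained count are all hypothetical, chosen only to show the idea.

```r
# Toy data: hypothetical n-gram counts, not output from finnsurveytext.
counts <- data.frame(
  ngram = c("school", "family", "money", "work", "time"),
  n = c(6, 4, 4, 4, 2),
  stringsAsFactors = FALSE
)
total_words <- 40  # assumed total number of words in the responses
total_resp <- 10   # assumed number of responses

# Assumed normalisation: divide the raw count by the chosen total.
counts$per_word <- counts$n / total_words  # analogous to norm = "number_words"
counts$per_resp <- counts$n / total_resp   # analogous to norm = "number_resp"

# Order by count (descending), breaking ties alphabetically.
ordered <- counts[order(-counts$n, counts$ngram), ]

number <- 2
# Analogous to strict = TRUE: keep exactly `number` rows.
top_strict <- head(ordered, number)
# Analogous to strict = FALSE (assumed): keep all rows tied with the last one.
cutoff <- ordered$n[number]
top_ties <- ordered[ordered$n >= cutoff, ]
```

In this toy case 'top_strict' holds two n-grams, while 'top_ties' holds four because three n-grams share the count at the cut-off.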
Value
Between 2 and 4 plots of the top n-grams, shown in the plots pane, with unique n-grams highlighted in 'unique_colour'.
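As a rough illustration of the highlighting, the sketch below treats an n-gram as unique when it appears in exactly one group's top list. That definition, the word lists, and the fallback colour "grey50" are assumptions made for the example, not details confirmed by the package.

```r
# Hypothetical top n-gram lists for two groups (illustrative values only).
top_f <- c("school", "family", "money")
top_m <- c("work", "family", "money")

# Assumed definition: unique means present in exactly one group's top list.
all_top <- c(top_f, top_m)
appears_once <- names(which(table(all_top) == 1))

unique_f <- intersect(top_f, appears_once)
unique_m <- intersect(top_m, appears_once)

# One colour per bar: the highlight colour for unique n-grams, grey otherwise.
bar_colours_f <- ifelse(top_f %in% unique_f, "indianred", "grey50")
bar_colours_m <- ifelse(top_m %in% unique_m, "indianred", "grey50")
```

The same counting approach extends to 3 or 4 groups by concatenating all of the top lists before tabulating.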
Examples
f <- conllu_dev_q11_1_f_nltk
m <- conllu_dev_q11_1_m_nltk
na <- conllu_dev_q11_1_na_nltk
all <- conllu_dev_q11_1_nltk
fst_ngrams_compare(f, m, na, all, number = 10, strict = FALSE)
fst_ngrams_compare(f, m, ngrams = 2, number = 10, norm = "number_resp")
fst_ngrams_compare(f, m, ngrams = 2, number = 10, strict = FALSE)
fst_ngrams_compare(f, m, number = 5, ngrams = 3, name1 = "F", name2 = "M")
fst_ngrams_compare(f, m, na, number = 20, unique_colour = "slateblue")