generate_stoplist {tidystopwords}    R Documentation

Listing of stop words in different languages.

Description

Generate a vector of stop words in one or several languages.

Usage

generate_stoplist(language = NULL, output_form = 1)

Arguments

language

A single string or a character vector; NULL by default. Each string can be a language name or an ISO-639 language code as listed by list_supported_languages(); names and codes can be freely combined. Matching is case-sensitive. When no language is recognized, the function reports the following error: "The language name or language id you have selected is not supported. (Or you didn't specify a language at all). Check out the supported languages by calling 'list_supported_languages'."

output_form

Default 1; alternatively 2 or 3. Option 1 returns a character vector of unique stop-word forms. Option 2 returns a named vector whose elements are stop-word forms and whose names are the associated stop classes. Because one word form can occur with several stop classes, the word forms in this vector are not unique, unlike in Option 1. Option 3 returns a data frame filtered according to the language selection.
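
The behaviour of the two arguments can be explored with a short sketch. It is an illustration under stated assumptions, not part of the package: safe_stoplist() is a hypothetical wrapper defined here, "cs" is only assumed to be one of the codes returned by list_supported_languages(), and the wrapper assumes that an unrecognized language is signalled as a regular R error.

```r
# Sketch only: safe_stoplist() is a hypothetical helper, not part of tidystopwords.
has_pkg <- requireNamespace("tidystopwords", quietly = TRUE)
if (has_pkg) library(tidystopwords)

safe_stoplist <- function(language, output_form = 1) {
  # Return NULL when the package is not installed.
  if (!has_pkg) {
    return(NULL)
  }
  # Return NULL instead of stopping when the call raises an error
  # (assumption: an unrecognized language is signalled as an R error).
  tryCatch(
    generate_stoplist(language = language, output_form = output_form),
    error = function(e) NULL
  )
}

# Language names and codes can be mixed; "cs" is an assumed code.
mixed <- safe_stoplist(c("English", "cs"))

# Matching is case-sensitive, so a lower-case name is expected not to match.
lower <- safe_stoplist("english")

# Option 2: names carry the stop classes, so word forms may repeat.
classed <- safe_stoplist("English", output_form = 2)
if (!is.null(classed)) {
  print(table(names(classed)))       # number of word forms per stop class
  print(anyDuplicated(classed) > 0)  # TRUE if some form has several classes
}
```

Returning NULL keeps the sketch runnable when the package is missing; in real code it is usually better to let the error surface.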

Value

Depending on output_form: a character vector of unique word forms (1), a named character vector (2), or a data frame (3).

All outputs are encoded in UTF-8.

Author(s)

Silvie Cinková, Maciej Eder

References

The underlying data frame 'multilingual_stoplist' is based on the official release of Version 2.8 of Universal Dependencies.

https://universaldependencies.org

Zeman, Daniel; et al., 2021, Universal Dependencies 2.8.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3687.

See Also

list_supported_languages, multilingual_stoplist

Examples

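# Option 1: character vector of unique word forms
generate_stoplist(language = "English", output_form = 1)

# Option 2: named vector; names are the stop classes
generate_stoplist(language = "English", output_form = 2)

# Option 3: data frame filtered to the selected language
generate_stoplist(language = "English", output_form = 3)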
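
A typical use of the Option 1 vector is to filter tokens. The following is a minimal sketch, assuming case-insensitive matching is wanted; drop_stopwords() is a hypothetical helper (not part of tidystopwords), and the stop list is hand-made so that the example runs without the package.

```r
# drop_stopwords() is a hypothetical helper, not part of tidystopwords.
drop_stopwords <- function(tokens, stoplist) {
  # Compare lower-cased forms on both sides and keep the non-matching tokens.
  tokens[!(tolower(tokens) %in% tolower(stoplist))]
}

tokens <- c("The", "cat", "sat", "on", "the", "mat")

# Hand-made stop list for illustration only; in practice it would come from
# generate_stoplist(language = "English", output_form = 1).
toy_stoplist <- c("the", "on")

drop_stopwords(tokens, toy_stoplist)
```

With the toy list above, the call keeps "cat", "sat" and "mat".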


[Package tidystopwords version 0.9.1 Index]