| select.dict {lingmatch} | R Documentation |
Select Dictionaries
Description
Retrieve information and links to dictionaries (lexicons/word lists) available at osf.io/y6g5b.
Usage
select.dict(query = NULL, dir = getOption("lingmatch.dict.dir"),
check.md5 = TRUE, mode = "wb")
Arguments
query |
A character matching a dictionary name, or a set of keywords to search for in dictionary information. |
dir |
Path to a folder containing dictionaries, or where you want them to be saved. Will look in getOption('lingmatch.dict.dir') and '~/Dictionaries' by default. |
check.md5 |
Logical; if |
mode |
Passed to |
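Because dir defaults to the lingmatch.dict.dir option, the folder can be set once per session instead of being passed on every call. A minimal sketch, assuming a temporary folder stands in for a real dictionary directory (the dict_dir name is only illustrative):

```r
# Use a throwaway folder for illustration; a real setup might use '~/Dictionaries'.
dict_dir <- file.path(tempdir(), "Dictionaries")
dir.create(dict_dir, showWarnings = FALSE, recursive = TRUE)

# Set the option that the `dir` argument reads by default,
# keeping the previous value so it can be restored later.
old <- options(lingmatch.dict.dir = dict_dir)

# This is the value `dir` would now default to.
getOption("lingmatch.dict.dir")

# Restore the previous setting when done:
# options(old)
```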
Value
A list with varying entries:
- info: The version of osf.io/kjqb8 stored internally; a data.frame with dictionary names as row names, and information about each dictionary in columns.
  Also described at osf.io/y6g5b/wiki/dict_variables; here, short (corresponding to the file name [{short}.(csv|dic)] and wiki URLs [https://osf.io/y6g5b/wiki/{short}]) is set as row names and removed:
  - name: Full name of the dictionary.
  - description: Description of the dictionary, relating to its purpose and development.
  - note: Notes about processing decisions that additionally alter the original.
  - constructor: How the dictionary was constructed:
    - algorithm: Terms were selected by some automated process, potentially learned from data or other resources.
    - crowd: Several individuals rated the terms, and in aggregate those ratings translate to categories and weights.
    - mixed: Some combination of the other methods, usually in some iterative process.
    - team: One or more individuals made decisions about term inclusions, categories, and weights.
  - subject: Broad, rough subject or purpose of the dictionary:
    - emotion: Terms relate to emotions, potentially exemplifying or expressing them.
    - general: A large range of categories, aiming to capture the content of the text.
    - impression: Terms are categorized and weighted based on the impression they might give.
    - language: Terms are categorized or weighted based on their linguistic features, such as part of speech, specificity, or area of use.
    - social: Terms relate to social phenomena, such as characteristics or concerns of social entities.
  - terms: Number of unique terms across categories.
  - term_type: Format of the terms:
    - glob: Include asterisks which denote inclusion of any characters until a word boundary.
    - glob+: Glob-style asterisks with regular expressions within terms.
    - ngram: Includes any number of words as a term, separated by spaces.
    - pattern: A string of characters, potentially within or between words, or spanning words.
    - regex: Regular expressions.
    - stem: Unigrams with common endings removed.
    - unigram: Complete single words.
  - weighted: Indicates whether weights are associated with terms. This determines the file type of the dictionary: dictionaries with weights are stored as .csv, and those without are stored as .dic files.
  - regex_characters: Logical indicating whether special regular expression characters are present in any term, which might need to be escaped if the terms are used in regular expressions. Glob-type terms allow complete parens (at least one open and one closed, indicating preceding or following words), and initial and terminal asterisks. For all other terms, [](){}*.^$+?\| are counted as regex characters. These could be escaped in R with gsub('([][)(}{*.^$+?\\|])', '\\\\\\1', terms) if terms is a character vector, and in Python with (importing re) [re.sub(r'([][(){}*.^$+?\|])', r'\\\1', term) for term in terms] if terms is a list.
  - categories: Category names in the order in which they appear in the dictionary file, separated by commas.
  - ncategories: Number of categories.
  - original_max: Maximum value of the original dictionary before standardization: original values / max(original values) * 100. Dictionaries with no weights are considered to have a max of 1.
  - osf: ID of the file on OSF, translating to the file's URL: https://osf.io/osf.
  - wiki: URL of the dictionary's wiki.
  - downloaded: Path to the file if downloaded, and '' otherwise.
- selected: A subset of info selected by query.
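The escaping step described under regex_characters can be sketched as follows. The terms vector here is invented for illustration and does not come from any particular dictionary; the replacement string uses the common R idiom of a literal backslash followed by the first capture group.

```r
# Hypothetical terms containing special regular expression characters.
terms <- c("a.m.", "c++", "what?", ":)", "plain")

# Prefix each special character with a backslash.
# In the replacement, "\\\\" yields a literal backslash and "\\1" is the matched character.
escaped <- gsub("([][)(}{*.^$+?\\|])", "\\\\\\1", terms)
escaped

# Each escaped term, anchored at both ends, should match its original literally.
matches_self <- mapply(
  function(pattern, string) grepl(paste0("^", pattern, "$"), string),
  escaped, terms
)
all(matches_self)

# When only literal matching is needed, fixed = TRUE avoids escaping altogether.
grepl(terms[2], "I write c++", fixed = TRUE)
```

Escaping is only needed when the terms are going to be embedded in a larger regular expression; for plain substring searches, fixed matching is usually simpler.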
See Also
Other Dictionary functions:
dictionary_meta(),
download.dict(),
lma_patcat(),
lma_termcat(),
read.dic(),
report_term_matches()
Examples
# just retrieve information about available dictionaries
dicts <- select.dict()$info
dicts[1:10, 4:9]
# select all dictionaries mentioning sentiment or emotion
sentiment_dicts <- select.dict("sentiment emotion")$selected
sentiment_dicts[1:10, 4:9]
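Beyond keyword queries, the columns of info can be filtered directly. The sketch below uses a small invented data.frame in place of select.dict()$info so that it runs without the package; the row names and values are hypothetical, and it assumes weighted is stored as a logical column.

```r
# Hypothetical stand-in for select.dict()$info; rows and values are invented.
info <- data.frame(
  constructor = c("team", "crowd", "algorithm"),
  subject = c("general", "emotion", "emotion"),
  weighted = c(FALSE, TRUE, TRUE), # assumed to be logical
  original_max = c(1, 9, 5),
  row.names = c("dict_a", "dict_b", "dict_c")
)

# Keep only weighted dictionaries with an emotion subject.
picked <- info[info$subject == "emotion" & info$weighted, ]
rownames(picked)

# Weighted dictionaries are stored as .csv, unweighted ones as .dic.
files <- paste0(rownames(info), ifelse(info$weighted, ".csv", ".dic"))
files

# Map a standardized weight back to the original scale,
# inverting original / max(original) * 100.
standardized <- 50
standardized / 100 * picked$original_max
```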