skipgram_append {eHDPrep}    R Documentation

Append Skipgram Presence Variables to Dataset

Description

Adds new variables to data which report the presence of skipgrams. The skipgrams added are those specified in skipgrams2append or, if that argument is not specified, all skipgrams which meet a minimum percentage frequency (min_freq, default = 1).

Usage

skipgram_append(skipgram_tokens, skipgrams2append, data, id_var, min_freq = 1)

Arguments

skipgram_tokens

Output of skipgram_identify.

skipgrams2append

Character vector naming which skipgrams in skipgram_tokens to append to data. If not specified, skipgrams are selected by min_freq instead.

data

Data frame to append skipgram variables to.

id_var

An unquoted expression corresponding to the variable in data which identifies each row.

min_freq

Minimum percentage frequency of skipgram occurrence to return. Used when skipgrams2append is not specified. Default = 1.

Value

data with additional variables describing the presence of skipgrams.
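
To make the idea of a skipgram presence variable and a percentage frequency threshold concrete, the following is a minimal base R sketch. It does not call eHDPrep and is not the package's implementation. The toy data, the helper has_skipgram(), the 0/1 encoding of presence, and the reading of min_freq as a percentage of rows are all assumptions made for illustration.

```r
# Toy data; the column names and values are hypothetical.
toy <- data.frame(
  patient_id = 1:4,
  free_text = c("bad strain in left leg",
                "bad muscle strain",
                "no complaints",
                "strain was bad"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when `first` is followed by `second`
# with at most `max_gap` words between them.
has_skipgram <- function(text, first, second, max_gap = 5) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  i <- which(words == first)
  j <- which(words == second)
  if (length(i) == 0 || length(j) == 0) return(FALSE)
  gaps <- outer(j, i, "-") - 1  # number of words between each pair
  any(gaps >= 0 & gaps <= max_gap)
}

# Presence of the skipgram "bad_strain" in each row.
present <- vapply(toy$free_text, has_skipgram, logical(1),
                  first = "bad", second = "strain", USE.NAMES = FALSE)

# Percentage of rows containing the skipgram (assumed meaning of min_freq).
pct <- 100 * mean(present)

# Append the presence variable only if the threshold is met.
min_freq <- 1
if (pct >= min_freq) toy$bad_strain <- as.integer(present)
```

In this sketch the last row is not counted because "strain" precedes "bad"; whether word order is handled the same way in skipgram_identify is not stated here.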

References

Guthrie D, Allison B, Liu W, Guthrie L, Wilks Y (2006). “A Closer Look at Skip-gram Modelling.” In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06). European Language Resources Association (ELRA).

Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, Matsuo A (2018). “quanteda: An R package for the quantitative analysis of textual data.” Journal of Open Source Software, 3(30), 774. doi:10.21105/joss.00774 <https://doi.org/10.21105/joss.00774>, <https://quanteda.io>.

Feinerer I, Hornik K (2020). tm: Text Mining Package. R package version 0.7-8, <https://CRAN.R-project.org/package=tm>.

Feinerer I, Hornik K, Meyer D (2008). “Text Mining Infrastructure in R.” Journal of Statistical Software, 25(5), 1-54. <https://www.jstatsoft.org/v25/i05/>.

See Also

Principle underlying function: tokens_ngrams

Other free text functions: extract_freetext(), skipgram_freq(), skipgram_identify()

Examples

data(example_data)
# identify skipgrams
toks_m <- skipgram_identify(x = example_data$free_text,
                            ids = example_data$patient_id,
                            max_interrupt_words = 5)
# add skipgrams by minimum frequency
skipgram_append(toks_m,
                id_var = patient_id,
                min_freq = 0.6,
                data = example_data)
# add specific skipgrams
skipgram_append(toks_m,
                id_var = patient_id,
                skipgrams2append = c("sixteen_week", "bad_strain"),
                data = example_data)

[Package eHDPrep version 1.2.1 Index]