extract_freetext {eHDPrep}    R Documentation
Extract information from free text
Description
Extracts skipgrams from the specified free text variables (...) that occur in at least a minimum percentage of rows (min_freq) and appends them to data as new variables.
Usage
extract_freetext(data, id_var, min_freq = 1, ...)
Arguments
data: Data frame to append skipgram variables to.
id_var: An unquoted expression which corresponds to a variable in data which identifies each row.
min_freq: Minimum frequency of skipgram occurrence, as a percentage of rows, for a skipgram to be returned. Default = 1.
...: Unquoted names of the free text variables from which to extract information.
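The min_freq threshold is expressed as a percentage of rows, not a raw count. A minimal base R sketch of that filter follows, assuming each row's skipgrams are already available as a character vector and that a skipgram is kept when its row percentage is at least min_freq (whether eHDPrep uses >= or > is an assumption here). keep_frequent() is a hypothetical helper, not part of eHDPrep.

```r
# Hypothetical helper (not eHDPrep): keep skipgrams found in at least
# min_freq percent of rows. row_grams is a list with one character
# vector of skipgrams per row.
keep_frequent <- function(row_grams, min_freq = 1) {
  n_rows <- length(row_grams)
  # Count each skipgram at most once per row, then tally across rows
  all_grams <- unlist(lapply(row_grams, unique), use.names = FALSE)
  if (length(all_grams) == 0) return(character(0))
  counts <- table(all_grams)
  # Convert counts to a percentage of all rows
  pct <- 100 * as.vector(counts) / n_rows
  sort(names(counts)[pct >= min_freq])
}

rows <- list(c("chest_pain", "no_fever"), "chest_pain", character(0), "shortness_breath")
keep_frequent(rows, min_freq = 50)
```

Here chest_pain appears in 2 of 4 rows (50%) and is kept, while the other two skipgrams appear in only 25% of rows and are dropped.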
Details
New variables report the presence of skipgrams (proximal words in the text) that occur with at least a minimum frequency (min_freq, default = 1% of rows).
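To make the notion of a skipgram concrete, here is a minimal base R sketch, assuming lower-casing, whitespace tokenization, and two-word skipgrams that may skip up to one intervening word. The helper skipgrams_2() and its max_skip parameter are hypothetical; this is not the eHDPrep implementation, which the See Also section ties to tokens_ngrams.

```r
# Hypothetical helper (not eHDPrep): two-word skipgrams from one string.
# Assumes lower-casing and splitting on whitespace; real tokenizers differ.
skipgrams_2 <- function(text, max_skip = 1) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  words <- words[words != ""]  # drop empty tokens from leading spaces
  out <- character(0)
  n <- length(words)
  if (n < 2) return(out)
  for (i in seq_len(n - 1)) {
    # Pair word i with each word up to max_skip + 1 positions ahead
    for (gap in 0:max_skip) {
      j <- i + 1 + gap
      if (j <= n) out <- c(out, paste(words[i], words[j], sep = "_"))
    }
  }
  unique(out)
}

skipgrams_2("patient reports chest pain")
# "patient_reports" "patient_chest" "reports_chest" "reports_pain" "chest_pain"
```

Allowing a skip means "patient chest" is captured even though "reports" sits between the two words, which is what distinguishes skipgrams from plain bigrams.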
Value
data with additional Boolean variables describing skipgrams in ...
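The returned data gains one Boolean variable per retained skipgram. A minimal base R sketch of that appending step follows, assuming one logical TRUE/FALSE column per skipgram, named after the skipgram and aligned with rows in their original order. append_flags() and this column naming are hypothetical and may differ from what eHDPrep produces.

```r
# Hypothetical helper (not eHDPrep): add one logical column per skipgram.
# row_grams: list of skipgram vectors, one per row of data, in row order.
append_flags <- function(data, row_grams, grams) {
  for (g in grams) {
    # TRUE when skipgram g was found in that row's text
    data[[g]] <- vapply(row_grams, function(x) g %in% x, logical(1))
  }
  data
}

df <- data.frame(patient_id = 1:3, free_text = c("chest pain", "no pain", NA))
grams <- list("chest_pain", "no_pain", character(0))
append_flags(df, grams, c("chest_pain", "no_pain"))
```

A row with missing text (the NA above) simply has no skipgrams and is flagged FALSE for every column in this sketch.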
References
Guthrie D, Allison B, Liu W, Guthrie L, Wilks Y (2006). “A Closer Look at Skip-gram Modelling.” In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06). European Language Resources Association (ELRA).
Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, Matsuo A (2018). “quanteda: An R package for the quantitative analysis of textual data.” Journal of Open Source Software, 3(30), 774. doi:10.21105/joss.00774 <https://doi.org/10.21105/joss.00774>, <https://quanteda.io>.
Feinerer I, Hornik K (2020). tm: Text Mining Package. R package version 0.7-8, <https://CRAN.R-project.org/package=tm>.
Feinerer I, Hornik K, Meyer D (2008). “Text Mining Infrastructure in R.” Journal of Statistical Software, 25(5), 1-54. <https://www.jstatsoft.org/v25/i05/>.
See Also
Principal underlying function: tokens_ngrams (quanteda)
Other free text functions: skipgram_append(), skipgram_freq(), skipgram_identify()
Examples
library(eHDPrep)
data(example_data)
# Keep skipgrams from free_text present in at least 0.6% of rows
extract_freetext(example_data, patient_id, min_freq = 0.6, free_text)