preprocess_string {occupationMeasurement}    R Documentation
Preprocess a string, removing special characters and handling abbreviations.
Description
Replaces some common characters and character sequences (e.g., Ä, Ü, "DIPL.-ING.") with their uppercase equivalents, and removes punctuation, spaces, and the word "Diplom".
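To make these steps concrete, here is a minimal base-R sketch of this kind of normalization. It is not the implementation used by occupationMeasurement. The function name preprocess_sketch, the umlaut transliterations (e.g., "ä" to "AE"), the order of the steps, and the choice to drop all whitespace are assumptions for illustration, and abbreviation handling such as "DIPL.-ING." is left out.

```r
# Hypothetical sketch of the normalization described above;
# NOT the implementation used by occupationMeasurement.
preprocess_sketch <- function(verbatim) {
  # Assumed transliterations, applied before toupper() so the result
  # does not depend on how the current locale uppercases non-ASCII letters
  map <- c("\u00e4" = "ae", "\u00c4" = "AE",
           "\u00f6" = "oe", "\u00d6" = "OE",
           "\u00fc" = "ue", "\u00dc" = "UE",
           "\u00df" = "ss")
  x <- verbatim
  for (ch in names(map)) {
    x <- gsub(ch, map[[ch]], x, fixed = TRUE)
  }
  x <- toupper(x)
  # Drop the standalone word "DIPLOM"
  x <- gsub("\\bDIPLOM\\b", "", x, perl = TRUE)
  # Remove punctuation and all whitespace
  x <- gsub("[[:punct:][:space:]]", "", x)
  x
}

preprocess_sketch(c("Verkauf von B\u00fcchern, Schreibwaren", "Diplom Kauffrau"))
```

Under these assumptions the call returns "VERKAUFVONBUECHERNSCHREIBWAREN" and "KAUFFRAU"; the output vector has the same length as the input because every step is a vectorized gsub() or toupper().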
Usage
preprocess_string(verbatim, lang = "de")
Arguments
verbatim
The character vector to process.
lang
The language of the text. Currently only German ("de") is supported, which is also the default.
Details
The base R function charToRaw() can be used to inspect the UTF-8 bytes of special characters.
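For example, charToRaw() shows the two-byte UTF-8 encoding of an umlaut, and utf8ToInt() gives the Unicode code point that appears in the "\uXXXX" escapes used in the Examples section. Both are base R functions; the variable name umlaut is only for illustration.

```r
# Inspect a special character with base R only
umlaut <- "\u00fc"                    # the letter "ü"
charToRaw(umlaut)                     # UTF-8 bytes: c3 bc
# Code point in hex, i.e. the XXXX part of a "\uXXXX" escape
sprintf("%04x", utf8ToInt(umlaut))    # "00fc"
```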
Value
The processed character vector, with the same length as verbatim.
Examples
## Not run:
preprocess_string(c(
"Verkauf von B\u00fcchern, Schreibwaren",
"Fach\u00e4rztin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen",
"Industriemechaniker",
"Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"
))
## End(Not run)
[Package occupationMeasurement version 0.3.2 Index]