preprocess_string {occupationMeasurement}    R Documentation

Preprocess a string, removing special characters and handling abbreviations.

Description

Replaces some common characters and character sequences (e.g., Ä, Ü, "DIPL.-ING.") with their uppercase equivalents and removes punctuation, empty spaces, and the word "Diplom".

Usage

preprocess_string(verbatim, lang = "de")

Arguments

verbatim

The character vector to process.

lang

The language the text is in. Currently only German is supported. Defaults to "de" (German).

Details

charToRaw() returns the raw bytes of a string, which helps to find UTF-8 (non-ASCII) characters in the input.
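
As a minimal sketch of how charToRaw() can be used for this, the following base R snippet inspects the bytes of a string containing an umlaut. The input string and the variable names (x, bytes, non_ascii) are illustrative assumptions and are not part of the package.

```r
# Hypothetical input; "\u00fc" is the Unicode escape for the letter u-umlaut.
x <- enc2utf8("B\u00fccher")

# Raw bytes of the string: the umlaut occupies two bytes (c3 bc) in UTF-8.
bytes <- charToRaw(x)
print(bytes)

# Flag bytes above 0x7f, i.e. bytes that belong to multi-byte (non-ASCII) characters.
non_ascii <- as.integer(bytes) > 127L
print(which(non_ascii))
```

Byte positions flagged this way point to the characters that may need special handling before a string is passed to preprocess_string().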

Value

The same character vector after processing.

Examples



## Not run: 
preprocess_string(c(
  "Verkauf von B\u00fcchern, Schreibwaren",
  "Fach\u00e4rztin f\u00fcr Kinder- und Jugendmedizin im \u00f6ffentlichen Gesundheitswesen",
  "Industriemechaniker",
  "Dipl.-Ing. - Agrarwirtschaft (Landwirtschaft)"
))

## End(Not run)

[Package occupationMeasurement version 0.3.2 Index]