removePunctuation {tm} | R Documentation |
Remove Punctuation Marks from a Text Document
Description
Remove punctuation marks from a text document.
Usage
## S3 method for class 'character'
removePunctuation(x,
                  preserve_intra_word_contractions = FALSE,
                  preserve_intra_word_dashes = FALSE,
                  ucp = FALSE, ...)
## S3 method for class 'PlainTextDocument'
removePunctuation(x, ...)
Arguments
x |
a character vector or text document. |
preserve_intra_word_contractions |
a logical specifying whether intra-word contractions should be kept. |
preserve_intra_word_dashes |
a logical specifying whether intra-word dashes should be kept. |
ucp |
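a logical specifying whether to use Unicode character properties for determining punctuation characters. If FALSE (default), characters in the ASCII [:punct:] class are taken; if TRUE, the characters with Unicode general category P (Punctuation). |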
... |
arguments to be passed to or from methods; in particular, from the PlainTextDocument method to the character method. |
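The distinction that ucp draws, between the ASCII [:punct:] class and Unicode punctuation properties, can be sketched in base R without tm. This is only an illustration of the two character classes, not a statement of how tm implements the argument; the sample string is made up for the example.

```r
# Base-R sketch (not tm's code): ASCII [:punct:] versus Unicode category P.
x <- "a+b=c, $5!"

# ASCII punctuation class. perl = TRUE is used here so that the class is
# matched by PCRE, which treats it as ASCII-only for this plain ASCII input.
ascii_removed <- gsub("[[:punct:]]+", "", x, perl = TRUE)

# Unicode general category P (Punctuation). Characters such as "+", "="
# and "$" belong to category S (Symbol), so they are left in place.
ucp_removed <- gsub("\\p{P}+", "", x, perl = TRUE)

print(ascii_removed)
print(ucp_removed)
```

The first result is "abc 5" and the second is "a+b=c $5", which shows that the two settings can differ even on plain ASCII input.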
Value
The character or text document x without punctuation marks (besides intra-word contractions (‘'’) and intra-word dashes (‘-’) if preserve_intra_word_contractions and preserve_intra_word_dashes are set, respectively).
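As a small illustration of the returned value, the character method can be applied to a single string. The sketch assumes the tm package is installed; the sample sentence is invented for the example.

```r
library(tm)

# A made-up sentence with contractions, a dash, and ordinary punctuation.
x <- "It's a well-known fact, isn't it?"

# Default: every punctuation mark is removed.
plain <- removePunctuation(x)

# Keep apostrophes and dashes that sit between word characters.
kept <- removePunctuation(x,
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE)

print(plain)
print(kept)
```

The first call drops the apostrophes and the dash together with the comma and question mark; the second keeps the apostrophes and the dash and removes only the comma and question mark.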
See Also
getTransformations to list available transformation (mapping) functions.
regex shows the class [:punct:] of punctuation characters.
https://unicode.org/reports/tr44/#General_Category_Values.
Examples
library("tm")  # attach tm so the crude data set and inspect() are available
data("crude")
inspect(crude[[14]])
inspect(removePunctuation(crude[[14]]))
inspect(removePunctuation(crude[[14]],
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE))