text_to_sentences {ds4psy} | R Documentation
Split strings of text x into sentences.
Description
text_to_sentences splits text x (consisting of one or more character strings) into a vector of its constituent sentences.
Usage
text_to_sentences(
  x,
  sep = " ",
  split_delim = "\\.|\\?|!",
  force_delim = FALSE
)
Arguments
x | A string of text (required), typically a character vector.
sep | A character inserted as separator/delimiter between elements when collapsing multi-element strings of x.
split_delim | Sentence delimiters (as regex) used to split the collapsed string of x.
force_delim | Boolean: Enforce splitting at split_delim?
Details
The splits of x occur at the given punctuation marks (provided as a regular expression, default: split_delim = "\\.|\\?|!").
Leading and trailing spaces are removed before returning a vector of the remaining character sequences (i.e., the sentences).
The Boolean argument force_delim distinguishes between two splitting modes:
If force_delim = FALSE (as per default), a standard sentence-splitting pattern is assumed: A sentence delimiter in split_delim must be followed by one or more blank spaces and a capital letter starting the next sentence. Sentence delimiters in split_delim are not removed from the output.
If force_delim = TRUE, the function enforces splits at each delimiter in split_delim. For instance, any dot (i.e., the metacharacter "\\.") is interpreted as a full stop, so that sentences containing dots mid-sentence (e.g., in abbreviations) are split into parts. Sentence delimiters in split_delim are removed from the output.
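The difference between the two modes can be sketched in base R. This is an illustration only: the look-around pattern used for the first mode is an assumption, not necessarily the regular expression that ds4psy uses internally, and the variable names txt, kept, and forced are hypothetical.

```r
# Illustrative sketch in base R (assumed patterns, not the package's code).
txt <- "Any questions? But etc. can be tricky. A final sentence."

# Like force_delim = FALSE: split only at blank space(s) that follow a
# delimiter and precede a capital letter. The delimiters are matched by a
# look-behind, so they are not consumed and remain in the output.
kept <- strsplit(txt, "(?<=[.?!])\\s+(?=[A-Z])", perl = TRUE)[[1]]

# Like force_delim = TRUE: split at every delimiter. The delimiters are
# consumed by the split; trimws() removes the leftover leading spaces.
forced <- trimws(strsplit(txt, "\\.|\\?|!")[[1]])

print(kept)    # 3 elements; "etc." stays inside the second one
print(forced)  # 4 elements; the dot after "etc" causes an extra split
```

In the first result, the dot after "etc" does not trigger a split because it is followed by a lowercase letter. In the second, every dot counts as a full stop.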
Internally, text_to_sentences first uses paste to collapse strings (adding sep between elements) and then strsplit to split strings at split_delim.
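The collapse-then-split sequence described above could look roughly as follows. The helper split_sketch is hypothetical and not part of ds4psy. It only mimics the forced-split mode, and dropping empty fragments is a choice made for this sketch that the documentation does not specify.

```r
# Hypothetical helper (not the ds4psy implementation): collapse, split, trim.
split_sketch <- function(x, sep = " ", split_delim = "\\.|\\?|!") {
  collapsed <- paste(x, collapse = sep)           # 1. collapse elements, inserting sep
  parts <- strsplit(collapsed, split_delim)[[1]]  # 2. split at every delimiter
  parts <- trimws(parts)                          # 3. remove leading/trailing spaces
  parts[parts != ""]                              # assumption: drop empty fragments
}

e3 <- c("12", "34", "56")
one_piece <- split_sketch(e3, sep = " ")    # no delimiter present after collapsing
three_pieces <- split_sketch(e3, sep = ".") # sep itself acts as a delimiter

print(one_piece)
print(three_pieces)
```

Because sep is inserted before splitting, choosing a sep that matches split_delim (such as ".") makes each original element come back as its own piece.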
Value
A character vector (of sentences).
See Also
text_to_words for splitting text into a vector of words;
text_to_chars for splitting text into a vector of characters;
count_words for counting the frequency of words;
strsplit for splitting strings.
Other text objects and functions: Umlaut, capitalize(), caseflip(), cclass, chars_to_text(), collapse_chars(), count_chars_words(), count_chars(), count_words(), invert_rules(), l33t_rul35, map_text_chars(), map_text_coord(), map_text_regex(), metachar, read_ascii(), text_to_chars(), text_to_words(), transl33t(), words_to_text()
Examples
x <- c("A first sentence. Exclamation sentence!",
"Any questions? But etc. can be tricky. A fourth --- and final --- sentence.")
text_to_sentences(x)
text_to_sentences(x, force_delim = TRUE)
# Changing split delimiters:
text_to_sentences(x, split_delim = "\\.") # only split at "."
text_to_sentences("Buy apples, berries, and coconuts.")
text_to_sentences("Buy apples, berries; and coconuts.",
split_delim = ",|;|\\.", force_delim = TRUE)
text_to_sentences(c("123. 456? 789! 007 etc."), force_delim = TRUE)
# Split multi-element strings (w/o punctuation):
e3 <- c("12", "34", "56")
text_to_sentences(e3, sep = " ") # Default: Collapse strings adding 1 space, but:
text_to_sentences(e3, sep = ".", force_delim = TRUE) # insert sep and force split.
# Punctuation within sentences:
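text_to_sentences("Dr. who is left intact.")  # no split: "." is followed by a lowercase letter
text_to_sentences("Dr. Who is problematic.")  # split after "Dr.": "." is followed by a space and a capital letter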