matchQuote {Ecfun} | R Documentation
Match isolated quotes across records
Description
Look for unmatched quotes in a character vector. If found, look for a matching quote starting the next character string in the vector, possibly after a blank line. If found, merge the two strings and return the resulting shortened character vector.
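As a rough illustration of what "unmatched" can mean here, the sketch below flags strings that contain an odd number of quote characters. It uses only base R; the helper name countQuotes is hypothetical and is not part of Ecfun, and treating an odd count as "unmatched" is an assumption that happens to agree with the Examples on this page.

```r
# Hypothetical helper (not part of Ecfun): count how many times a
# quote character occurs in each element of a character vector.
countQuotes <- function(x, Quote = '"') {
  hits <- gregexpr(Quote, x, fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count positions > 0
  vapply(hits, function(h) sum(h > 0), integer(1))
}

chvec <- c('abc', 'de"f', ' ', '",', 'g"h',
           'matched"quotes"', '')
nq <- countQuotes(chvec)

# Assumption: a string with an odd number of quotes is "unmatched"
unmatched <- which(nq %% 2 == 1)
unmatched
```

The indices flagged this way can be compared with the unmatchedQuotes attribute shown in the Examples.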
Usage
matchQuote(x, Quote='"', sep=' ',
maxChars2append=2, ...)
Arguments
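x | a character vector to scan for unmatched Quote characters.
Quote | the quote character to match; the default is '"'.
sep | the separator placed between two strings when they are merged; the default is a single space.
maxChars2append | maximum number of characters allowed in the following string for it to be concatenated onto the preceding string (possibly across a blank line) when both contain unmatched Quote characters.
... | optional arguments for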
Details
This function was written to help parse data from the US Department of Health and Human Services on cyber-security breaches affecting 500 or more individuals. As of 2014-06-03, the csv version of these data included commas inside quotes that are not sep characters, quotes that are not matched, and lines with zero characters followed by lines with 3 characters being a quote and a comma. The function drops the blank lines and appends the quote-comma line to the preceding line so that the merged line contains matching quotes.
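The repair described above can be sketched in base R as follows. This is a minimal illustration, not the Ecfun implementation: the function name mergeSketch is hypothetical, "unmatched" is assumed to mean an odd number of quote characters, at most one blank line is skipped, and no attributes are attached to the result.

```r
# Hypothetical sketch (not the Ecfun source): merge a short line with
# an unmatched quote onto the preceding line that also has one.
mergeSketch <- function(x, Quote = '"', sep = ' ',
                        maxChars2append = 2) {
  nQuotes <- function(s) {
    vapply(gregexpr(Quote, s, fixed = TRUE),
           function(h) sum(h > 0), integer(1))
  }
  odd <- nQuotes(x) %% 2 == 1           # assumed meaning of "unmatched"
  blank <- !grepl("[^[:space:]]", x)    # no non-blank characters
  keep <- rep(TRUE, length(x))
  i <- 1
  while (i < length(x)) {
    if (odd[i]) {
      j <- i + 1
      # skip at most one blank line after the unmatched quote
      if (blank[j] && j < length(x)) j <- j + 1
      if (odd[j] && nchar(x[j]) <= maxChars2append) {
        x[i] <- paste(x[i], x[j], sep = sep)
        keep[(i + 1):j] <- FALSE        # drop blank and appended lines
        i <- j
      }
    }
    i <- i + 1
  }
  x[keep]
}

chvec <- c('abc', 'de"f', ' ', '",', 'g"h',
           'matched"quotes"', '')
merged <- mergeSketch(chvec)
merged
```

With the example vector, the blank third element is dropped and the quote-comma fourth element is pasted onto the second, while 'g"h' is left alone because the line after it has matched quotes.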
Value
The input character vector, possibly shortened, with the following attributes explaining what was found:
unmatchedQuotes | indices of the input x with an unmatched Quote.
blankLinesDropped | indices of the input x that were dropped because they (1) followed an unmatched Quote and (2) contained no non-blank characters.
quoteLinesAppended | indices of the input x that were concatenated with a preceding line because the two lines contained unmatched Quote characters, and concatenating them produced a line with all Quotes matched.
ncharsAppended | an integer vector of the same length as quoteLinesAppended giving the number of characters in the second line concatenated onto the previous line.
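Because the diagnostics are returned as attributes, they can be read back with attr() or attributes(). The sketch below builds an object of the documented shape by hand (mirroring the Examples) so that it runs without Ecfun installed; the variable name res is illustrative only.

```r
# Hand-built stand-in for a matchQuote() result, using the attribute
# names documented above and the values from the Examples.
res <- c('abc', 'de"f ",', 'g"h', 'matched"quotes"', '')
attr(res, 'unmatchedQuotes') <- c(2, 4, 5)
attr(res, 'blankLinesDropped') <- 3
attr(res, 'quoteLinesAppended') <- 4
attr(res, 'ncharsAppended') <- 2

# Read a single diagnostic
dropped <- attr(res, 'blankLinesDropped')

# List the names of all attached diagnostics
diagNames <- names(attributes(res))

# Strip the attributes to get a plain character vector
plain <- as.vector(res)
```

Stripping the attributes with as.vector() can be convenient before passing the cleaned lines on to a parser.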
Author(s)
Spencer Graves
See Also
Examples
chvec <- c('abc', 'de"f', ' ', '",', 'g"h',
'matched"quotes"', '')
ch. <- matchQuote(chvec)
# check
chv. <- c('abc', 'de"f ",', 'g"h',
'matched"quotes"', '')
attr(chv., 'unmatchedQuotes') <- c(2, 4, 5)
attr(chv., 'blankLinesDropped') <- 3
attr(chv., 'quoteLinesAppended') <- 4
attr(chv., 'ncharsAppended') <- 2
all.equal(ch., chv.)