| sedit {Hmisc} | R Documentation |
Character String Editing and Miscellaneous Character Handling Functions
Description
This suite of functions implements many of the features of the UNIX
sed program entirely within S (function sedit).
The substring.location function returns the first and last character
positions that a substring occupies in a larger string. The substring2<-
function does the opposite of the built-in function substring: it
replaces part of a string rather than extracting it.
It is named substring2 because S-Plus has a built-in
function substring, but it does not handle multiple replacements in
a single string.
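The idea of locating a substring by its first and last positions can be sketched in base R. This is an illustration only, not the Hmisc implementation: locate_fixed is a hypothetical helper built on gregexpr, and it may differ from substring.location in edge cases such as overlapping or absent matches.

```r
# Hypothetical helper (not part of Hmisc): find every non-overlapping
# occurrence of a fixed substring and report where each starts and ends.
locate_fixed <- function(text, string) {
  m <- gregexpr(string, text, fixed = TRUE)[[1]]
  if (m[1] == -1) {
    # No match: return empty position vectors (an assumption of this sketch)
    return(list(first = integer(0), last = integer(0)))
  }
  first <- as.integer(m)
  list(first = first, last = first + attr(m, "match.length") - 1L)
}

loc <- locate_fixed("abcdefgabc", "ab")
loc$first  # 1 8
loc$last   # 2 9

# Base R's own replacement form, for same-length replacements:
x <- "this string"
substr(x, 3, 4) <- "IS"
x          # "thIS string"
```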
replace.substring.wild edits character strings in the fashion of
"change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb", if the "ANYTHING"
passes an optional user-specified test function. Here, the
"yyyy" string is searched for from right to left to handle
balancing parentheses, etc. numeric.string and all.digits are two
examples of test functions: they check, respectively, whether each of a
vector of strings is a legal number or contains only the digits 0-9.
When the pattern given to sedit is "*$" or "^*" (with
wild.literal=FALSE), or when replace.substring.wild is called with
the same values of old or with front=TRUE or back=TRUE, the
largest substring satisfying test is edited.
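A minimal base-R sketch of the "change xxxxANYTHINGyyyy to aaaaANYTHINGbbbb if ANYTHING passes test" idea follows. All three function names are hypothetical and written for illustration; they are not the Hmisc code, and the wildcard is split into explicit prefix and suffix arguments rather than parsed from a "*" pattern.

```r
# Hypothetical stand-ins for the two test functions; base R only.
is_numeric_string <- function(string) {
  !is.na(suppressWarnings(as.numeric(string)))
}
is_all_digits <- function(string) grepl("^[0-9]+$", string)

# Hypothetical helper: change old_pre ANYTHING old_post into
# new_pre ANYTHING new_post, but only when test(ANYTHING) is TRUE.
replace_wild_if <- function(text, old_pre, old_post, new_pre, new_post,
                            test = function(s) TRUE) {
  start <- as.integer(regexpr(old_pre, text, fixed = TRUE))
  if (start == -1L) return(text)
  mid_start <- start + nchar(old_pre)
  # Use the rightmost occurrence of old_post that follows old_pre
  ends <- as.integer(gregexpr(old_post, text, fixed = TRUE)[[1]])
  ends <- ends[ends >= mid_start]
  if (length(ends) == 0) return(text)
  end <- max(ends)
  middle <- substr(text, mid_start, end - 1)
  # Leave the string untouched when the wildcard text fails the test
  if (!isTRUE(test(middle))) return(text)
  paste0(substr(text, 1, start - 1), new_pre, middle, new_post,
         substr(text, end + nchar(old_post), nchar(text)))
}

won  <- replace_wild_if("He won 1.5 million $", "won", "million",
                        "lost", "million", test = is_numeric_string)
kept <- replace_wild_if("He won big million $", "won", "million",
                        "lost", "million", test = is_numeric_string)
won     # "He lost 1.5 million $"
kept    # "He won big million $" (unchanged: " big " is not numeric)
digits <- is_all_digits(c("123", "12a", ""))
digits  # TRUE FALSE FALSE
```

Taking the rightmost occurrence of the closing literal is this sketch's way of mimicking the right-to-left search described above.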
substring2 is just a copy of substring so that
substring2<- will work.
Usage
sedit(text, from, to, test, wild.literal=FALSE)
substring.location(text, string, restrict)
# substring(text, first, last) <- setto # S-Plus only
replace.substring.wild(text, old, new, test, front=FALSE, back=FALSE)
numeric.string(string)
all.digits(string)
substring2(text, first, last)
substring2(text, first, last) <- value
Arguments
text |
a vector of character strings for |
from |
a vector of character strings to translate from, for |
to |
a vector of character strings to translate to, for |
string |
a single character string, for |
first |
a vector of integers specifying the first position to replace for
|
last |
a vector of integers specifying the ending positions of the character
substrings to be replaced. The default is to go to the end of
the string. When |
setto |
a character string or vector of character strings used as replacements,
in |
old |
a character string to translate from for |
new |
a character string to translate to for |
test |
a function of a vector of character strings returning a logical vector
whose elements are |
wild.literal |
set to |
restrict |
a vector of two integers for |
front |
specifying |
back |
specifying |
value |
a character vector |
Value
sedit returns a vector of character strings the same length as text.
substring.location returns a list with components named first
and last, each specifying a vector of character positions corresponding
to matches. replace.substring.wild returns a single character string.
numeric.string and all.digits return a single logical value.
Side Effects
substring2<- modifies its first argument.
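How a replacement function comes to modify its first argument can be sketched as follows. R evaluates f(x) <- v as x <- `f<-`(x, value = v), so the variable on the left is rebound to the function's result. The function `span<-` below is hypothetical and written for illustration; it is not the definition of substring2<-.

```r
# Hypothetical replacement function, for illustration only.
# It overwrites characters first..last of x with `value`; the replacement
# may be longer or shorter than the span it replaces.
`span<-` <- function(x, first, last = nchar(x), value) {
  paste0(substr(x, 1, first - 1), value, substr(x, last + 1, nchar(x)))
}

s <- "this string"
span(s, 3, 4) <- "IS"   # same as: s <- `span<-`(s, 3, 4, value = "IS")
s                       # "thIS string"
span(s, 7) <- ""        # last defaults to the end of the string
s                       # "thIS s"
```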
Author(s)
Frank Harrell
Department of Biostatistics
Vanderbilt University School of Medicine
fh@fharrell.com
See Also
Examples
x <- 'this string'
substring2(x, 3, 4) <- 'IS'
x
substring2(x, 7) <- ''
x
substring.location('abcdefgabc', 'ab')
substring.location('abcdefgabc', 'ab', restrict=c(3,999))
replace.substring.wild('this is a cat','this*cat','that*dog')
replace.substring.wild('there is a cat','is a*', 'is not a*')
replace.substring.wild('this is a cat','is a*', 'Z')
qualify <- function(x) x==' 1.5 ' | x==' 2.5 '
replace.substring.wild('He won 1.5 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1 million $','won*million',
                       'lost*million', test=qualify)
replace.substring.wild('He won 1.2 million $','won*million',
                       'lost*million', test=numeric.string)
x <- c('a = b','c < d','hello')
sedit(x, c('=','he*o'),c('==','he*'))
sedit('x23', '*$', '[*]', test=numeric.string)
sedit('23xx', '^*', 'Y_{*} ', test=all.digits)
replace.substring.wild("abcdefabcdef", "d*f", "xy")
x <- "abcd"
substring2(x, "bc") <- "BCX"
x
substring2(x, "B*d") <- "B*D"
x