asNumericDF {Ecfun} | R Documentation |
Coerce to numeric dropping commas and info after a blank
Description
For asNumericChar
, delete leading
blanks and a leading dollar sign plus
commas (thousand separators) and drop
information after a blank (other than
leading blanks), then coerce to numeric or
to factors, Dates
, or
POSIXct
as desired.
For a data.frame
, apply
asNumericChar
to all columns and
drop columns except those in keep
,
ignore
, factors
,
Dates
, POSIX
and
MSdates
.
Then order the rows by the
orderBy
column. Some Excel
imports include commas as thousand
separators; this replaces any commas
with char(0), ”, before trying to
convert to numeric.
Similarly, if "%" is found as the last character in any field, drop the percent sign and divide the resulting numeric conversion by 100 to convert to proportion.
Also, some character data includes footnote references following the year.
For example Table F-1 from the US Census
Bureau needs all three of these numeric
conversion features: It needs
orderBy
, because the most recent
year appears first, just the opposite of
most other data sets where the most
recent year appears last. It has
footnote references following a character
string indicating the year. And it
includes commas as thousand separators.
Usage
asNumericChar(x, leadingChar='^\\$',
suppressChar=',', pctChar='%$',
class.=NULL, format.=NULL)
asNumericDF(x, keep=
function(x)any(!is.na(x)),
orderBy=NA, ignore=NULL, factors=NULL,
Dates=NULL, POSIX=NULL, MSdates=NULL,
format.=NULL, leadingChar='^\\$',
suppressChar=',', pctChar='%$')
Arguments
x |
For For |
keep |
something to indicate which columns to
keep, in addition to columns specified
in |
orderBy |
Which columns to order the rows of
|
ignore |
vector identifying columns of |
factors |
vector indicating columns of |
Dates |
vector indicating columns of |
POSIX |
vector indicating columns of |
class. |
Desired class of output. Default is
|
format. |
Character vector of length 1 to pass
as argument For |
MSdates |
The names or numbers identifying columns of x identifying dates as integer numbers of days since 1899-12-31. In Microsoft Excel, dates are stored in that format. |
leadingChar |
A regular expression passed to
|
suppressChar |
a regular expression passed to
|
pctChar |
A regular expression passed to
|
Details
For asNumericChar
:
1. Replace commas by nothing
2. strsplit
on ' ' and take only
the first part, thereby eliminating the
footnote references.
3. Replace any blanks with NAs
4. as.numeric
for asNumericDF
:
1. Copy x
to X
.
2. Confirm that ignore
,
factors
, Dates
, and
POSIX
all refer to columns
of x
and do not overlap.
[*** NOTE: as of 2016-07-21, these
checks have only been implemented
for ignore
.]
3. Convert factors
, Dates
,
and POSIX
.
4. Apply asNumericChar
to all
columns not in ignore
,
factors
, Dates
, or
POSIX
.
5. Keep columns specified by
keep
.
6. return the result.
Value
Author(s)
Spencer Graves
References
"Add (sum) or subtract dates; Applies To: Excel 2013". Microsoft. (accessed 2016-08-11)
See Also
scan
gsub
Quotes
stripBlanks
as.numeric
,
factor
,
as.Date
,
as.POSIXct
read.xlsx
Examples
##
## 1. an example
##
(xDate <- as.Date('1970-01-01')+c(0, 365))
(xPOSIX <- as.POSIXct(xDate)+c(1, 99))
xMSdate <- as.Date(1,
as.Date('1899-12-31'))+1:2
(fakeF1 <- data.frame(yr=c('1948',
'1947 (1)'),
q1=c(' 1,234 ', ''), duh=rep(NA, 2),
dol=c('$1,234', ''),
pct=c('1%', '2%'),
xDate=as.character(xDate,
format='%Y-%m-%d'),
xPOSIX=as.character(xPOSIX,
format='%Y-%m-%d %H:%M:%S'),
xMSdate=2:3, junk=c('this is',
'junk')))
# This converts the last 3 columns to NAs and drops them:
str(nF1.1 <- asNumericChar(fakeF1$yr))
str(nF1.2 <- asNumericChar(fakeF1$q1))
str(nF1.3 <- asNumericChar(fakeF1$duh))
(nF1.4 <- asNumericChar('1969-12-31 18:00:01',
class.='POSIXct'))
(nF1 <- asNumericDF(fakeF1))
(nF2 <- asNumericDF(fakeF1, Dates=6,
MSdate='xMSdate',
ignore=c('junk', 'xPOSIX'),
format.='%Y-%m-%d'))
# check
nF1. <- data.frame(yr=
asNumericChar(fakeF1$yr),
q1=asNumericChar(fakeF1$q1),
dol=asNumericChar(fakeF1$dol),
pct=c(.01, .02), xMSdate=2:3)
nF1c <- data.frame(yr=1948:1947,
q1=c(1234, NA), dol=c(1234, NA),
pct=c(.01, .02), xMSdate=2:3)
all.equal(nF1, nF1.)
all.equal(nF1., nF1c)
##
## 2. as.Date default example
##
xD <- asNumericChar(
as.character(xDate), class.='Date')
all.equal(xDate, xD)
##
## 3. as.POSIXct default example
##
xPOSIX
(xPOSIXch <- as.character(xPOSIX))
(xP <- asNumericChar(xPOSIXch, class.='POSIXct'))
attr(xPOSIX, 'tzone')
attr(xP, 'tzone')
# R-Devel after 4.2.1 breaks earlier code; fix
if(is.null(attr(xPOSIX, 'tzone')))
attr(xPOSIX, 'tzone') <- attr(xP, 'tzone')
(dP <- difftime(xPOSIX, xP, units='secs'))
(madP <- max(abs(as.numeric(dP))))
{
#all.equal(xPOSIX, xP)
# As of 2022-10-06 I don't know how to write code
# that will get a consistent answer with
# different version R-devel with differences
# less than an hour
if(madP>3600){
stop('Discrepancy betw fn and manual comp ',
'too large.')
}
TRUE
}
##
## 4. orderBy=1:2
##
nF. <- asNumericDF(fakeF1, orderBy=1:2)
all.equal(nF., nF1c[2:1,])
##
## 5. Will it work for a tibble?
##
if(require(tibble)){
nF1t <- asNumericDF(as_tibble(fakeF1))
all.equal(nF1, nF1t)
}