read.dbi.ffdf {ETLUtils} | R Documentation |
Read data from a DBI connection into an ffdf.
Description
Read data from a DBI connection into an ffdf
. This can for example be used to import
large datasets from Oracle, SQLite, MySQL, PostgreSQL, Hive or other SQL databases into R.
Usage
read.dbi.ffdf(
query = NULL,
dbConnect.args = list(drv = NULL, dbname = NULL, username = "", password = ""),
dbSendQuery.args = list(),
dbFetch.args = list(),
x = NULL,
nrows = -1,
first.rows = NULL,
next.rows = NULL,
levels = NULL,
appendLevels = TRUE,
asffdf_args = list(),
BATCHBYTES = getOption("ffbatchbytes"),
VERBOSE = FALSE,
colClasses = NULL,
transFUN = NULL,
...
)
Arguments
query |
the SQL query to execute on the DBI connection |
dbConnect.args |
a list of arguments to pass to DBI's |
dbSendQuery.args |
a list containing database-specific parameters which will be passed to to pass to |
dbFetch.args |
a list containing optional database-specific parameters which will be passed to to pass to |
x |
NULL or an optional ffdf object to which the read records are appended. See documentation in read.table.ffdf for more details and the example below. |
nrows |
Number of rows to read from the query resultset. Default value of -1 reads in all rows. |
first.rows |
chunk size (rows) to read for first chunk from the query resultset |
next.rows |
chunk size (rows) to read sequentially for subsequent chunks from the query resultset. Currently, this must be specified. |
levels |
optional specification of factor levels. A list with as names the names the columns of the data.frame fetched in the first.rows, containing levels of the factors. |
appendLevels |
logical. A vector of permissions to expand levels for factor columns. See documentation in |
asffdf_args |
further arguments passed to |
BATCHBYTES |
integer: bytes allowed for the size of the data.frame storing the result of reading one chunk.
See documentation in |
VERBOSE |
logical: TRUE to verbose timings for each processed chunk (default FALSE). |
colClasses |
See documentation in |
transFUN |
function applied to the data frame after each chunk is retreived by |
... |
optional parameters passed on to transFUN |
Details
Opens up the DBI connection using DBI::dbConnect
, sends the query using DBI::dbSendQuery
and DBI::dbFetch
-es
the results in batches of next.rows rows. Heavily borrowed from read.table.ffdf
Value
An ffdf object unless the query returns zero records in which case the function will return the data.frame
returned by dbFetch
and possibly transFUN.
See Also
read.table.ffdf, read.odbc.ffdf
Examples
require(ff)
##
## Example query using data in sqlite
##
require(RSQLite)
dbfile <- system.file("smalldb.sqlite3", package="ETLUtils")
drv <- dbDriver("SQLite")
query <- "select * from testdata limit 10000"
x <- read.dbi.ffdf(query = query, dbConnect.args = list(drv = drv, dbname = dbfile),
first.rows = 100, next.rows = 1000, VERBOSE=TRUE)
class(x)
x[1:10, ]
## show it is the same as getting the data directly using RSQLite
## apart from characters which are factors in ffdf objects
directly <- dbGetQuery(dbConnect(drv = drv, dbname = dbfile), query)
directly <- as.data.frame(as.list(directly), stringsAsFactors=TRUE)
all.equal(x[,], directly)
## show how to use the transFUN argument to transform the data before saving into the ffdf
## and shows the use of the levels argument
query <- "select * from testdata limit 10"
x <- read.dbi.ffdf(query = query, dbConnect.args = list(drv = drv, dbname = dbfile),
first.rows = 100, next.rows = 1000, VERBOSE=TRUE, levels = list(a = rev(LETTERS)),
transFUN = function(x, subtractdays){
x$b <- as.Date(x$b)
x$b.subtractdaysago <- x$b - subtractdays
x
}, subtractdays=7)
class(x)
x[1:10, ]
## remark that the levels of column a are reversed due to specifying the levels argument correctly
levels(x$a)
## show how to append data to an existing ffdf object
transformexample <- function(x, subtractdays){
x$b <- as.Date(x$b)
x$b.subtractdaysago <- x$b - subtractdays
x
}
dim(x)
x[,]
combined <- read.dbi.ffdf(query = query,
dbConnect.args = list(drv = drv, dbname = dbfile),
first.rows = 100, next.rows = 1000, x = x, VERBOSE=TRUE,
transFUN = transformexample, subtractdays=1000)
dim(combined)
combined[,]
##
## Example query using ROracle. Do try this at home with some larger data :)
##
## Not run:
require(ROracle)
query <- "select OWNER, TABLE_NAME, TABLESPACE_NAME, NUM_ROWS, LAST_ANALYZED from all_all_tables"
x <- read.dbi.ffdf(query=query,
dbConnect.args = list(drv = dbDriver("Oracle"),
user = "YourUser", password = "YourPassword", dbname = "Mydatabase"),
first.rows = 100, next.rows = 50000, nrows = -1, VERBOSE=TRUE)
## End(Not run)