read.jdbc.ffdf {ETLUtils}    R Documentation
Read data from a JDBC connection into an ffdf.
Description
Read data from a JDBC connection into an ffdf. This can, for example, be used to import
large datasets from Oracle, SQLite, MySQL, PostgreSQL, Hive or other SQL databases into R.
Usage
read.jdbc.ffdf(
query = NULL,
dbConnect.args = list(drv = NULL, dbname = NULL, username = "", password = ""),
dbSendQuery.args = list(),
dbFetch.args = list(),
x = NULL,
nrows = -1,
first.rows = NULL,
next.rows = NULL,
levels = NULL,
appendLevels = TRUE,
asffdf_args = list(),
BATCHBYTES = getOption("ffbatchbytes"),
VERBOSE = FALSE,
colClasses = NULL,
transFUN = NULL,
...
)
Arguments
query
the SQL query to execute on the JDBC connection
dbConnect.args
a list of arguments to pass to RJDBC's dbConnect
dbSendQuery.args
a list containing database-specific parameters which will be passed to RJDBC's dbSendQuery
dbFetch.args
a list containing optional database-specific parameters which will be passed to RJDBC's dbFetch
x
NULL or an optional ffdf object to which the read records are appended. See the documentation of read.table.ffdf for more details and the example below.
nrows
number of rows to read from the query result set. The default value of -1 reads in all rows.
first.rows
chunk size (in rows) to read for the first chunk from the query result set
next.rows
chunk size (in rows) to read sequentially for subsequent chunks from the query result set. Currently, this must be specified.
levels
optional specification of factor levels: a list whose names are the column names of the data.frame fetched in the first.rows chunk, each element containing the levels of that factor (see the sketch after this argument list).
appendLevels
logical. A vector of permissions to expand levels for factor columns. See the documentation of read.table.ffdf for more details.
asffdf_args
further arguments passed to as.ffdf when the first chunk is converted to an ffdf
BATCHBYTES
integer: the maximum number of bytes allowed for the data.frame storing the result of reading one chunk. See the documentation of read.table.ffdf for more details.
VERBOSE
logical: TRUE to print verbose timings for each processed chunk (default FALSE).
colClasses
see the documentation of read.table.ffdf
transFUN
function applied to the data.frame after each chunk is retrieved by dbFetch
...
optional parameters passed on to transFUN
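As a hedged illustration of how the argument lists and the levels specification fit together (the jar path, connection URL, table and column names below are placeholders, not values shipped with this package):
require(RJDBC)
drv <- JDBC(driverClass = "org.sqlite.JDBC",
            classPath = "/path/to/sqlite-jdbc.jar")              # placeholder jar path
x <- read.jdbc.ffdf(
  query          = "select id, category, amount from mytable",   # placeholder query
  dbConnect.args = list(drv = drv, url = "jdbc:sqlite:/path/to/db.sqlite3"),
  dbFetch.args   = list(),                                       # extra driver-specific fetch options, if any
  levels         = list(category = c("A", "B", "C")),            # pre-declare factor levels per column
  first.rows     = 1000,
  next.rows      = 50000)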
Details
Opens up the JDBC connection using RJDBC::dbConnect, sends the query using RJDBC::dbSendQuery and fetches the results with RJDBC::dbFetch in batches of next.rows rows. Heavily borrowed from read.table.ffdf.
Value
An ffdf object, unless the query returns zero records, in which case the function returns the data.frame produced by RJDBC::dbFetch (possibly transformed by transFUN).
See Also
read.table.ffdf
Examples
## Not run:
require(ff)
##
## Example query using data in sqlite
##
require(RJDBC)
dbfile <- system.file("smalldb.sqlite3", package="ETLUtils")
drv <- JDBC(driverClass = "org.sqlite.JDBC", classPath = "/usr/local/lib/sqlite-jdbc-3.7.2.jar")
query <- "select * from testdata limit 10000"
x <- read.jdbc.ffdf(query = query,
dbConnect.args = list(drv = drv, url = sprintf("jdbc:sqlite:%s", dbfile)),
first.rows = 100, next.rows = 1000, VERBOSE=TRUE)
class(x)
x[1:10, ]
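## The following lines are an added, hedged illustration (not part of the
## original example): they re-run the query while appending to the existing
## ffdf 'x' via the 'x' argument and apply a simple transFUN to each chunk.
x <- read.jdbc.ffdf(query = query,
  dbConnect.args = list(drv = drv, url = sprintf("jdbc:sqlite:%s", dbfile)),
  x = x,                                   # append to the ffdf created above
  transFUN = function(d) d,                # replace with chunk-wise recoding/subsetting as needed
  first.rows = 100, next.rows = 1000, VERBOSE = TRUE)
dim(x)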
## End(Not run)