src_impala {implyr}    R Documentation

Connect to Impala and create a remote dplyr data source

Description

src_impala creates a SQL backend to dplyr for Apache Impala, the massively parallel processing query engine for Apache Hadoop.

src_impala works with any DBI-compatible interface that provides connectivity to Impala. Two packages that currently provide this connectivity are odbc and RJDBC.

Usage

src_impala(drv, ..., auto_disconnect = TRUE)

Arguments

drv

an object that inherits from DBIDriver-class, for example an object returned by odbc::odbc() or RJDBC::JDBC()

...

arguments passed to dbConnect, the underlying Impala database connection method. See dbConnect,OdbcDriver-method or dbConnect,JDBCDriver-method

auto_disconnect

Should the connection to Impala be automatically closed when the object returned by this function is deleted? Pass NA to auto-disconnect but print a message when this happens.

Value

An object with classes src_impala, src_sql, and src
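
Because the returned object is a dplyr source, it can be handed to dplyr functions such as tbl. The following is a minimal sketch, not part of implyr: preview_table is a hypothetical helper, and it assumes that dplyr is installed and that the named table exists in the connected Impala database.

```r
# Hypothetical helper (not part of implyr): fetch the first n rows of a
# table through the src_impala object returned by src_impala().
preview_table <- function(impala, table_name, n = 5) {
  # Create a lazy reference to the remote table; no rows are fetched yet
  remote <- dplyr::tbl(impala, table_name)
  # Limit the result on the Impala side, then pull it into a local data frame
  dplyr::collect(utils::head(remote, n))
}
```

For example, preview_table(impala, "flights") would return up to five rows, where "flights" is a placeholder table name.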

See Also

Impala ODBC driver, Impala JDBC driver

Examples

# Using ODBC connectivity:

## Not run: 
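library(implyr)
library(odbc)
# Create the ODBC driver object
drv <- odbc::odbc()
# Arguments after drv are passed on to dbConnect;
# replace the placeholder values with your own
impala <- src_impala(
  drv = drv,
  driver = "Cloudera ODBC Driver for Impala",
  host = "host",
  port = 21050,
  database = "default",
  uid = "username",
  pwd = "password"
)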
## End(Not run)

# Using JDBC connectivity:

## Not run: 
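library(implyr)
library(RJDBC)
# Point rJava at the Java installation (placeholder path)
Sys.setenv(JAVA_HOME = "/path/to/java/home/")
# Collect every .jar file in the JDBC driver directory (placeholder path)
impala_classpath <- list.files(
  path = "/path/to/jdbc/driver",
  pattern = "\\.jar$",
  full.names = TRUE
)
# Start the JVM with the driver jars on the classpath
.jinit(classpath = impala_classpath)
# Create the JDBC driver object; Impala quotes identifiers with backticks
drv <- JDBC(
  driverClass = "com.cloudera.impala.jdbc41.Driver",
  classPath = impala_classpath,
  identifier.quote = "`"
)
# Arguments after drv are passed on to dbConnect:
# connection URL, username, password (placeholders)
impala <- src_impala(
  drv,
  "jdbc:impala://host:21050",
  "username",
  "password"
)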
## End(Not run)
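
# Building the JDBC connection URL:

The JDBC example above passes the connection URL as a literal string. When the host and port come from configuration, the URL can be assembled first. This is a minimal sketch: impala_jdbc_url is a hypothetical helper, not part of implyr or RJDBC, and it only reproduces the jdbc:impala://host:port form used above.

```r
# Hypothetical helper (not part of implyr or RJDBC): build a connection URL
# of the form "jdbc:impala://host:port".
impala_jdbc_url <- function(host, port = 21050) {
  sprintf("jdbc:impala://%s:%d", host, as.integer(port))
}

url <- impala_jdbc_url("host", 21050)
```

The result can then be supplied as the second argument to src_impala in place of the literal URL.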

[Package implyr version 0.5.0 Index]