src_drill {sergeant}    R Documentation

Connect to Drill (dplyr)

Description

Use src_drill() to connect to a Drill cluster and tbl() to reference a fully qualified table ("table reference") on that connection. The vast majority of Drill SQL functions are also available through the dplyr interface. If you need a custom Drill SQL function that is not yet implemented, please file an issue on GitHub.

Usage

src_drill(
  host = Sys.getenv("DRILL_HOST", "localhost"),
  port = as.integer(Sys.getenv("DRILL_PORT", 8047L)),
  ssl = FALSE,
  username = NULL,
  password = NULL
)

## S3 method for class 'src_drill'
tbl(src, from, ...)

Arguments

host

Drill host. Defaults to the value of the DRILL_HOST environment variable, or "localhost" if it is unset.

port

Drill port. Defaults to the value of the DRILL_PORT environment variable, or 8047 if it is unset.

ssl

Use SSL for the connection? Defaults to FALSE.

username, password

If not NULL, the credentials for the Drill service.

src

A Drill "src" created with src_drill()

from

A Drill view or table specification

...

Extra parameters

Note

This is a DBI wrapper around the Drill REST API.

See Also

Other Drill REST API (dplyr): drill_custom_functions, src_tbls.src_drill()

Examples

# Attach the packages that provide the functions used below
library(sergeant)
library(dplyr)

try({
db <- src_drill("localhost", 8047L)

print(db)
## src:  DrillConnection
## tbls: INFORMATION_SCHEMA, cp.default, dfs.default, dfs.root, dfs.tmp, sys

emp <- tbl(db, "cp.`employee.json`")

count(emp, gender, marital_status)
## # Source:   lazy query [?? x 3]
## # Database: DrillConnection
## # Groups:   gender
##   marital_status gender     n
##            <chr>  <chr> <int>
## 1              S      F   297
## 2              M      M   278
## 3              S      M   276

# Drill-specific SQL functions are also available
select(emp, full_name) %>%
  mutate(        loc = strpos(full_name, "a"),
         first_three = substr(full_name, 1L, 3L),
                 len = length(full_name),
                  rx = regexp_replace(full_name, "[aeiouAEIOU]", "*"),
                 rnd = rand(),
                 pos = position("en", full_name),
                 rpd = rpad(full_name, 20L),
                rpdw = rpad_with(full_name, 20L, "*"))
## # Source:   lazy query [?? x 9]
## # Database: DrillConnection
##      loc         full_name   len                 rpdw   pos                rx
##    <int>             <chr> <int>                <chr> <int>             <chr>
##  1     0      Sheri Nowmer    12 Sheri Nowmer********     0      Sh*r* N*wm*r
##  2     0   Derrick Whelply    15 Derrick Whelply*****     0   D*rr*ck Wh*lply
##  3     5    Michael Spence    14 Michael Spence******    11    M*ch**l Sp*nc*
##  4     2    Maya Gutierrez    14 Maya Gutierrez******     0    M*y* G*t**rr*z
##  5     7   Roberta Damstra    15 Roberta Damstra*****     0   R*b*rt* D*mstr*
##  6     7  Rebecca Kanagaki    16 Rebecca Kanagaki****     0  R*b*cc* K*n*g*k*
##  7     0       Kim Brunner    11 Kim Brunner*********     0       K*m Br*nn*r
##  8     6   Brenda Blumberg    15 Brenda Blumberg*****     3   Br*nd* Bl*mb*rg
##  9     2      Darren Stanz    12 Darren Stanz********     5      D*rr*n St*nz
## 10     4 Jonathan Murraiin    17 Jonathan Murraiin***     0 J*n*th*n M*rr***n
## # ... with more rows, and 3 more variables: rpd <chr>, rnd <dbl>, first_three <chr>
}, silent=TRUE)
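
# A minimal sketch, not part of the original examples: the defaults shown in
# Usage read DRILL_HOST and DRILL_PORT, so setting those variables should let
# src_drill() be called with no arguments. The host name below is a
# hypothetical placeholder; substitute your own Drill host.
Sys.setenv(DRILL_HOST = "drill.example.com", DRILL_PORT = "8047")

# These two expressions mirror the argument defaults listed in Usage
host <- Sys.getenv("DRILL_HOST", "localhost")
port <- as.integer(Sys.getenv("DRILL_PORT", 8047L))

try({
db <- src_drill()  # assumed equivalent to src_drill(host, port) here
}, silent=TRUE)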

[Package sergeant version 0.9.1 Index]