R: Simple wrapper to convert Athena backend file types

dbConvertTable {RAthena}

R Documentation

Simple wrapper to convert Athena backend file types

Description

Utilises AWS Athena to convert AWS S3 backend file types. It can also be used to create more efficient file types, i.e. "parquet" and "orc", from SQL queries.

Usage

dbConvertTable(conn, obj, name, ...)

## S4 method for signature 'AthenaConnection'
dbConvertTable(
  conn,
  obj,
  name,
  partition = NULL,
  s3.location = NULL,
  file.type = c("NULL", "csv", "tsv", "parquet", "json", "orc"),
  compress = TRUE,
  data = TRUE,
  ...
)

Arguments

`conn`	An `AthenaConnection` object, produced by `DBI::dbConnect()`
`obj`	Athena table or `SQL` DML query to be converted. For `SQL`, the query needs to be wrapped with `DBI::SQL()` and follow the AWS Athena DML format
`name`	Name of destination table
`...`	Extra parameters, currently not used
`partition`	Column name(s) to partition the Athena table by, for example `"time_stamp"`
`s3.location`	Location to store the output file; must be in S3 URI format, for example "s3://mybucket/data/".
`file.type`	File type for `name`; currently supports ["NULL", "csv", "tsv", "parquet", "json", "orc"]. `"NULL"` lets Athena set the file type for you.
`compress`	Whether to compress `name`; currently only ["parquet", "orc"] can be compressed (AWS Athena CTAS)
`data`	Whether `name` should be created with data.

Value

dbConvertTable() returns TRUE, invisibly.

Examples

## Not run: 
# Note:
# - An AWS account is required to run the examples below.
# - Different connection methods can be used; please see the `RAthena::dbConnect` documentation

library(DBI)
library(RAthena)

# Demo connection to Athena using profile name
con <- dbConnect(athena())

# write iris table to Athena in default delimited format
dbWriteTable(con, "iris", iris)

# convert delimited table to parquet
dbConvertTable(con,
  obj = "iris",
  name = "iris_parquet",
  file.type = "parquet"
)
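
# A possible variation (a sketch, not taken from the package's own examples):
# create only the table definition, without copying any rows, by setting
# `data = FALSE`, and store the output under an explicit S3 location.
# The bucket "s3://mybucket/data/" and the table name "iris_empty" are
# placeholders; replace them with values valid for your account.
dbConvertTable(con,
  obj = "iris",
  name = "iris_empty",
  file.type = "parquet",
  s3.location = "s3://mybucket/data/",
  data = FALSE
)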

# Create partitioned table from non-partitioned
# iris table using SQL DML query
dbConvertTable(con,
  obj = SQL("select
               iris.*,
               date_format(current_date, '%Y%m%d') as time_stamp
             from iris"),
  name = "iris_orc_partitioned",
  file.type = "orc",
  partition = "time_stamp"
)
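
# Optionally check the result (a sketch using the standard DBI generics
# `dbExistsTable()` and `dbGetQuery()`; assumes the conversion above succeeded)
dbExistsTable(con, "iris_orc_partitioned")
dbGetQuery(con, "select count(*) as n from iris_orc_partitioned")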

# disconnect from Athena
dbDisconnect(con)

## End(Not run)

[Package RAthena version 2.6.1 Index]