etl {etl} | R Documentation |
Initialize an etl object
Description
Initialize an etl object
Usage
etl(x, db = NULL, dir = tempdir(), ...)
## Default S3 method:
etl(x, db = NULL, dir = tempdir(), ...)
## S3 method for class 'etl'
summary(object, ...)
is.etl(object)
## S3 method for class 'etl'
print(x, ...)
Arguments
x | the name of the |
db | a database connection that inherits from |
dir | a directory to store the raw and processed data files |
... | arguments passed to methods (currently ignored) |
object | an object for which a summary is desired. |
Details
A constructor function that instantiates an etl object.
An etl object extends a src_dbi object.
It also has attributes for:
- pkg: the name of the etl package corresponding to the data source
- dir: the directory where the raw and processed data are stored
- raw_dir: the directory where the raw data files are stored
- load_dir: the directory where the processed data files are stored
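The attributes listed above can be pictured with a small base R sketch. The object below is a hypothetical stand-in built with structure(), not the result of calling etl(), and the "raw" and "load" subdirectory names are assumptions made for illustration.

```r
# Hypothetical stand-in for an etl object: an empty list carrying the four
# documented attributes. The subdirectory names "raw" and "load" are assumed.
base_dir <- tempdir()
mock_etl <- structure(
  list(),
  pkg      = "mtcars",
  dir      = base_dir,
  raw_dir  = file.path(base_dir, "raw"),
  load_dir = file.path(base_dir, "load"),
  class    = c("etl_mtcars", "etl")
)

# Each attribute is read back with attr()
attr(mock_etl, "pkg")
attr(mock_etl, "raw_dir")
attr(mock_etl, "load_dir")
```

On a real etl object the same attr() calls should apply, provided the values are stored as ordinary attributes as the text describes.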
Just like any src_dbi object, an etl object is a data source backed by an
SQL database. However, an etl object has additional functionality based on
the presumption that the SQL database will be populated from data files
stored on the local hard disk. The ETL functions documented in etl_create
provide the necessary functionality for extracting data from the Internet
to raw_dir, transforming those data and placing the cleaned-up data
(usually in CSV format) into load_dir, and finally loading the clean data
into the SQL database.
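The three phases described above can be mimicked in base R alone. This is only a sketch of the idea, not the package's implementation: the built-in mtcars data set stands in for a file downloaded from the Internet, a named list (db_sketch, a hypothetical name) stands in for the SQL database, and the directory layout is assumed.

```r
# Assumed layout: "raw" and "load" subdirectories under a scratch directory
base_dir <- file.path(tempdir(), "etl_sketch")
raw_dir  <- file.path(base_dir, "raw")
load_dir <- file.path(base_dir, "load")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(load_dir, recursive = TRUE, showWarnings = FALSE)

# Extract: a real source would be downloaded; here the built-in data set is
# written to raw_dir to stand in for the raw file.
raw_file <- file.path(raw_dir, "mtcars.csv")
write.csv(mtcars, raw_file)

# Transform: read the raw file, name the row-name column, and write a clean
# CSV file into load_dir.
raw <- read.csv(raw_file)
names(raw)[1] <- "model"
clean_file <- file.path(load_dir, "mtcars.csv")
write.csv(raw, clean_file, row.names = FALSE)

# Load: a real pipeline would copy the clean file into an SQL table; a data
# frame stored in a list stands in for the database here.
db_sketch <- list(mtcars = read.csv(clean_file))
dim(db_sketch$mtcars)
```

Keeping the raw and cleaned files in separate directories means the transform step can be rerun without downloading the data again.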
Value
For etl, an object of class etl_x and etl that inherits from src_dbi.
For is.etl, TRUE or FALSE, depending on whether object has class etl.
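The class structure described above can be illustrated without the package. The object and the helper is_etl_sketch() below are hypothetical; the class name etl_mtcars assumes that x = "mtcars" yields a class of the form etl_x, and the check with inherits() only mirrors the documented behaviour of is.etl, whose actual implementation may differ.

```r
# Hypothetical object whose class vector has the documented shape for
# x = "mtcars"; it is not created by etl().
obj <- structure(list(), class = c("etl_mtcars", "etl", "src_dbi"))

# Hypothetical helper mirroring the documented return value of is.etl()
is_etl_sketch <- function(object) inherits(object, "etl")

is_etl_sketch(obj)            # TRUE
is_etl_sketch("hello world")  # FALSE
inherits(obj, "src_dbi")      # TRUE
```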
See Also
Examples
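# Attach the packages used below (dplyr supplies %>%, tbl, and the verbs)
library(etl)
library(dplyr)
# Instantiate the etl object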
cars <- etl("mtcars")
str(cars)
is.etl(cars)
summary(cars)
## Not run:
# connect to a PostgreSQL server
if (require(RPostgreSQL)) {
  db <- src_postgres("mtcars", user = "postgres", host = "localhost")
  cars <- etl("mtcars", db)
}
## End(Not run)
# Do it step-by-step
cars %>%
  etl_extract() %>%
  etl_transform() %>%
  etl_load()
src_tbls(cars)
cars %>%
  tbl("mtcars") %>%
  group_by(cyl) %>%
  summarize(N = n(), mean_mpg = mean(mpg))
# Do it all in one step
cars2 <- etl("mtcars")
cars2 %>%
  etl_update()
src_tbls(cars2)
# generic summary function provides information about the object
cars <- etl("mtcars")
summary(cars)
cars <- etl("mtcars")
# returns TRUE
is.etl(cars)
# returns FALSE
is.etl("hello world")
# create the database, then print the etl object
cars <- etl("mtcars") %>%
  etl_create()
cars