| etl {etl} | R Documentation |
Initialize an etl object
Description
Initialize an etl object
Usage
etl(x, db = NULL, dir = tempdir(), ...)
## Default S3 method:
etl(x, db = NULL, dir = tempdir(), ...)
## S3 method for class 'etl'
summary(object, ...)
is.etl(object)
## S3 method for class 'etl'
print(x, ...)
Arguments
x |
the name of the |
db |
a database connection that inherits from |
dir |
a directory to store the raw and processed data files |
... |
arguments passed to methods (currently ignored) |
object |
an object for which a summary is desired. |
Details
A constructor function that instantiates an etl object.
An etl object extends a src_dbi object.
It also has attributes for:
- pkg
the name of the etl package corresponding to the data source
- dir
the directory where the raw and processed data are stored
- raw_dir
the directory where the raw data files are stored
- load_dir
the directory where the processed data files are stored
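As a rough illustration of the attributes listed above, the sketch below sets the four values by hand on a plain list in base R. It does not call the etl package: the mock object, the base directory name, and the "raw" and "load" subdirectory names are assumptions made for the example and are not confirmed by this page.

```r
# Hypothetical stand-in for an etl object: an empty list carrying the four
# attributes named above. The "raw" and "load" subdirectory names are assumed.
base_dir <- file.path(tempdir(), "etl_attr_sketch")
mock <- structure(
  list(),
  pkg      = "mtcars",
  dir      = base_dir,
  raw_dir  = file.path(base_dir, "raw"),
  load_dir = file.path(base_dir, "load")
)

# Attributes are read with attr(); the same accessor works on any R object.
attr(mock, "pkg")
basename(attr(mock, "raw_dir"))
names(attributes(mock))
```

On a real etl object the same attr() calls should apply, since the attributes are stored on the object itself.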
Just like any src_dbi object, an etl object
is a data source backed by an SQL database. However, an etl object
has additional functionality based on the presumption that the SQL database
will be populated from data files stored on the local hard disk. The ETL functions
documented in etl_create provide the necessary functionality
for extracting data from the Internet to raw_dir,
transforming those data
and placing the cleaned-up data (usually in CSV format) into load_dir,
and finally loading the clean data into the SQL database.
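The three stages described above can be sketched in base R. This is only an illustration of the data flow, not the package's implementation: the directory layout, the file names, and the use of an in-memory SQLite database through DBI and RSQLite are all assumptions, and the built-in mtcars data frame stands in for a file downloaded from the Internet.

```r
# Base-R sketch of the three stages; this is not the etl package's code.
base_dir <- file.path(tempdir(), "etl_flow_sketch")
raw_dir  <- file.path(base_dir, "raw")   # assumed layout
load_dir <- file.path(base_dir, "load")  # assumed layout
dir.create(raw_dir,  recursive = TRUE, showWarnings = FALSE)
dir.create(load_dir, recursive = TRUE, showWarnings = FALSE)

# Extract: a real source would be downloaded; here the built-in mtcars
# data frame is written to raw_dir to stand in for a downloaded file.
raw_file <- file.path(raw_dir, "mtcars.csv")
write.csv(mtcars, raw_file)

# Transform: read the raw file, move row names into a column, save a clean CSV.
raw <- read.csv(raw_file, row.names = 1)
clean <- data.frame(make_model = rownames(raw), raw, row.names = NULL)
load_file <- file.path(load_dir, "mtcars.csv")
write.csv(clean, load_file, row.names = FALSE)

# Load: copy the clean CSV into an SQL database, if DBI and RSQLite are installed.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "mtcars", read.csv(load_file))
  print(DBI::dbListTables(con))
  DBI::dbDisconnect(con)
}
```

Keeping the raw and cleaned files in separate directories means the transform step can be rerun without downloading the data again.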
Value
For etl, an object of class etl_x and etl that inherits from src_dbi,
where x is the name supplied as the x argument.
For is.etl, TRUE or FALSE, depending on whether object has class etl.
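The class relationship described above can be mimicked in base R with a hand-built class vector. The object and the helper is_etl_sketch below are hypothetical; the real is.etl may be implemented differently, and the exact class vector of a real etl object is assumed here from the description.

```r
# Hypothetical object with the class vector described above, built by hand.
mock <- structure(list(), class = c("etl_mtcars", "etl", "src_dbi"))

# inherits() checks whether a class appears anywhere in the class vector.
# is_etl_sketch is an illustrative helper, not the package's is.etl().
is_etl_sketch <- function(object) inherits(object, "etl")

is_etl_sketch(mock)
is_etl_sketch("hello world")
inherits(mock, "src_dbi")
```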
See Also
Examples
# Instantiate the etl object
cars <- etl("mtcars")
str(cars)
is.etl(cars)
summary(cars)
## Not run:
# connect to a PostgreSQL server
if (require(RPostgreSQL)) {
  db <- src_postgres("mtcars", user = "postgres", host = "localhost")
  cars <- etl("mtcars", db)
}
## End(Not run)
# Do it step-by-step
cars %>%
  etl_extract() %>%
  etl_transform() %>%
  etl_load()
src_tbls(cars)
cars %>%
  tbl("mtcars") %>%
  group_by(cyl) %>%
  summarize(N = n(), mean_mpg = mean(mpg))
# Do it all in one step
cars2 <- etl("mtcars")
cars2 %>%
  etl_update()
src_tbls(cars2)
# generic summary function provides information about the object
cars <- etl("mtcars")
summary(cars)
cars <- etl("mtcars")
# returns TRUE
is.etl(cars)
# returns FALSE
is.etl("hello world")
cars <- etl("mtcars") %>%
  etl_create()
cars