db_lselect {stevemisc} | R Documentation |
Lazily select variables from multiple tables in a relational database
Description
db_lselect()
allows you to select variables from multiple
tables in an SQL database. It returns a lazy query that combines all the
variables together into one data frame (as a tibble
). The user can
choose to run collect()
after this query if they see fit.
Usage
db_lselect(.data, connection, vars)
Arguments
.data |
a character vector of the tables in a relational database |
connection |
the name of the connection object |
vars |
the variables (entered as class "character") to select from the tables in the database |
Details
This is a wrapper function in which purrr and dplyr
are doing the heavy lifting. The tables in the database are declared as a
character (or character vector). The variables to select are also declared
as a character (or character vector), which are then wrapped in a
one_of()
function within select()
in dplyr.
Value
Assuming a particular structure to the database, the function returns a
combined table including all the requested variables from all the tables listed
in the data
character vector. The returned table will have other attributes
inherited from how dplyr interfaces with SQL, allowing the user to extract
some information about the query (e.g. through show_query()
).
References
Miller, Steven V. 2020. "Clever Uses of Relational (SQL) Databases to Store Your Wider Data (with Some Assistance from dplyr
and purrr
)" http://svmiller.com/blog/2020/11/smarter-ways-to-store-your-wide-data-with-sql-magic-purrr/
Examples
library(DBI)
library(RSQLite)
library(dplyr)
library(dbplyr)
set.seed(8675309)
A <- data.frame(uid = c(1:10),
a = rnorm(10),
b = sample(letters, 10),
c = rbinom(10, 1, .5))
B <- data.frame(uid = c(11:20),
a = rnorm(10),
b = sample(letters, 10),
c = rbinom(10, 1, .5))
C <- data.frame(uid = c(21:30), a = rnorm(10),
b = sample(letters, 10),
c = rbinom(10, 1, .5),
d = rnorm(10))
con <- dbConnect(SQLite(), ":memory:")
copy_to(con, A, "A",
temporary=FALSE)
copy_to(con, B, "B",
temporary=FALSE)
copy_to(con, C, "C",
temporary=FALSE)
# This returns no warning because columns "a" and "b" are in all tables
c("A", "B", "C") %>% db_lselect(con, c("uid", "a", "b"))
# This returns two warnings because column "d" is not in 2 of 3 tables.
# ^ this is by design. It'll inform the user about data availability.
c("A", "B", "C") %>% db_lselect(con, c("uid", "a", "b", "d"))
dbDisconnect(con)