query {tidyquery} | R Documentation |
Query an R data frame with SQL
Description
query
takes a SQL SELECT
statement and uses it to
query an R data frame
Usage
query(data, sql)
Arguments
data |
a data frame or data frame-like object (optional) |
sql |
a character string containing a SQL |
Details
If the data
argument is not specified, then the FROM
clause of the SQL statement determines which data frame to query.
The names of data frames and columns are case-sensitive (like in R). Keywords and function names are not case-sensitive (like in SQL).
In addition to R data frames and tibbles (tbl_df
objects), this
function can query dtplyr_step
objects created by dtplyr, a
data.table backend for dbplyr. It is also possible to use this
function together with dbplyr to query remote database tables
(tbl_sql
objects), but this depends on which database and which
backend package (if any) you are using, so results may vary.
This function is subject to the current limitations of the queryparser package.
Value
An object of the same class as data
.
Examples
library(dplyr)
iris %>% query("SELECT Species, AVG(Petal.Length) GROUP BY Species")
query("SELECT Species, AVG(Petal.Length) FROM iris GROUP BY Species")
iris %>%
filter(Petal.Length > 4) %>%
query("SELECT Species, MAX(Sepal.Length) AS max_sep_len
GROUP BY Species") %>%
arrange(desc(max_sep_len))
library(nycflights13)
query <- "SELECT origin, dest,
COUNT(flight) AS num_flts,
round(AVG(distance)) AS dist,
round(AVG(arr_delay)) AS avg_delay
FROM flights
WHERE distance BETWEEN 200 AND 300
AND air_time IS NOT NULL
GROUP BY origin, dest
HAVING num_flts > 5000
ORDER BY num_flts DESC, avg_delay DESC
LIMIT 100;"
query(query)