query_documents {AzureCosmosR} | R Documentation |
Query an Azure Cosmos DB container
Description
Query an Azure Cosmos DB container
Usage
query_documents(container, ...)
## S3 method for class 'cosmos_container'
query_documents(
container,
query,
parameters = list(),
cross_partition = TRUE,
partition_key = NULL,
by_pkrange = FALSE,
as_data_frame = TRUE,
metadata = TRUE,
headers = list(),
...
)
Arguments
container |
A Cosmos DB container object, as obtained by |
query |
A string containing the query text. |
parameters |
A named list of parameters to pass to a parameterised query, if required. |
cross_partition , partition_key , by_pkrange |
Arguments that control how to handle cross-partition queries. See 'Details' below. |
as_data_frame |
Whether to return the query result as a data frame, or a list of Cosmos DB document objects. |
metadata |
Whether to include Cosmos DB document metadata in the query result. |
headers , ... |
Optional arguments passed to lower-level functions. |
Details
This is the primary function for querying the contents of a Cosmos DB container (table). The query
argument should contain the text of a SQL query, optionally parameterised. if the query contains parameters, pass them in the parameters
argument as a named list.
Cosmos DB is a partitioned key-value store under the hood, with documents stored in separate physical databases according to their value of the partition key. The Cosmos DB REST API has limited support for cross-partition queries: basic SELECTs should work, but aggregates and more complex queries may require some hand-hacking.
The default cross_partition=TRUE
runs the query for all partition key values and then attempts to stitch the results together. To run the query for only one key value, set cross_partition=FALSE
and partition_key
to the desired value. You can obtain all the values of the key with the list_partition_key_values function.
The by_pkrange
argument allows running the query separately across all partition key ranges. Each partition key range corresponds to a separate physical partition, and contains the documents for one or more key values. You can set this to TRUE to run a query that fails when run across partitions; the returned object will be a list containing the individual query results from each pkrange.
As an alternative to AzureCosmosR, you can also use the ODBC protocol to interface with the SQL API. By installing a suitable ODBC driver, you can then talk to Cosmos DB in a manner similar to other SQL databases. An advantage of the ODBC interface is that it fully supports cross-partition queries, unlike the REST API. A disadvantage is that it does not support nested document fields; functions like array_contains()
cannot be used, and attempts to reference arrays and objects may return incorrect results.
Value
query_documents
returns the results of the query. Most of the time this will be a data frame, or list of data frames if by_pkrange=TRUE
.
See Also
cosmos_container, cosmos_document, list_partition_key_values, list_partition_key_ranges
Examples
## Not run:
endp <- cosmos_endpoint("https://myaccount.documents.azure.com:443/", key="mykey")
# importing the Star Wars data from dplyr
cont <- endp %>%
get_cosmos_database(endp, "mydatabase") %>%
create_cosmos_container(db, "mycontainer", partition_key="sex")
bulk_import(cont, dplyr::starwars)
query_documents(cont, "select * from mycontainer")
# removing the Cosmos DB metadata cruft
query_documents(cont, "select * from mycontainer", metadata=FALSE)
# a simple filter
query_documents(cont, "select * from mycontainer c where c.gender = 'masculine'")
# run query for one partition key -- zero rows returned
query_documents(cont, "select * from mycontainer c where c.gender = 'masculine'",
partition_key="female")
# aggregates will fail -- API does not fully support cross-partition queries
try(query_documents(cont, "select avg(c.height) avgheight from mycontainer c"))
# Error in process_cosmos_response.response(response, simplify = as_data_frame) :
# Bad Request (HTTP 400). Failed to complete Cosmos DB operation. Message:
# ...
# run query separately by pkrange and combine the results manually
query_documents(
cont,
"select avg(c.height) avgheight, count(1) n from mycontainer c",
by_pkrange=TRUE
)
## End(Not run)