R: Query an Azure Cosmos DB container

query_documents {AzureCosmosR}

R Documentation

Query an Azure Cosmos DB container

Description

Query an Azure Cosmos DB container

Usage

query_documents(container, ...)

## S3 method for class 'cosmos_container'
query_documents(
  container,
  query,
  parameters = list(),
  cross_partition = TRUE,
  partition_key = NULL,
  by_pkrange = FALSE,
  as_data_frame = TRUE,
  metadata = TRUE,
  headers = list(),
  ...
)

Arguments

`container`	A Cosmos DB container object, as obtained by `get_cosmos_container` or `create_cosmos_container`.
`query`	A string containing the query text.
`parameters`	A named list of parameters to pass to a parameterised query, if required.
`cross_partition`, `partition_key`, `by_pkrange`	Arguments that control how to handle cross-partition queries. See 'Details' below.
`as_data_frame`	Whether to return the query result as a data frame, or a list of Cosmos DB document objects.
`metadata`	Whether to include Cosmos DB document metadata in the query result.
`headers`, `...`	Optional arguments passed to lower-level functions.

Details

This is the primary function for querying the contents of a Cosmos DB container (table). The query argument should contain the text of a SQL query, optionally parameterised. if the query contains parameters, pass them in the parameters argument as a named list.

Cosmos DB is a partitioned key-value store under the hood, with documents stored in separate physical databases according to their value of the partition key. The Cosmos DB REST API has limited support for cross-partition queries: basic SELECTs should work, but aggregates and more complex queries may require some hand-hacking.

The default cross_partition=TRUE runs the query for all partition key values and then attempts to stitch the results together. To run the query for only one key value, set cross_partition=FALSE and partition_key to the desired value. You can obtain all the values of the key with the list_partition_key_values function.

The by_pkrange argument allows running the query separately across all partition key ranges. Each partition key range corresponds to a separate physical partition, and contains the documents for one or more key values. You can set this to TRUE to run a query that fails when run across partitions; the returned object will be a list containing the individual query results from each pkrange.

As an alternative to AzureCosmosR, you can also use the ODBC protocol to interface with the SQL API. By installing a suitable ODBC driver, you can then talk to Cosmos DB in a manner similar to other SQL databases. An advantage of the ODBC interface is that it fully supports cross-partition queries, unlike the REST API. A disadvantage is that it does not support nested document fields; functions like array_contains() cannot be used, and attempts to reference arrays and objects may return incorrect results.

Value

query_documents returns the results of the query. Most of the time this will be a data frame, or list of data frames if by_pkrange=TRUE.

Examples

## Not run: 

endp <- cosmos_endpoint("https://myaccount.documents.azure.com:443/", key="mykey")

# importing the Star Wars data from dplyr
cont <- endp %>%
    get_cosmos_database(endp, "mydatabase") %>%
    create_cosmos_container(db, "mycontainer", partition_key="sex")

bulk_import(cont, dplyr::starwars)

query_documents(cont, "select * from mycontainer")

# removing the Cosmos DB metadata cruft
query_documents(cont, "select * from mycontainer", metadata=FALSE)

# a simple filter
query_documents(cont, "select * from mycontainer c where c.gender = 'masculine'")

# run query for one partition key -- zero rows returned
query_documents(cont, "select * from mycontainer c where c.gender = 'masculine'",
    partition_key="female")

# aggregates will fail -- API does not fully support cross-partition queries
try(query_documents(cont, "select avg(c.height) avgheight from mycontainer c"))
# Error in process_cosmos_response.response(response, simplify = as_data_frame) :
#  Bad Request (HTTP 400). Failed to complete Cosmos DB operation. Message:
# ...

# run query separately by pkrange and combine the results manually
query_documents(
    cont,
    "select avg(c.height) avgheight, count(1) n from mycontainer c",
    by_pkrange=TRUE
)


## End(Not run)

[Package AzureCosmosR version 1.0.0 Index]

Query an Azure Cosmos DB container

Description

Usage

Arguments

Details

Value

See Also

Examples