es_search {uptasticsearch} | R Documentation |
Execute an ES query and get a data.table
Description
Given a query and some optional parameters, es_search
gets results
from HTTP requests to Elasticsearch and returns a data.table
representation of those results.
Usage
es_search(es_host, es_index, size = 10000, query_body = "{}",
scroll = "5m", max_hits = Inf,
n_cores = ceiling(parallel::detectCores()/2),
break_on_duplicates = TRUE, ignore_scroll_restriction = FALSE,
intermediates_dir = getwd())
Arguments
es_host |
A string identifying an Elasticsearch host. This should be of the form
|
es_index |
The name of an Elasticsearch index to be queried. Note that passing
|
size |
Number of records per page of results. See Elasticsearch docs for more.
Note that this will be reset to 0 if you submit a |
query_body |
String with a valid Elasticsearch query. Default is an empty query. |
scroll |
How long should the scroll context be held open? This should be a duration string like "1m" (for one minute) or "15s" (for 15 seconds). The scroll context will be refreshed every time you ask Elasticsearch for another record, so this parameter should just be the amount of time you expect to pass between requests. See the Elasticsearch scroll/pagination docs for more information. |
max_hits |
Integer. If specified, |
n_cores |
Number of cores to distribute fetching and processing over. |
break_on_duplicates |
Boolean, defaults to TRUE. |
ignore_scroll_restriction |
There is a cost associated with keeping an
Elasticsearch scroll context open. By default,
this function does not allow arguments to |
intermediates_dir |
When scrolling over search results, this function writes intermediate results to disk. By default, 'es_search' will create a temporary directory in whatever working directory the function is called from. If you want to change this behavior, provide a path here. 'es_search' will create and write to a temporary directory under whatever path you provide. |
References
Examples
## Not run:
###=== Example 1: Get low-scoring food survey results ===###
query_body <- '{"query":{"filtered":{"filter":{"bool":{"must":[
{"exists":{"field":"customer_comments"}},
{"terms":{"overall_satisfaction":["very low","low"]}}]}}},
"query":{"match_phrase":{"customer_comments":"food"}}}}'
# Execute the query, parse into a data.table
commentDT <- es_search(es_host = 'http://mydb.mycompany.com:9200'
, es_index = "survey_results"
, query_body = query_body
, scroll = "1m"
, n_cores = 4)
###=== Example 2: Time series agg features ===###
# Create query that will give you daily summary stats for revenue
query_body <- '{"query":{"filtered":{"filter":{"bool":{"must":[
{"exists":{"field":"pmt_amount"}}]}}}},
"aggs":{"timestamp":{"date_histogram":{"field":"timestamp","interval":"day"},
"aggs":{"revenue":{"extended_stats":{"field":"pmt_amount"}}}}},"size":0}'
# Execute the query and get the result
resultDT <- es_search(es_host = "http://es.custdb.mycompany.com:9200"
, es_index = 'ticket_sales'
, query_body = query_body)
## End(Not run)