bd_collect {basedosdados}    R Documentation

Collect the results of a remote table created with bdplyr()

Description

After bdplyr() has been used to create the remote connection, this function collects the result of the manipulations carried out with dplyr verbs and loads it entirely into local memory.

Alternatively, you can save the result directly to disk with the bd_write() function or its derivatives, bd_write_csv() and bd_write_rds().
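The lazy-then-collect workflow described above can be illustrated without a billing project. The sketch below is not basedosdados code: it uses an in-memory SQLite table from dbplyr as a stand-in for a remote table and dplyr::collect() as a stand-in for bd_collect(), on the assumption that the two behave alike for a lazy table. The data values are made up for illustration, and the dbplyr and RSQLite packages must be installed.

```r
library(dplyr)

# Local stand-in for a remote table: an in-memory SQLite table.
# (Assumption: dbplyr and RSQLite are installed; the values are invented.)
lazy_tbl <- dbplyr::memdb_frame(
  ano = c(2018, 2019, 2019, 2020),
  homicidio_doloso = c(10, 12, 8, 15)
)

# dplyr verbs only build up a query; no rows are pulled into R yet.
lazy_result <- lazy_tbl %>%
  filter(ano >= 2019) %>%
  group_by(ano) %>%
  summarise(total = sum(homicidio_doloso, na.rm = TRUE))

# Still a lazy table at this point.
is_lazy <- inherits(lazy_result, "tbl_lazy")

# collect() runs the query and returns a local tibble.
local_result <- collect(lazy_result)
is_local <- inherits(local_result, "tbl_df")
```

Doing the filtering and summarising before collecting keeps the amount of data transferred small, which matters more for a real remote table than for this local stand-in.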

Usage

bd_collect(
  .lazy_tbl,
  billing_project_id = basedosdados::get_billing_id(),
  show_query = FALSE
)

Arguments

.lazy_tbl

A variable containing a remote table that was previously connected through the bdplyr() function. Typically, bd_collect() is called on it after the desired operations have been performed with the {dplyr} verbs.

billing_project_id

A string containing your billing project id. If you have already run set_billing_id(), you can leave this argument at its default.

show_query

If TRUE, shows the SQL query by calling dplyr::show_query(). This is useful for diagnosing performance problems.
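As a minimal sketch of how show_query might be used, the call below reuses the table name from the Examples section. It assumes that set_billing_id() has already been run with a valid billing project, so it cannot be run without access to Base dos Dados.

```r
library(dplyr)

# Assumes basedosdados::set_billing_id() was already called
# with a valid billing project id.
bd_table <- basedosdados::bdplyr(
  "basedosdados.br_sp_gov_ssp.ocorrencias_registradas")

# Show the generated SQL while collecting the filtered rows.
bd_recent <- bd_table %>%
  filter(ano >= 2019) %>%
  basedosdados::bd_collect(show_query = TRUE)
```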

Value

A tibble.

Examples

## Not run: 

# set up billing
basedosdados::set_billing_id("billing-project-id")

# select a cool database at Base dos Dados
bd_table <- basedosdados::bdplyr(
  "basedosdados.br_sp_gov_ssp.ocorrencias_registradas")

# quick look
bd_table %>%
  dplyr::glimpse()

# filter, select and group the remote data
bd_ssp <- bd_table %>%
  dplyr::filter(ano >= 2019) %>%
  dplyr::select(ano, mes, homicidio_doloso) %>%
  dplyr::group_by(ano, mes)

# make some plots
library(ggplot2)

bd_ssp %>%
  # collect the data to continue the analysis
  basedosdados::bd_collect() %>%
  dplyr::summarise(homicidios_sum = sum(homicidio_doloso,
                                         na.rm = TRUE)) %>%
  # treat ano as discrete so that the bars are dodged by year
  ggplot(aes(x = mes, y = homicidios_sum, fill = factor(ano))) +
  geom_col(position = "dodge")



## End(Not run)

[Package basedosdados version 0.2.2 Index]