R: Compatibility with dplyr verbs without using SQL language

bdplyr {basedosdados}

R Documentation

Compatibility with dplyr verbs without using SQL language

Description

Allows you to explore and operate on Base dos Dados' datasets without writing SQL. The bdplyr() function creates lazy variables that are connected directly to the desired Base dos Dados table at Google BigQuery and can be handled with the verbs of dplyr::dplyr-package, as is traditionally done with local data. See also: bigrquery::src_bigquery.

It is therefore possible (without using SQL) to select columns with dplyr::select(), filter rows with dplyr::filter(), transform columns with dplyr::mutate(), join tables with dplyr::left_join(), and use other verbs from the {dplyr} package.

The data will be downloaded automatically from Google BigQuery in the background when necessary, but will not be loaded into memory or written to disk unless expressly requested.

To do so, use bd_collect() or bd_write(). bd_collect() loads the processed data into local memory. To save the result to disk, use the more general bd_write() or its derivatives bd_write_csv() and bd_write_rds(), which save in .csv and .rds format, respectively.
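The lazy behaviour described above can be illustrated without a BigQuery connection. The sketch below is not part of basedosdados: it uses dbplyr::lazy_frame() as a stand-in for a remote table, and the column `numero_obitos` is a hypothetical name chosen for illustration. It only shows that dplyr verbs applied to a lazy table build a query rather than move data.

```r
library(dplyr)
library(dbplyr)

# Stand-in for a remote table: a lazy frame backed by a simulated connection.
# The column `numero_obitos` is hypothetical and used only for illustration.
obitos <- lazy_frame(
  sigla_uf = c("AC", "SP"),
  ano = c(2018L, 2019L),
  numero_obitos = c(10L, 20L)
)

# dplyr verbs only describe the query; nothing is executed or downloaded.
consulta <- obitos %>%
  filter(sigla_uf == "AC", ano >= 2018) %>%
  select(ano, numero_obitos)

# The result is still a lazy table, not a local data frame.
inherits(consulta, "tbl_lazy")

# The SQL that would be sent to the backend on collection.
sql_texto <- as.character(sql_render(consulta))
cat(sql_texto, "\n")
```

With a table created by bdplyr(), the analogous step that finally brings the data into memory would be bd_collect().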

Usage

bdplyr(
  table,
  billing_project_id = basedosdados::get_billing_id(),
  query_project_id = "basedosdados"
)

Arguments

`table`	String in the format `(dataset_name)`.`(table_name)`. You can optionally input a project before the dataset name.
`billing_project_id`	A string containing your billing project id. If you have already run `set_billing_id()`, you can leave this empty.
`query_project_id`	The project name at Google BigQuery. Defaults to `basedosdados`. You do not need to supply this if the project is already given in the `table` parameter.
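The argument forms above can be sketched as follows. This is only an illustration: it needs an authenticated Google BigQuery session to run, the billing id is a placeholder, and the project-qualified string assumes the project is written before the dataset name and separated by a dot, as the `table` description suggests.

```r
library(basedosdados)

# Placeholder billing project; replace with your own Google Cloud project id.
set_billing_id("avalidprojectbillingid")

# Dataset and table only: the project comes from `query_project_id`,
# which defaults to "basedosdados".
base_sim <- bdplyr("br_ms_sim.municipio_causa_idade")

# Assumed equivalent form, with the project written before the dataset name.
base_sim <- bdplyr("basedosdados.br_ms_sim.municipio_causa_idade")

# Billing project passed explicitly instead of relying on set_billing_id().
base_sim <- bdplyr(
  "br_ms_sim.municipio_causa_idade",
  billing_project_id = "avalidprojectbillingid"
)
```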

Value

A lazy tibble, which can be handled (almost) as if it were a local database. Once you are satisfied with the result, load it into memory with bd_collect() or write it to disk with bd_write() or its derivatives.

Examples


## Not run: 

# set project billing id
basedosdados::set_billing_id("avalidprojectbillingid")

# connects to the remote table I want
base_sim <- bdplyr("br_ms_sim.municipio_causa_idade")

# connects to another remote table
municipios <- bdplyr("br_bd_diretorios_brasil.municipio")

# explore data
base_sim %>%
  dplyr::glimpse()

# use normal `{dplyr}` operations
municipios %>%
  head()

# filter
base_sim_acre <- base_sim %>%
  dplyr::mutate(ano = as.numeric(ano)) %>%
  dplyr::filter(sigla_uf == "AC", ano >= 2018)

municipios_acre <- municipios %>%
  dplyr::filter(sigla_uf == "AC") %>%
  dplyr::select(id_municipio, municipio, regiao)


# join
base_junta <- base_sim_acre %>%
  dplyr::left_join(municipios_acre,
                   by = "id_municipio")

# check whether the result is satisfactory
base_junta

# collect the result
base_final <- base_junta %>%
  basedosdados::bd_collect()

# alternatively, write the result to disk

base_junta %>%
  basedosdados::bd_write_rds(path = "data-raw/data.rds")


## End(Not run)