bdplyr {basedosdados} | R Documentation |
Compatibility with dplyr verbs without using SQL language
Description
Allows you to explore and operate on Base dos Dados' datasets without writing SQL. The bdplyr() function creates lazy variables that are connected directly to the desired Base dos Dados table at Google BigQuery and can be handled with the verbs of the dplyr package (see dplyr::dplyr-package), just as is traditionally done with local data. See also: bigrquery::src_bigquery.
Therefore, it is possible (without using SQL) to perform, for example, column selection with dplyr::select(), row filtering with dplyr::filter(), column operations with dplyr::mutate(), joins with dplyr::left_join(), and other verbs from the {dplyr} package.
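To make the idea of a lazy variable concrete, here is a minimal sketch that needs no BigQuery access. It assumes that a bdplyr() result behaves like an ordinary {dbplyr} lazy table, and it uses dbplyr::lazy_frame() as a stand-in for the remote table; the column names are borrowed from the Examples and the values are made up for illustration.

```r
library(dplyr)
library(dbplyr)

# Stand-in for a bdplyr() result: a lazy table with no database behind it.
# (Assumption: bdplyr() returns a comparable dbplyr lazy table.)
lazy <- lazy_frame(sigla_uf = "AC", ano = 2018L)

# Verbs only build up a query description; no rows are processed here.
query <- lazy %>%
  filter(sigla_uf == "AC", ano >= 2018) %>%
  select(sigla_uf, ano)

# Print the SQL that the pipeline would send to the backend.
show_query(query)
```

With a real bdplyr() table, the same pipeline would be translated for BigQuery and run only when the result is requested.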
The data will be downloaded automatically from Google BigQuery in the background when necessary, but will not be loaded into memory or written to disk unless expressly requested. To load the processed data into local memory, use bd_collect(). To save the result to disk, use the broader function bd_write() or its derivatives bd_write_csv() and bd_write_rds(), which save in .csv and .rds format, respectively.
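The choice between loading and writing can be sketched as a small wrapper. materialise() is a hypothetical helper, not part of {basedosdados}, and the sketch assumes that bd_write_csv() accepts a path argument in the same way as bd_write_rds() does in the Examples.

```r
# Hypothetical helper: nothing here contacts BigQuery until it is called
# with a lazy table created by bdplyr().
materialise <- function(lazy_tbl, path = NULL) {
  if (is.null(path)) {
    # No path given: pull the result into local memory.
    return(basedosdados::bd_collect(lazy_tbl))
  }
  # Path given: pick the writer from the file extension.
  ext <- tolower(tools::file_ext(path))
  if (ext == "csv") {
    # Assumption: same `path` argument as bd_write_rds().
    basedosdados::bd_write_csv(lazy_tbl, path = path)
  } else if (ext == "rds") {
    basedosdados::bd_write_rds(lazy_tbl, path = path)
  } else {
    stop("Unsupported extension: ", ext)
  }
}
```

Calling materialise(tbl) would return a local tibble, while materialise(tbl, "data-raw/data.csv") would write the result to disk without keeping it in memory.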
Usage
bdplyr(
table,
billing_project_id = basedosdados::get_billing_id(),
query_project_id = "basedosdados"
)
Arguments
table |
String in the format |
billing_project_id |
a string containing your billing project id.
If you've run |
query_project_id |
The project name at Google BigQuery. By default "basedosdados". |
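As an illustration of the three arguments, the sketch below spells out every one of them explicitly instead of relying on the defaults shown under Usage. connect_table() and its billing_id parameter are hypothetical names introduced only for this example.

```r
# Hypothetical wrapper that passes every bdplyr() argument by name.
# The body is only evaluated when the function is called, which requires
# {basedosdados} and valid Google Cloud credentials.
connect_table <- function(table, billing_id) {
  basedosdados::bdplyr(
    table = table,                        # e.g. "br_ms_sim.municipio_causa_idade"
    billing_project_id = billing_id,      # your own billing project id
    query_project_id = "basedosdados"     # the default shown under Usage
  )
}
```

Passing the billing id explicitly can be convenient in scripts where set_billing_id() has not been called beforehand.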
Value
A lazy tibble, which can be handled (almost) as if it were a local database. Once it has been handled satisfactorily, the result must be loaded into memory using bd_collect() or written to disk using bd_write() or its derivatives.
See Also
bd_collect(), bd_write(), bd_write_csv(), bd_write_rds(), bigrquery::src_bigquery
Examples
## Not run:
# load the package and {dplyr}, which provides the %>% pipe
library(basedosdados)
library(dplyr)
# set project billing id
basedosdados::set_billing_id("avalidprojectbillingid")
# connects to the remote table I want
base_sim <- bdplyr("br_ms_sim.municipio_causa_idade")
# connects to another remote table
municipios <- bdplyr("br_bd_diretorios_brasil.municipio")
# explore data
base_sim %>%
  dplyr::glimpse()
# use normal `{dplyr}` operations
municipios %>%
  head()
# filter
base_sim_acre <- base_sim %>%
  dplyr::mutate(ano = as.numeric(ano)) %>%
  dplyr::filter(sigla_uf == "AC", ano >= 2018)
municipios_acre <- municipios %>%
  dplyr::filter(sigla_uf == "AC") %>%
  dplyr::select(id_municipio, municipio, regiao)
# join
base_junta <- base_sim_acre %>%
  dplyr::left_join(municipios_acre,
                   by = "id_municipio")
# check whether the result is satisfactory
base_junta
# collect the result
base_final <- base_junta %>%
  basedosdados::bd_collect()
# alternatively, write the (still lazy) result to disk
base_junta %>%
  basedosdados::bd_write_rds(path = "data-raw/data.rds")
## End(Not run)