spark_write_bigquery {sparkbq}	R Documentation

Writing data to Google BigQuery

Description

This function writes data to a Google BigQuery table.

Usage

spark_write_bigquery(data,
  billingProjectId = default_billing_project_id(),
  projectId = billingProjectId, datasetId, tableId,
  type = default_bigquery_type(), gcsBucket = default_gcs_bucket(),
  datasetLocation = default_dataset_location(),
  serviceAccountKeyFile = default_service_account_key_file(),
  additionalParameters = NULL, mode = "error", ...)

Arguments

data

Spark DataFrame to write to Google BigQuery.

billingProjectId

Google Cloud Platform project ID for billing purposes. This is the project on whose behalf to perform BigQuery operations. Defaults to default_billing_project_id().

projectId

Google Cloud Platform project ID of BigQuery dataset. Defaults to billingProjectId.

datasetId

Google BigQuery dataset ID (may contain letters, numbers and underscores).

tableId

Google BigQuery table ID (may contain letters, numbers and underscores).

type

BigQuery export type to use. Options include "direct", "parquet", "avro", "orc". Defaults to default_bigquery_type(). See bigquery_defaults for more details about the supported types.

gcsBucket

Google Cloud Storage (GCS) bucket to use for storing temporary files. Temporary files are used when importing through BigQuery load jobs and exporting through BigQuery extraction jobs (i.e. when using data extracts such as Parquet, Avro, ORC, ...). The service account specified in serviceAccountKeyFile needs to be given appropriate rights. This should be the name of an existing storage bucket.

datasetLocation

Geographic location where newly created datasets should reside: "EU" or "US". Defaults to "US". This is only used when the dataset does not yet exist; it is ignored if the dataset already exists.

serviceAccountKeyFile

Google Cloud service account key file to use for authentication with Google Cloud services. The use of service accounts is highly recommended. Specifically, the service account will be used to interact with BigQuery and Google Cloud Storage (GCS).

additionalParameters

Additional spark-bigquery options. See https://github.com/miraisolutions/spark-bigquery for more information.

mode

Specifies the behavior when the target table already exists. One of "overwrite", "append", "ignore" or "error" (the default).

...

Additional arguments passed to spark_write_source.
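
For illustration, the export-related arguments above can be combined as follows. This is a sketch only: spark_df stands for an existing Spark DataFrame, and the placeholder values in angle brackets must be replaced with real identifiers.

spark_write_bigquery(
  data = spark_df,                 # an existing Spark DataFrame
  billingProjectId = "<your_billing_project_id>",
  datasetId = "<your_dataset_id>",
  tableId = "<your_table_id>",
  type = "avro",                   # export via Avro files staged in GCS
  gcsBucket = "<your_gcs_bucket>", # bucket for the temporary export files
  mode = "error")                  # fail if the table already exists

Because type = "avro" routes the data through a BigQuery load job, the service account must have write access to the specified GCS bucket.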

Value

NULL. This is a side-effecting function.

References

https://github.com/miraisolutions/spark-bigquery
https://cloud.google.com/bigquery/docs/datasets
https://cloud.google.com/bigquery/docs/tables
https://cloud.google.com/bigquery/docs/reference/standard-sql/
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-parquet
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-orc
https://cloud.google.com/bigquery/pricing
https://cloud.google.com/bigquery/docs/dataset-locations
https://cloud.google.com/docs/authentication/
https://cloud.google.com/bigquery/docs/authentication/

See Also

spark_write_source, spark_read_bigquery, bigquery_defaults

Other Spark serialization routines: spark_read_bigquery

Examples

## Not run: 
library(sparklyr)
library(sparkbq)

config <- spark_config()

sc <- spark_connect(master = "local", config = config)

bigquery_defaults(
  billingProjectId = "<your_billing_project_id>",
  gcsBucket = "<your_gcs_bucket>",
  datasetLocation = "US",
  serviceAccountKeyFile = "<your_service_account_key_file>",
  type = "direct")

# Copy mtcars to Spark
spark_mtcars <- dplyr::copy_to(sc, mtcars, "spark_mtcars", overwrite = TRUE)

spark_write_bigquery(
  data = spark_mtcars,
  datasetId = "<your_dataset_id>",
  tableId = "mtcars",
  mode = "overwrite")
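
# A sketch of appending further rows to the same table (assumes the
# "mtcars" table written above already exists; "append" adds rows
# instead of replacing them):
spark_write_bigquery(
  data = spark_mtcars,
  datasetId = "<your_dataset_id>",
  tableId = "mtcars",
  mode = "append")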

## End(Not run)

[Package sparkbq version 0.1.1 Index]