spark_read_sas {spark.sas7bdat}		R Documentation

Read SAS datasets in .sas7bdat format into Spark using the spark-sas7bdat Spark package.

Description

Read SAS datasets in .sas7bdat format into Spark using the spark-sas7bdat Spark package.

Usage

spark_read_sas(sc, path, table)

Arguments

sc

A connection to a local Spark instance or a remote cluster, as returned by spark_connect. See the examples.

path

full path to the SAS file, either on HDFS (hdfs://), on S3 (s3n://), or on the local file system (file://). Note that files on the local file system must be specified using the full path.

table

character string with the name of the Spark table in which the SAS dataset will be stored
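
A minimal sketch of the three path forms the path argument accepts. The host, bucket, and file names below are hypothetical; adjust them to your setup.

```r
## Hypothetical HDFS and S3 locations (replace host/bucket/file names):
path_hdfs <- "hdfs://namenode/data/airlines.sas7bdat"
path_s3   <- "s3n://my-bucket/data/airlines.sas7bdat"

## Local files need the full path; normalizePath() expands a relative one:
path_local <- paste0("file://", normalizePath("data/airlines.sas7bdat"))
```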

Value

an object of class tbl_spark, which is a reference to a Spark DataFrame on which dplyr functions can be executed. See https://github.com/sparklyr/sparklyr
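
Because the return value is a tbl_spark, it can be manipulated lazily with dplyr verbs and pulled into a local data.frame with collect(). A minimal sketch, assuming x is the result of an earlier spark_read_sas call on the iris example data:

```r
library(dplyr)

## Filtering runs inside Spark; collect() brings the result into R
## as an ordinary local data.frame:
local_df <- x %>%
  filter(Sepal_Length > 5) %>%
  collect()
```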

References

https://spark-packages.org/package/saurfang/spark-sas7bdat, https://github.com/saurfang/spark-sas7bdat, https://github.com/sparklyr/sparklyr

See Also

spark_connect, sdf_register

Examples

## Not run: 
## If you do not have a Spark cluster, you can install Spark locally like this
library(sparklyr)
spark_install(version = "2.0.1")

## Define the SAS .sas7bdat file, connect to the Spark cluster to read + process the data
myfile <- system.file("extdata", "iris.sas7bdat", package = "spark.sas7bdat")
myfile

library(spark.sas7bdat)
sc <- spark_connect(master = "local")
x <- spark_read_sas(sc, path = myfile, table = "sas_example")
x

library(dplyr)
x %>% group_by(Species) %>%
  summarise(count = n(), length = mean(Sepal_Length), width = mean(Sepal_Width))

## End(Not run)

[Package spark.sas7bdat version 1.4 Index]