
spark_read_sas {spark.sas7bdat}

R Documentation

Read in SAS datasets in .sas7bdat format into Spark by using the spark-sas7bdat Spark package.

Description

Read in SAS datasets in .sas7bdat format into Spark by using the spark-sas7bdat Spark package.

Usage

spark_read_sas(sc, path, table)

Arguments

`sc`	Connection to a local Spark instance or a remote cluster. See the Examples section.
`path`	Full path to the SAS file, located on HDFS (hdfs://), S3 (s3n://), or the local file system (file://). Note that files on the local file system must be specified using the full path.
`table`	Character string with the name of the Spark table into which the SAS dataset will be loaded.
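
The rule for `path` can be sketched in base R. This is only a minimal sketch: `to_full_path` is a hypothetical helper that is not part of spark.sas7bdat, and "mydata.sas7bdat" is a made-up file name. The helper leaves paths that already carry a URI scheme untouched and expands local paths to full paths with normalizePath().

```r
## Hypothetical helper (not part of spark.sas7bdat): returns a value
## suitable for the `path` argument of spark_read_sas()
to_full_path <- function(path) {
  ## Paths with a URI scheme (e.g. hdfs://, s3n://, file://) are returned as is
  if (grepl("^[A-Za-z][A-Za-z0-9+.-]*://", path)) {
    return(path)
  }
  ## Local paths are expanded to a full path; this errors if the file is missing
  normalizePath(path, mustWork = TRUE)
}

## Not run: 
## x <- spark_read_sas(sc, path = to_full_path("mydata.sas7bdat"), table = "mydata")
## End(Not run)
```

Using mustWork = TRUE is a design choice in this sketch: a mistyped local path fails in R immediately instead of later inside Spark.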

Value

An object of class tbl_spark, which is a reference to a Spark DataFrame on which dplyr functions can be executed. See https://github.com/sparklyr/sparklyr
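
Because the returned object is a reference rather than local data, dplyr verbs applied to it are evaluated in Spark, and collect() is what brings the result into R. The following is a minimal sketch of that pattern. `species_summary` is a hypothetical helper, not part of spark.sas7bdat, and the column names are assumed to match the iris example in the Examples section.

```r
library(dplyr)

## Hypothetical helper (not part of spark.sas7bdat): summarises a table with
## columns Species, Sepal_Length and Sepal_Width, then collects the result.
## For a tbl_spark the summary is computed in Spark and only the small result
## is transferred to R; for a local data frame collect() returns it unchanged.
species_summary <- function(tbl) {
  tbl %>%
    group_by(Species) %>%
    summarise(count = n(),
              length = mean(Sepal_Length),
              width = mean(Sepal_Width)) %>%
    collect()
}

## Not run: 
## x <- spark_read_sas(sc, path = myfile, table = "sas_example")
## species_summary(x)
## End(Not run)
```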

References

https://spark-packages.org/package/saurfang/spark-sas7bdat, https://github.com/saurfang/spark-sas7bdat, https://github.com/sparklyr/sparklyr

Examples

## Not run: 
## If you haven't got a Spark cluster, you can install Spark locally like this
library(sparklyr)
spark_install(version = "2.0.1")

## Define the SAS .sas7bdat file, connect to the Spark cluster to read + process the data
myfile <- system.file("extdata", "iris.sas7bdat", package = "spark.sas7bdat")
myfile

library(spark.sas7bdat)
sc <- spark_connect(master = "local")
x <- spark_read_sas(sc, path = myfile, table = "sas_example")
x

library(dplyr)
x %>% group_by(Species) %>%
  summarise(count = n(), length = mean(Sepal_Length), width = mean(Sepal_Width))

## End(Not run)

[Package spark.sas7bdat version 1.4 Index]
