spark_rcpp_read_warc {sparkwarc} | R Documentation |
Reads a WARC File using Rcpp
Description
Reads a WARC (Web ARChive) file using Rcpp.
Usage
spark_rcpp_read_warc(path, match_warc, match_line)
Arguments
path |
The path to the file. Needs to be accessible from the cluster. Supports the ‘"hdfs://"’, ‘"s3n://"’ and ‘"file://"’ protocols. |
match_warc |
Include only WARC files matching this character string. |
match_line |
Include only lines matching this character string. |
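Examples
A minimal sketch of a call, not taken from the package itself. It writes a small hand-made WARC-style record to a temporary file and passes it to spark_rcpp_read_warc(). The file contents, the name warc_path, and the filter strings "example.com" and "<h1>" are illustrative assumptions. Whether a plain local path (rather than a ‘"file://"’ URI) is accepted, and the shape of the returned object, are also assumptions, so the result is only inspected with str().

```r
# Write a tiny, hand-made WARC-style record to a temporary file.
# The record below is illustrative data, not output from a real crawl.
warc_path <- tempfile(fileext = ".warc")
writeLines(c(
  "WARC/1.0",
  "WARC-Type: response",
  "WARC-Target-URI: http://example.com/",
  "Content-Type: text/html",
  "",
  "<html><body><h1>Hello</h1></body></html>"
), warc_path)

# Call the reader only when sparkwarc is installed, so the sketch
# still runs (as a no-op) on machines without the package.
if (requireNamespace("sparkwarc", quietly = TRUE)) {
  records <- sparkwarc::spark_rcpp_read_warc(
    path       = warc_path,      # assumption: a plain local path is accepted
    match_warc = "example.com",  # keep only entries containing this string
    match_line = "<h1>"          # keep only lines containing this string
  )
  str(records)                   # inspect whatever structure is returned
}
```

Both filters use plain alphanumeric text here because this page does not say whether the strings are treated as literal text or as patterns.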
[Package sparkwarc version 0.1.6 Index]