spark_rcpp_read_warc {sparkwarc}    R Documentation

Reads a WARC File Using Rcpp

Description

Reads a WARC (Web ARChive) file using Rcpp.

Usage

spark_rcpp_read_warc(path, match_warc, match_line)

Arguments

path

The path to the file. Needs to be accessible from the cluster. Supports the "hdfs://", "s3n://" and "file://" protocols.

match_warc

Include only WARC files matching this character string.

match_line

Include only lines matching this character string.
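This page does not say exactly how match_line is matched. The base-R sketch below illustrates one plausible reading, assuming plain substring (non-regex) matching and assuming that an empty string disables filtering. The helper filter_warc_lines and the sample record are hypothetical and are not part of sparkwarc.

```r
# Hypothetical helper, NOT part of sparkwarc: illustrates a match_line-style
# filter, ASSUMING literal substring matching rather than regular expressions.
filter_warc_lines <- function(lines, match_line = "") {
  # Assumption: an empty pattern means "keep every line".
  if (!nzchar(match_line)) {
    return(lines)
  }
  # fixed = TRUE matches the literal string, not a regular expression.
  lines[grepl(match_line, lines, fixed = TRUE)]
}

# A made-up WARC record, split into lines, for illustration only.
record <- c(
  "WARC/1.0",
  "WARC-Type: response",
  "WARC-Target-URI: http://example.com/",
  "Content-Type: text/html",
  "<html><body>Hello</body></html>"
)

# Keeps only the lines containing the literal text "WARC-".
headers <- filter_warc_lines(record, "WARC-")
print(headers)
```

Here filter_warc_lines(record, "WARC-") keeps the two "WARC-" header lines and drops the rest. If the real function uses regular expressions instead, characters such as "." or "*" in the pattern would behave differently.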


[Package sparkwarc version 0.1.6 Index]