split_clf {webreadr} | R Documentation |
Split requests from a CLF-formatted file
Description
CLF (Combined/Common Log Format) files store the HTTP method, protocol and asset requested in the same field. split_clf takes this field as a vector and returns a data.frame containing these elements in distinct columns. The function also works nicely with the uri field from Amazon S3 files (see read_s3).
Usage
split_clf(requests)
Arguments
requests |
the "request" field from a CLF-formatted file, read in with
|
Value
A data.frame of three columns, "method", "asset" and "protocol", representing, respectively, the HTTP method used (for example "GET"), the asset requested ("/favicon.ico") and the protocol used ("HTTP/1.0"). Where a request is not intact (containing, for example, just the protocol or just the asset), a row of empty strings is currently returned. This handling may be improved in a future release.
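The shape of the return value can be illustrated with a small base-R sketch. This is not webreadr's implementation: split_request_sketch is a hypothetical helper written only for this illustration, and it assumes that an intact request consists of exactly three space-separated parts. The package's own handling of malformed requests may differ in detail.

```r
# Hypothetical helper for illustration only; NOT the webreadr implementation.
# Assumption: an intact request is exactly "<method> <asset> <protocol>".
split_request_sketch <- function(requests) {
  # Split each request string on single spaces.
  parts <- strsplit(as.character(requests), " ", fixed = TRUE)

  # A request counts as intact only if it yields exactly three parts.
  intact <- lengths(parts) == 3L

  # Extract the i-th part of each intact request; use "" for the rest.
  pick <- function(i) {
    vapply(
      seq_along(parts),
      function(k) if (intact[k]) parts[[k]][i] else "",
      character(1)
    )
  }

  data.frame(
    method = pick(1),
    asset = pick(2),
    protocol = pick(3),
    stringsAsFactors = FALSE
  )
}

# One intact request and one containing only the protocol.
example <- split_request_sketch(c("GET /favicon.ico HTTP/1.0", "HTTP/1.0"))
print(example)
```

The second input contains only the protocol, so the sketch returns a row of empty strings for it, mirroring the behaviour described above. A request whose asset contains a literal space would also be treated as not intact here, which is a simplification of this sketch.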
See Also
read_clf and read_combined for reading in these files.
Examples
# Grab CLF data and split out the request.
data <- read_combined(system.file("extdata/combined_log.clf", package = "webreadr"))
requests <- split_clf(data$request)
# An example using S3 files
s3_data <- read_s3(system.file("extdata/s3.log", package = "webreadr"))
s3_requests <- split_clf(s3_data$uri)