split_clf {webreadr}    R Documentation

Split requests from a CLF-formatted file

Description

CLF (Combined/Common Log Format) files store the HTTP method, the asset requested and the protocol in a single field (for example, "GET /favicon.ico HTTP/1.0"). split_clf takes this field as a vector and returns a data.frame with each of these elements in its own column. The function also works with the uri field from Amazon S3 log files (see read_s3).

Usage

split_clf(requests)

Arguments

requests

the "request" field from a CLF-formatted file, read in with read_clf or read_combined.

Value

a data.frame of three columns, "method", "asset" and "protocol", representing, respectively, the HTTP method used (e.g. "GET"), the asset requested (e.g. "/favicon.ico") and the protocol used (e.g. "HTTP/1.0"). If a request is not intact (for example, it contains only the protocol or only the asset), the corresponding row currently consists of empty strings. This behaviour may be refined in a future release.
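To make the shape of the return value concrete, the following is a minimal base-R sketch of the same idea. It is not the package's implementation: the helper name split_request_sketch is hypothetical, and the sketch assumes that an intact request consists of exactly three space-separated parts, with anything else mapped to a row of empty strings.

```r
# Hypothetical helper for illustration only; NOT part of webreadr and
# not how split_clf is implemented internally.
split_request_sketch <- function(requests) {
  # Split each request on single spaces.
  parts <- strsplit(as.character(requests), " ", fixed = TRUE)

  # Assumption: an intact request has exactly three parts.
  # Anything else becomes a row of empty strings.
  m <- vapply(
    parts,
    function(p) if (length(p) == 3L) p else rep("", 3L),
    character(3),
    USE.NAMES = FALSE
  )

  data.frame(
    method   = m[1, ],
    asset    = m[2, ],
    protocol = m[3, ],
    stringsAsFactors = FALSE
  )
}

# One intact request and one that contains only the protocol.
res <- split_request_sketch(c("GET /favicon.ico HTTP/1.0", "HTTP/1.0"))
str(res)
```

In this sketch the first row holds "GET", "/favicon.ico" and "HTTP/1.0", while the second row holds three empty strings, mirroring the behaviour described above for requests that are not intact.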

See Also

read_clf and read_combined for reading in these files.

Examples

# Grab CLF data and split out the request.
data <- read_combined(system.file("extdata/combined_log.clf", package = "webreadr"))
requests <- split_clf(data$request)

# An example using S3 files
s3_data <- read_s3(system.file("extdata/s3.log", package = "webreadr"))
s3_requests <- split_clf(s3_data$uri)


[Package webreadr version 0.4.0 Index]