read_clf {webreadr} | R Documentation |
Read CLF-formatted logs
Description
Read a file of request logs stored in the Common Log Format.
Usage
read_clf(file, has_header = FALSE)
Arguments
file: the full path to the CLF-formatted file you want to read.
has_header: whether or not the file has a header row. Set to FALSE by default.
Details
The CLF is a standardised format for web request logs. It consists of the following fields:
ip_address: the IP address of the remote host that made the request. The CLF does not (by default) include the de-facto standard X-Forwarded-For header.
remote_user_ident: the RFC 1413 remote user identifier.
local_user_ident: the identifier the user has authenticated with locally.
timestamp: the timestamp associated with the request, stored as "[08/Apr/2001:17:39:04 -0800]", where "-0800" represents the time offset (minus eight hours) of the timestamp from UTC.
request: the actual user request, containing the HTTP method used, the asset requested, and the HTTP Protocol version used.
status_code: the HTTP status code returned.
bytes_sent: the number of bytes sent.
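To make the field layout concrete, here is a minimal base R sketch that pulls the seven fields out of a single line and then splits the request into its three parts. The sample line is hypothetical, and the regular expression is an illustration only; it is not how read_clf itself parses files, and it assumes space-separated fields with no embedded quotes in the request.

```r
# Hypothetical sample line in the Common Log Format (not taken from the package data).
line <- '127.0.0.1 - frank [08/Apr/2001:17:39:04 -0800] "GET /index.html HTTP/1.0" 200 2326'

# One capture group per field: the timestamp sits in square brackets,
# the request in double quotes, and everything else is space-delimited.
pattern <- '^([^ ]+) ([^ ]+) ([^ ]+) \\[([^]]+)\\] "([^"]*)" ([^ ]+) ([^ ]+)$'

# regexec() locates the full match and each group; dropping the first
# element leaves just the seven captured fields.
parts <- regmatches(line, regexec(pattern, line))[[1]][-1]
names(parts) <- c("ip_address", "remote_user_ident", "local_user_ident",
                  "timestamp", "request", "status_code", "bytes_sent")
print(parts)

# The request field holds the HTTP method, the asset, and the protocol version.
# The names given to the three pieces here are illustrative.
request_parts <- strsplit(parts[["request"]], " ", fixed = TRUE)[[1]]
names(request_parts) <- c("method", "asset", "protocol")
print(request_parts)
```

A missing value in the CLF is conventionally written as "-", which is why remote_user_ident comes back as "-" for this sample line.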
While outdated as a standard, systems using the CLF are still around; the Squid caching system, for example, uses the CLF as one of its default log formats (the other, the Squid "native" format, can be read with read_squid).
Value
A data.frame consisting of seven fields, as discussed above, with normalised timestamps.
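This page does not spell out what the normalisation involves. As one plausible reading, the base R sketch below converts the raw CLF timestamp from the Details section into a date-time expressed in UTC. This is an assumption for illustration, not a description of what read_clf does internally. Note that %b matches abbreviated month names in the current locale, so the sketch assumes an English locale.

```r
# Raw timestamp exactly as it appears in a CLF line.
raw <- "[08/Apr/2001:17:39:04 -0800]"

# Strip the surrounding square brackets.
stamp <- gsub("^\\[|\\]$", "", raw)

# %z reads the signed offset from UTC; tz = "UTC" expresses the
# resulting instant in UTC rather than in the local time zone.
parsed <- as.POSIXct(stamp, format = "%d/%b/%Y:%H:%M:%S %z", tz = "UTC")

print(format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

Because the offset is minus eight hours, 17:39:04 local time on 8 April corresponds to 01:39:04 UTC on 9 April.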
See Also
read_combined for the Combined Log Format, and split_clf for splitting out the "request" field.
Examples
# Read in an example CLF-formatted file provided with the webreadr package.
data <- read_clf(system.file("extdata/log.clf", package = "webreadr"))