read.apache.access.log {ApacheLogProcessor} | R Documentation |
read.apache.log
Description
Reads an Apache access log in Common or Combined Log Format and returns a data frame with the log data.
Usage
read.apache.access.log(file, format = "combined", url_includes = "",
url_excludes = "", columns = c("ip", "datetime", "url", "httpcode",
"size", "referer", "useragent"), num_cores = 1, fields_have_quotes = TRUE)
Arguments
file |
string. Full path to the log file. |
format |
string. Either "common" or "combined", setting the input log format. The default is "combined". |
url_includes |
regex. If passed, only the URLs that match the regular expression are returned. |
url_excludes |
regex. If passed, only the URLs that do not match the regular expression are returned. |
columns |
list. Names of the columns to include in the output data frame. The default is all columns: c("ip", "datetime", "url", "httpcode", "size", "referer", "useragent"). |
num_cores |
number. Number of cores for parallel execution; if not passed, 1 core is assumed. Used only to convert datetime from string to datetime type. |
fields_have_quotes |
boolean. If TRUE, searches for and removes the quotes inside all text fields. |
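The combined effect of url_includes and url_excludes can be pictured with plain base R. This is only a sketch of the assumed filtering semantics (keep what matches the include pattern, drop what matches the exclude pattern), not the package's internal code; the sample URLs are made up for illustration.

```r
# Hypothetical sample URLs; not taken from the package's example logs.
urls <- c("/infinance/index.html",
          "/infinanceclient/app.js",
          "/static/logo.png")

# Assumed semantics: keep URLs that match url_includes ...
url_includes <- "infinance"
# ... and drop URLs that match url_excludes.
url_excludes <- "infinanceclient"

keep <- grepl(url_includes, urls) & !grepl(url_excludes, urls)
filtered <- urls[keep]
print(filtered)
```

Because both arguments are regular expressions, a literal dot or other metacharacter in a pattern needs escaping, for example "\\.png$".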
Details
The function receives the full path to a log file and processes it in Apache's default common or combined log format:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %b" common
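To make the combined format concrete, the sketch below splits one log line into its fields with a base R regular expression. It is an illustration of the format only and is not how read.apache.access.log is implemented; the sample line, the pattern, and the field names ident, user and request are assumptions introduced here.

```r
# Hypothetical combined-format line, written for illustration only.
line <- paste0('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] ',
               '"GET /apache_pb.gif HTTP/1.0" 200 2326 ',
               '"http://www.example.com/start.html" ',
               '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

# One capture group per field of the combined LogFormat:
# %h %l %u [%t] "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
pattern <- paste0('^([^ ]+) ([^ ]+) ([^ ]+) \\[([^]]+)\\] ',
                  '"([^"]*)" ([0-9]{3}) ([^ ]+) "([^"]*)" "([^"]*)"$')

# regexec returns the full match followed by the capture groups.
m <- regmatches(line, regexec(pattern, line))[[1]]
fields <- setNames(m[-1], c("ip", "ident", "user", "datetime", "request",
                            "httpcode", "size", "referer", "useragent"))

# Convert the timestamp string to a date-time value.
# Note: %b (abbreviated month name) depends on the current locale.
when <- as.POSIXct(strptime(fields[["datetime"]],
                            format = "%d/%b/%Y:%H:%M:%S %z", tz = "UTC"))

print(fields[c("ip", "httpcode", "size")])
```

A common-format line has the same layout without the last two quoted fields, so the same pattern with the referer and user-agent groups removed would apply.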
Value
A data frame with the Apache log file information.
Author(s)
Diogo Silveira Mendonca
See Also
http://httpd.apache.org/docs/1.3/logs.html
Examples
path_combined = system.file("examples", "access_log_combined.txt", package = "ApacheLogProcessor")
path_common = system.file("examples", "access_log_common.txt", package = "ApacheLogProcessor")
#Read a log file with combined format and return it in a data frame
df1 = read.apache.access.log(path_combined)
#Read a log file with common format and return it in a data frame
df2 = read.apache.access.log(path_common, format="common")
#Read only the lines whose url matches the pattern passed
df3 = read.apache.access.log(path_combined, url_includes="infinance")
#Read only the lines whose url matches the include pattern but does not match the exclude pattern
df4 = read.apache.access.log(path_combined,
url_includes="infinance", url_excludes="infinanceclient")
#Return only the ip, url and datetime columns
df5 = read.apache.access.log(path_combined, columns=c("ip", "url", "datetime"))
#Process using 2 cores in parallel for a speedup
df6 = read.apache.access.log(path_combined, num_cores=2)