read.apache.access.log {ApacheLogProcessor}    R Documentation

read.apache.log

Description

Reads an Apache access log in Common or Combined Log Format and returns a data frame with the log data.

Usage

read.apache.access.log(file, format = "combined", url_includes = "",
  url_excludes = "", columns = c("ip", "datetime", "url", "httpcode",
  "size", "referer", "useragent"), num_cores = 1, fields_have_quotes = TRUE)

Arguments

file

string. Full path to the log file.

format

string. Either "common" or "combined", selecting the input log format. The default is "combined".

url_includes

regex. If passed, only the URLs that match the regular expression are returned.

url_excludes

regex. If passed, only the URLs that do not match the regular expression are returned.

columns

list. Names of the columns to include in the output data frame. By default all columns are included: c("ip", "datetime", "url", "httpcode", "size", "referer", "useragent")

num_cores

number. Number of cores for parallel execution; if not passed, 1 core is assumed. Used only to convert the datetime from a string to a datetime type.

fields_have_quotes

boolean. If TRUE, searches for and removes the quotes inside all text fields.
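
As a rough illustration of how url_includes and url_excludes combine, the sketch below applies the same kind of filtering to a plain character vector with base R's grepl. The URLs are made up for the example, and the filtering logic is an assumption based on the argument descriptions above, not the package's actual implementation.

```r
# Hypothetical URLs standing in for the "url" column of a parsed log.
urls <- c("/infinance/index.html",
          "/infinanceclient/app.js",
          "/images/logo.png")

# Assumed semantics: keep URLs that match url_includes
# and do not match url_excludes.
url_includes <- "infinance"
url_excludes <- "infinanceclient"

keep <- grepl(url_includes, urls) & !grepl(url_excludes, urls)
filtered <- urls[keep]
print(filtered)
```

Only the first URL survives: the second matches the exclude pattern, and the third does not match the include pattern.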

Details

The function receives the full path to a log file and processes it in Apache's default common or combined log format:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

LogFormat "%h %l %u %t \"%r\" %>s %b" common
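
To make the combined format concrete, here is a minimal sketch that splits a single log line into its fields with a base R regular expression. The sample line is the one used in the Apache HTTP Server log documentation; the pattern and the field names are assumptions chosen for illustration and do not reflect how read.apache.access.log parses the file internally.

```r
# Sample line in combined format (from the Apache log documentation).
line <- paste0(
  '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] ',
  '"GET /apache_pb.gif HTTP/1.0" 200 2326 ',
  '"http://www.example.com/start.html" ',
  '"Mozilla/4.08 [en] (Win98; I ;Nav)"'
)

# One capture group per directive:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
pattern <- paste0(
  '^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] ',
  '"([^"]*)" (\\d{3}) (\\S+) ',
  '"([^"]*)" "([^"]*)"$'
)

# regexec returns the full match followed by the capture groups.
m <- regmatches(line, regexec(pattern, line, perl = TRUE))[[1]]

# Illustrative field names, one per directive.
fields <- setNames(m[-1], c("host", "ident", "user", "time", "request",
                            "status", "bytes", "referer", "useragent"))
print(fields)
```

A line in common format has the same leading fields but ends after the byte count, so the last two groups of the pattern would be dropped.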

Value

A data frame with the Apache log file information.

Author(s)

Diogo Silveira Mendonca

See Also

http://httpd.apache.org/docs/1.3/logs.html

Examples

path_combined = system.file("examples", "access_log_combined.txt", package = "ApacheLogProcessor")
path_common = system.file("examples", "access_log_common.txt", package = "ApacheLogProcessor")

#Read a log file with combined format and return it in a data frame
df1 = read.apache.access.log(path_combined)

#Read a log file with common format and return it in a data frame
df2 = read.apache.access.log(path_common, format="common") 

#Read only the lines whose url matches the pattern passed
df3 = read.apache.access.log(path_combined, url_includes="infinance")

#Read only the lines whose url matches the include pattern but does not match the exclude pattern
df4 = read.apache.access.log(path_combined,
  url_includes="infinance", url_excludes="infinanceclient")

#Return only the ip, url and datetime columns
df5 = read.apache.access.log(path_combined, columns=c("ip", "url", "datetime"))

#Process using 2 cores in parallel to speed up the datetime conversion
df6 = read.apache.access.log(path_combined, num_cores=2)



[Package ApacheLogProcessor version 0.2.3 Index]