logFileRead {WebAnalytics}    R Documentation

Given a list of file names, read them as log files

Description

This function reads a file, parsing it for the fields specified, and normalises the values that have been read.

The log file is assumed to be space delimited, which is the case for Apache and IIS.
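As a rough illustration of what space-delimited parsing involves, the sketch below reads two made-up log lines with base R's read.table. It is not the package's implementation; the sample lines and the column names assigned to them are assumptions chosen for the example.

```r
# Hypothetical log lines in a space-delimited layout (not real data)
logLines <- c(
  "2024-01-15 10:00:01 10.0.0.5 /index.html 200 35",
  "2024-01-15 10:00:02 10.0.0.6 /about.html 404 12"
)

# read.table splits each line on the separator given in sep;
# text= lets the example run without a file on disk
raw <- read.table(text = logLines, sep = " ", header = FALSE,
                  stringsAsFactors = FALSE, quote = "", comment.char = "")

# Illustrative names only; a real log defines its own field order
names(raw) <- c("date", "time", "userip", "url", "httpcode", "elapsedms")

print(raw)
```

Quoting and comment handling are switched off here so that characters such as '#' or '"' inside a field are not interpreted by read.table.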

Usage

logFileRead(fileName, 
  columnList=c("MSTimestamp", "clientip", "url", "httpcode", "elapsed"), 
  logTimeZone = "", 
  timeFormat = "") 

Arguments

fileName

The name, including path, of the file to read

columnList

The columns in the file, in order. Columns are:

Column           Status    Description
ApacheTimestamp  Optional  Apache log format timestamp
MSTimestamp      Optional  IIS log format timestamp
servername       Optional  Name of the web server
serverip         Optional  IP of the server
httpop           Optional  HTTP verb
url              Required  Path part of the request
parms            Optional  Query string
port             Optional  TCP/IP port that the request arrived on
username         Optional  User name logged by the web server
userip           Optional  IP that the request was seen to originate from
useragent        Optional  User agent string in the request
httpcode         Required  HTTP response code
windowscode      Optional  Windows return code recorded by IIS
windowssubcode   Optional  Windows sub code recorded by IIS
responsebytes    Optional  Number of bytes in the HTTP response
requestbytes     Optional  Number of bytes in the HTTP request
elapsedms        Optional  Request elapsed time in milliseconds
elapsedus        Optional  Request elapsed time in microseconds (will be rounded to milliseconds)
elapseds         Optional  Request elapsed time in seconds (not recommended, will be expanded to milliseconds)
jsessionid       Optional  User session identifier
ignore*          Optional  Columns with names starting with 'ignore' are dropped

One timestamp and one elapsed time column name must be specified.
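The rules above can be sketched as a small check that runs before a file is read. The helpers checkColumnList and toMilliseconds are hypothetical and are not part of WebAnalytics; the sketch also assumes that "one" timestamp and "one" elapsed time column means exactly one of each.

```r
# Hypothetical helpers for illustration; they are not part of WebAnalytics.
checkColumnList <- function(columnList) {
  timestampNames <- c("ApacheTimestamp", "MSTimestamp")
  elapsedNames   <- c("elapsedms", "elapsedus", "elapseds")
  requiredNames  <- c("url", "httpcode")

  # Columns whose names start with 'ignore' are dropped
  kept <- columnList[!startsWith(columnList, "ignore")]

  problems <- character(0)
  # Assumption: "one" is read here as "exactly one"
  if (sum(kept %in% timestampNames) != 1) {
    problems <- c(problems, "exactly one timestamp column is expected")
  }
  if (sum(kept %in% elapsedNames) != 1) {
    problems <- c(problems, "exactly one elapsed time column is expected")
  }
  missingNames <- setdiff(requiredNames, kept)
  if (length(missingNames) > 0) {
    problems <- c(problems, paste("missing required column:", missingNames))
  }
  list(kept = kept, problems = problems)
}

# Convert an elapsed time column to milliseconds according to its name
toMilliseconds <- function(values, columnName) {
  switch(columnName,
         elapsedms = values,
         elapsedus = round(values / 1000),  # microseconds, rounded
         elapseds  = values * 1000,         # seconds, expanded
         stop("not an elapsed time column: ", columnName))
}

result <- checkColumnList(c("MSTimestamp", "ignore1", "url",
                            "httpcode", "elapsedms"))
print(result$kept)
print(result$problems)
```

Returning the list of problems instead of stopping at the first one makes it easier to report every issue with a column list in a single pass.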

The Apache URL is handled partly in the fix data procedure in the config file because Apache wraps the operation and URL path in one field. The IIS URL does not need this additional parsing.
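To illustrate the kind of extra parsing an Apache request field needs, the sketch below splits a combined "verb path protocol" string into the httpop, url and parms values. It is an assumption about what such a step could look like, not the package's fix data procedure, and the request strings are made up.

```r
# Hypothetical Apache request fields: verb, target and protocol in one string
request <- c("GET /shop/cart?item=42 HTTP/1.1",
             "POST /login HTTP/1.1")

# Split each request on single spaces
parts <- strsplit(request, " ", fixed = TRUE)

# First token is the HTTP verb, second is the request target
httpop <- vapply(parts, function(p) p[1], character(1))
target <- vapply(parts, function(p) p[2], character(1))

# Path is everything before the first '?'; query string is everything after it
url   <- sub("[?].*$", "", target)
parms <- ifelse(grepl("?", target, fixed = TRUE),
                sub("^[^?]*[?]", "", target),
                "")

print(data.frame(httpop, url, parms, stringsAsFactors = FALSE))
```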

logTimeZone

The timezone to use to adjust the timestamps in the log. This is used primarily for IIS logs, which may be recorded in either UTC or local time.
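The reason the time zone matters can be seen with base R alone: the same timestamp text denotes different instants depending on the zone it is interpreted in. The sketch below uses "Australia/Sydney" purely as an example zone name; how logFileRead applies logTimeZone internally is not shown here.

```r
# The same timestamp text, as it might appear in a log
stampText <- "2024-01-15 10:00:01"

# Interpreted as UTC and as local time in an example zone
asUtc    <- as.POSIXct(stampText, tz = "UTC")
asSydney <- as.POSIXct(stampText, tz = "Australia/Sydney")

# The two readings are different instants
hoursApart <- as.numeric(difftime(asUtc, asSydney, units = "hours"))
print(hoursApart)
```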

timeFormat

If the timestamp in the log is not in the default format for IIS or Apache, this can be used to override the timestamp parsing. The format is an R strptime format string.
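A format string can be tried out with base R before it is passed as timeFormat. The sketch below uses a made-up, non-default timestamp layout and a numeric-only format so that the result does not depend on the locale's month names.

```r
# Hypothetical timestamp in a non-default layout: day.month.year time
stampText  <- "15.01.2024 10:00:01"
timeFormat <- "%d.%m.%Y %H:%M:%S"

# strptime returns NA when the text does not match the format
parsed <- as.POSIXct(strptime(stampText, format = timeFormat, tz = "UTC"))

if (is.na(parsed)) {
  message("format string does not match the timestamp")
} else {
  print(format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC"))
}
```

Formats that include month or day names (for example %b) depend on the current locale, so a string that parses on one machine may return NA on another.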

Value

The function returns a data frame that contains the contents of the file.

Author(s)

Greg Hunt <greg@firmansyah.com>

Examples



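# datd is assumed to hold the path of the directory that contains the log files
logFileName = logFileNamesGetLast(dataDirectory=datd,
  directoryNames=c(".", "."),
  fileNamePattern="*[.]log")[[1]]

# Get the column list for the IIS log file
cols = logFileFieldsGetIIS(logFileName)

# Read the log with the default time zone and timestamp format
logdf = logFileRead(logFileName, columnList=cols,
            logTimeZone = "", timeFormat = "")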

[Package WebAnalytics version 0.9.12 Index]