parse_android {WhatsR} | R Documentation
Parsing raw 'WhatsApp' chat logs according to Android text structure
Description
Creates a data frame from an exported 'WhatsApp' chat log, with one row per message
and columns for the date and time the message was sent, the name of the sender, and the body of the message. Intended only as an intermediary function
called from within parse_chat.
Usage
parse_android(
chatlog,
newline_indicator = "\n",
media_omitted = "<media omitted>",
media_indicator = "(file attached)",
sent_location = paste0("location: (?=https:\\/\\/maps\\.google\\.com\\/",
"\\?q=\\d\\d.\\d{6}\\,\\d\\.\\d{6})"),
live_location = "^live location shared$",
datetime_indicator = paste("(?!^)(?=((\\d{2}\\.\\d{2}\\.\\d{2})|(\\d{1,2}",
"\\/\\d{1,2}\\/\\d{2})),\\s\\d{2}\\:\\d{2}((\\s\\-)|(\\s(?i:(am|pm))\\s\\-)))",
sep = ""),
newline_replace = " start_newline ",
media_replace = " media_omitted ",
foursquare_loc = "^.*: https://foursquare.com/v/.*$"
)
Arguments
chatlog |
'WhatsApp' chat preprocessed by |
newline_indicator |
character string used to detect line breaks in the chat log. Default is "\n" (line feed). |
media_omitted |
character string inserted by 'WhatsApp' instead of file names when not exporting media. |
media_indicator |
character string for detecting media and file attachments. |
sent_location |
Regex for detecting auto generated messages for locations shared via chat. |
live_location |
Regex for detecting auto generated messages for live locations shared via chat. |
datetime_indicator |
Regex for detecting the DateTime indicator at the beginning of each message. |
newline_replace |
replacement string for a newline character in parsed message. Default is " start_newline ". |
media_replace |
replacement string for omitted media files. Default is " media_omitted ". |
foursquare_loc |
Regex for detecting locations sent as Foursquare links. |
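To illustrate how some of these arguments could interact, here is a minimal base R sketch. It is not the package's implementation: it only shows one plausible way that datetime_indicator (a zero-width pattern that matches just before each timestamp, except at the very start of the text) could be used to cut a log into messages, and how media_omitted and newline_indicator could then be swapped for media_replace and newline_replace. The sample chat and the variable names hits, starts, ends and messages are assumptions made for this example.

```r
# Default patterns copied from the Usage section.
datetime_indicator <- paste(
  "(?!^)(?=((\\d{2}\\.\\d{2}\\.\\d{2})|(\\d{1,2}",
  "\\/\\d{1,2}\\/\\d{2})),\\s\\d{2}\\:\\d{2}((\\s\\-)|(\\s(?i:(am|pm))\\s\\-)))",
  sep = ""
)
media_omitted <- "<media omitted>"
media_replace <- " media_omitted "
newline_indicator <- "\n"
newline_replace <- " start_newline "

# Hypothetical two-message export; the second message spans two lines.
chat <- paste0(
  "29.01.18, 23:33 - Alice: <media omitted>\n",
  "29.01.18, 23:45 - Bob: Hi\nhow are you?\n"
)

# Find the zero-width positions where a new timestamp begins.
# gregexpr() returns -1 when nothing matches, so keep positive hits only.
hits <- gregexpr(datetime_indicator, chat, perl = TRUE)[[1]]
starts <- c(1L, hits[hits > 0])
ends <- c(starts[-1] - 1L, nchar(chat))
messages <- substring(chat, starts, ends)

# Drop each message's trailing line break, then insert the placeholders.
messages <- sub("\n$", "", messages)
messages <- gsub(media_omitted, media_replace, messages, fixed = TRUE)
messages <- gsub(newline_indicator, newline_replace, messages, fixed = TRUE)

print(messages)
```

Working from match positions with gregexpr() and substring() sidesteps the way base strsplit() treats zero-length matches; a package may well rely on a dedicated string library instead.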
Value
A data frame containing the timestamp, name of the sender and message body
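As a rough picture of the returned structure, the following base R sketch builds a three-column data frame from messages that have already been split. It is an illustration under stated assumptions, not the function's actual code: the column names DateTime, Sender and Message, the DD.MM.YY reading of the date, the UTC time zone, and the helper name pattern are all chosen for this example only.

```r
# Hypothetical messages, already split into one string per message.
messages <- c(
  "29.01.18, 23:33 - Alice: Hi?",
  "29.01.18, 23:45 - Bob: Hi"
)

# Capture groups: (1) timestamp, (2) sender, (3) message body.
pattern <- "^(\\d{2}\\.\\d{2}\\.\\d{2}, \\d{2}:\\d{2}) - ([^:]+): (.*)$"

parsed <- data.frame(
  # Assumed date format: day.month.two-digit year, 24-hour time.
  DateTime = as.POSIXct(sub(pattern, "\\1", messages, perl = TRUE),
                        format = "%d.%m.%y, %H:%M", tz = "UTC"),
  Sender = sub(pattern, "\\2", messages, perl = TRUE),
  Message = sub(pattern, "\\3", messages, perl = TRUE),
  stringsAsFactors = FALSE
)

print(parsed)
```

Each input message yields exactly one row. System messages without a sender and multi-line bodies would need extra handling that this sketch leaves out.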
Examples
ParsedChat <- parse_android("29.01.18, 23:33 - Alice: Hi?\n 29.01.18, 23:45 - Bob: Hi\n")