parse_path {webtrackR}    R Documentation

Parse parts of path for text analysis

Description

parse_path() splits a path into its parts, i.e., the substrings separated by "/", "-", "_" or ".", and adds them as a new variable. Parts that do not consist only of letters, or that are not real words, can be filtered out via the argument keep.
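The splitting step can be illustrated with a minimal base-R sketch. It is not webtrackR's implementation. split_path_sketch() is a hypothetical helper that takes a path string directly and assumes the behaviour of keep = "letters_only": split on the four separators, drop empty pieces, keep parts made only of letters, and join them with commas.

```r
# Hypothetical helper, NOT part of webtrackR: a rough sketch of splitting a
# path on "/", "-", "_" and "." and keeping letters-only parts.
split_path_sketch <- function(path) {
  # Split on any of the four separator characters
  parts <- strsplit(path, "[/._-]")[[1]]
  # Drop empty strings produced by leading or repeated separators
  parts <- parts[parts != ""]
  # Keep only parts made up entirely of letters (assumed "letters_only" rule)
  parts <- parts[grepl("^[[:alpha:]]+$", parts)]
  # Collapse the remaining parts into one comma-separated string
  paste(parts, collapse = ",")
}

split_path_sketch("/news/2023/world-cup_final.html")
# returns "news,world,cup,final,html"
```

Here the numeric part "2023" is dropped because it contains no letters. Whether the package treats file extensions such as "html" the same way is not stated on this page.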

Usage

parse_path(wt, varname = "url", keep = "letters_only", decode = TRUE)

Arguments

wt

webtrack data object

varname

character. Name of the column from which to extract the path. Defaults to "url".

keep

character. Defines which types of path components to keep. If "all", every part is kept. If "letters_only", only parts consisting entirely of letters are kept. If "words_only", only parts that are English words (as defined by the Word Game Dictionary, cf. https://cran.r-project.org/web/packages/words/index.html) are kept. Support for more languages will be added in the future.
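The three keep modes can be contrasted with the hypothetical sketch below. It is not the package's code. The small dictionary vector is an assumption that stands in for the Word Game Dictionary from the words package, and the lower-casing before the lookup is also an assumption.

```r
# Hypothetical sketch, NOT webtrackR's implementation. `dictionary` is a toy
# stand-in (assumption) for the Word Game Dictionary used by "words_only".
filter_parts <- function(parts, keep = c("all", "letters_only", "words_only"),
                         dictionary = c("news", "world", "cup", "final")) {
  keep <- match.arg(keep)
  if (keep == "all") {
    return(parts)                      # keep every part unchanged
  }
  # Parts consisting entirely of letters
  letter_parts <- parts[grepl("^[[:alpha:]]+$", parts)]
  if (keep == "letters_only") {
    return(letter_parts)
  }
  # "words_only": letters-only parts found in the (toy) dictionary;
  # lower-casing before the lookup is an assumption
  letter_parts[tolower(letter_parts) %in% dictionary]
}

parts <- c("news", "2023", "world", "cup", "final", "xq7", "html")
filter_parts(parts, "all")           # all seven parts
filter_parts(parts, "letters_only")  # c("news", "world", "cup", "final", "html")
filter_parts(parts, "words_only")    # c("news", "world", "cup", "final")
```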

decode

logical. Whether to decode the path (see utils::URLdecode()). Defaults to TRUE.
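Decoding can change which parts result. In the short base-R illustration below, percent-encoded separators ("%2F" for "/", "%2D" for "-") only act as separators after utils::URLdecode() has been applied. The example path is made up, and the assumption that parse_path() decodes before splitting is not stated on this page.

```r
# Made-up example path with percent-encoded "/" (%2F) and "-" (%2D)
encoded <- "/news%2Fsports/world%2Dcup"

# Decoded first (assumed behaviour of decode = TRUE): encoded separators split
decoded <- utils::URLdecode(encoded)
decoded                              # "/news/sports/world-cup"
strsplit(decoded, "[/._-]")[[1]]     # c("", "news", "sports", "world", "cup")

# Not decoded: the encoded sequences stay inside the parts
strsplit(encoded, "[/._-]")[[1]]     # c("", "news%2Fsports", "world%2Dcup")
```

Without decoding, parts such as "news%2Fsports" contain digits and "%", so keep = "letters_only" would drop them entirely.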

Value

webtrack data.frame with the same columns as wt plus a new column called 'path_split' (or, if varname is not 'url', '<varname>_path_split') containing the retained parts as a comma-separated string.
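The column-naming rule and the comma-separated format can be mirrored in a short sketch. path_split_colname() is a hypothetical helper, not a webtrackR function, and "referrer" is a made-up column name. The last line shows how a stored value could be turned back into a character vector of parts with base R.

```r
# Hypothetical helper, NOT part of webtrackR: mirrors the naming rule for the
# new column as described in this section.
path_split_colname <- function(varname = "url") {
  if (varname == "url") "path_split" else paste0(varname, "_path_split")
}

path_split_colname()              # "path_split"
path_split_colname("referrer")    # "referrer_path_split" (made-up column)

# Recovering the individual parts from one comma-separated value
value <- "news,world,cup,final"
strsplit(value, ",", fixed = TRUE)[[1]]   # c("news", "world", "cup", "final")
```

Applied to a whole column, for example strsplit(wt$path_split, ",", fixed = TRUE) with the default varname, the same call returns a list with one character vector per row.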

Examples

## Not run: 
data("testdt_tracking")
wt <- as.wt_dt(testdt_tracking)
wt <- parse_path(wt)

## End(Not run)

[Package webtrackR version 0.3.1 Index]