FindUrlPipe {bdpar}    R Documentation

Class to find and/or remove the URLs in the data field of an Instance

Description

This class is responsible for detecting the URLs present in the data field of each Instance. Identified URLs are stored in the URLs field of the Instance class. If required, it can also remove the URLs inline from the data.

Details

The regular expressions indicated in the URLPatterns variable are used to identify URLs.

Note

FindUrlPipe will automatically invalidate the Instance whenever the resulting data is empty.

Inherit

This class inherits from GenericPipe and implements the pipe abstract function.

Super class

bdpar::GenericPipe -> FindUrlPipe

Public fields

URLPattern

A character value. The regular expression to detect URLs.

EmailPattern

A character value. The regular expression to detect emails.

Methods

Public methods

Inherited methods

Method new()

Creates a FindUrlPipe object.

Usage
FindUrlPipe$new(
  propertyName = "URLs",
  alwaysBeforeDeps = list(),
  notAfterDeps = list("FindUrlPipe"),
  removeUrls = TRUE,
  URLPatterns = list(self$URLPattern, self$EmailPattern),
  namesURLPatterns = list("UrlPattern", "EmailPattern")
)
Arguments
propertyName

A character value. Name of the property associated with the GenericPipe.

alwaysBeforeDeps

A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).

notAfterDeps

A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).

removeUrls

A logical value. Indicates whether the URLs are removed from the data.

URLPatterns

A list value. The regular expressions used to find URLs.

namesURLPatterns

A list value. The names of the regular expressions.

propertyLanguageName

A character value. Name of the language property.


Method pipe()

Preprocesses the Instance to obtain/remove the URLs. The URLs found in the data are added to the list of properties of the Instance.

Usage
FindUrlPipe$pipe(instance)
Arguments
instance

An Instance value. The Instance to preprocess.

Returns

The Instance with the modifications that have occurred in the pipe.
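
The flow described for pipe() can be sketched in base R. This is only an illustration under stated assumptions: pipe_sketch is a hypothetical helper that works on a plain character string instead of an Instance, the pattern is a deliberately simple stand-in for the package's URLPattern, and the validity flag merely mimics the behaviour described in the Note.

```r
# Hypothetical stand-in for the pipe() flow; it takes a plain string, not an Instance.
pipe_sketch <- function(data, patterns, remove = TRUE) {
  # Apply each pattern in turn; lapply() keeps the names of the patterns list.
  urls <- lapply(patterns, function(p) {
    unique(regmatches(data, gregexpr(p, data, perl = TRUE))[[1]])
  })
  if (remove) {
    # Replace each match with a space, then tidy the leftover whitespace.
    for (p in patterns) data <- gsub(p, " ", data, perl = TRUE)
    data <- trimws(gsub("[[:space:]]+", " ", data))
  }
  # Mirrors the Note: data that ends up empty marks the result as invalid.
  list(urls = urls, data = data, valid = nchar(trimws(data)) > 0)
}

# Assumed, simplified pattern; NOT the package's URLPattern.
patterns <- list(UrlPattern = "https?://[^[:space:]]+")
res <- pipe_sketch("Visit https://example.org now", patterns)
only_url <- pipe_sketch("https://example.org", patterns)
```

Here res$urls$UrlPattern holds the detected URL and res$data holds the text without it, while only_url$valid is FALSE because nothing remains once the URL is removed.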


Method findUrl()

Finds the URLs in the data.

Usage
FindUrlPipe$findUrl(pattern, data)
Arguments
pattern

A character value. The regex to find URLs.

data

A character value. The text in which to find the URLs.

Returns

A list with the URLs found.
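
As a rough illustration of regex-based URL extraction, the following base R sketch follows the (pattern, data) argument order of findUrl(). The helper find_url_sketch and the pattern are assumptions made for this example. The package's own regular expression is not shown in this page and is likely more elaborate.

```r
# Assumed, deliberately simple pattern; NOT the package's URLPattern.
url_pattern <- "https?://[^[:space:]]+"

# Hypothetical helper with the same argument order as findUrl().
find_url_sketch <- function(pattern, data) {
  # gregexpr() locates every match; regmatches() extracts the matched text.
  matches <- regmatches(data, gregexpr(pattern, data, perl = TRUE))
  # data is a single string, so only the first element is relevant.
  as.list(unique(matches[[1]]))
}

text <- "Docs at https://example.org/a and http://example.com/b today"
found <- find_url_sketch(url_pattern, text)
```

When the text contains no match, the helper returns an empty list.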


Method removeUrl()

Removes the URLs from the data.

Usage
FindUrlPipe$removeUrl(pattern, data)
Arguments
pattern

A character value. The regex to find URLs.

data

A character value. The text from which to remove the URLs.

Returns

The data with URLs removed.
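
A minimal base R sketch of inline removal follows. It assumes that each match is replaced by a space and that the leftover whitespace is then collapsed; this page does not state what removeUrl() substitutes for a match, so that choice is an assumption. Both remove_url_sketch and the pattern are hypothetical.

```r
# Assumed, deliberately simple pattern; NOT the package's URLPattern.
url_pattern <- "https?://[^[:space:]]+"

# Hypothetical helper with the same argument order as removeUrl().
remove_url_sketch <- function(pattern, data) {
  # Replace every match with a space so neighbouring words stay separated.
  without_urls <- gsub(pattern, " ", data, perl = TRUE)
  # Collapse the runs of whitespace left behind and trim both ends.
  trimws(gsub("[[:space:]]+", " ", without_urls))
}

cleaned <- remove_url_sketch(url_pattern, "See https://example.org/a for details")
```

If the text consists solely of a URL, the result is an empty string, which is the situation in which the Note says the Instance is invalidated.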


Method putNamesURLPattern()

Assigns the pattern names to the result of applying the URL patterns.

Usage
FindUrlPipe$putNamesURLPattern(resultOfURLPatterns)
Arguments
resultOfURLPatterns

A list value. The list with URLs found.

Returns

The URLs found, labelled with the names of the URL patterns.
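
One way to picture this labelling step in base R is shown below. It assumes the result holds one element per pattern, in the same order as the names; the sample values are invented for illustration, and the names are the defaults listed under namesURLPatterns.

```r
# Hypothetical per-pattern results, in the same order as the pattern names.
result_of_patterns <- list(
  list("https://example.org/a"),
  list("someone@example.org")
)
pattern_names <- list("UrlPattern", "EmailPattern")

# names<- expects a character vector, so flatten the list of names first.
names(result_of_patterns) <- unlist(pattern_names)

first_email <- result_of_patterns$EmailPattern[[1]]
```

After naming, each group of matches can be looked up by the pattern that produced it.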


Method getURLPatterns()

Gets the URL patterns.

Usage
FindUrlPipe$getURLPatterns()
Returns

Value of the URL patterns.


Method setURLPatterns()

Sets the URL patterns.

Usage
FindUrlPipe$setURLPatterns(URLPatterns)
Arguments
URLPatterns

A list value. The new value of the URL patterns.


Method getNamesURLPatterns()

Gets the names of the URL patterns.

Usage
FindUrlPipe$getNamesURLPatterns()
Returns

Value of the names of the URL patterns.


Method setNamesURLPatterns()

Sets the names of the URL patterns.

Usage
FindUrlPipe$setNamesURLPatterns(namesURLPatterns)
Arguments
namesURLPatterns

A list value. The new value of the names of the URL patterns.


Method clone()

The objects of this class are cloneable with this method.

Usage
FindUrlPipe$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

See Also

AbbreviationPipe, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, SlangPipe, StopWordPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe


[Package bdpar version 3.1.0 Index]