| StopWordPipe {bdpar} | R Documentation |
Class to find and/or remove the stop words in the data field of an Instance
Description
The StopWordPipe class detects the stop words present in the
data field of each Instance. Identified stop words are stored in the
properties of the Instance, under the name given by propertyName
("stopWord" by default). If required, it can also remove the
stop words from the data inline.
Details
StopWordPipe requires resource files (in JSON format)
containing the list of stop words. The language of the text, as
indicated by the property named in propertyLanguageName, must appear in the
resource file name (i.e. xxx.json, where xxx is the value of that
property). The location of the resources should be
defined in the "resources.stopwords.path" field of the
bdpar.Options variable.
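The naming convention above can be sketched in base R. This is only an illustration: `stopwords_resource_file` is a hypothetical helper that is not part of bdpar, and the `resources/stopwords` directory is an assumed location.

```r
# Hypothetical helper (not part of bdpar): builds the resource file path
# implied by the xxx.json naming convention for a given language value.
stopwords_resource_file <- function(resources_path, language) {
  file.path(resources_path, paste0(language, ".json"))
}

# Assumed resources directory; use the folder configured for your project.
resources_path <- file.path("resources", "stopwords")

# Path that would be looked up for a text whose language property is "en".
en_file <- stopwords_resource_file(resources_path, "en")
print(en_file)
```

If no file with that name exists for the detected language, there is no stop word list to apply, so it is worth checking the directory contents against the languages you expect to process.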
Note
StopWordPipe will automatically invalidate the
Instance whenever the obtained data is empty.
Inherit
This class inherits from GenericPipe and implements the
pipe abstract function.
Super class
bdpar::GenericPipe -> StopWordPipe
Methods
Public methods
Inherited methods
Method new()
Creates a StopWordPipe object.
Usage
StopWordPipe$new(
propertyName = "stopWord",
propertyLanguageName = "language",
alwaysBeforeDeps = list("GuessLanguagePipe"),
notAfterDeps = list("AbbreviationPipe"),
removeStopWords = TRUE,
resourcesStopWordsPath = NULL
)
Arguments
propertyName: A character value. Name of the property associated with the GenericPipe.
propertyLanguageName: A character value. Name of the language property.
alwaysBeforeDeps: A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).
notAfterDeps: A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).
removeStopWords: A logical value. Indicates if the stop words are removed or not.
resourcesStopWordsPath: A character value. Path of resource files (in json format) containing the stop words.
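As a minimal sketch of the constructor documented above, the following creates a pipe that only detects stop words and leaves the data untouched. It assumes the bdpar package is installed, and it uses `tempdir()` purely as a placeholder for the directory holding the xxx.json resource files.

```r
# Runs only when the bdpar package is available.
if (requireNamespace("bdpar", quietly = TRUE)) {
  library(bdpar)

  # Detect stop words but do not remove them from the data.
  # tempdir() is a placeholder; point this at your stop word resources.
  pipe <- StopWordPipe$new(
    removeStopWords = FALSE,
    resourcesStopWordsPath = tempdir()
  )

  # Read back the configured resources path.
  print(pipe$getResourcesStopWordsPath())
}
```

Passing resourcesStopWordsPath explicitly is one option; leaving it as NULL relies on the "resources.stopwords.path" field of bdpar.Options instead.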
Method pipe()
Preprocesses the Instance to obtain/remove
the stop words. The stop words found in the data are added to the
list of properties of the Instance.
Usage
StopWordPipe$pipe(instance)
Arguments
instance: The Instance to preprocess.
Returns
The Instance with the modifications that have
occurred in the pipe.
Method findStopWord()
Checks if the stop word is in the data.
Usage
StopWordPipe$findStopWord(data, stopWord)
Arguments
data: The text in which to search for the stop word.
stopWord: The stop word to look for.
Returns
A logical value indicating whether the
stop word is in the data.
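To make the idea of this check concrete, here is a base R sketch of whole-word matching. It is an illustrative stand-in and not bdpar's implementation: the function name `find_stop_word` is hypothetical, and both the case-insensitive comparison and the word-boundary rule are assumptions.

```r
# Illustrative stand-in (not bdpar's code): TRUE when `stopWord` occurs in
# `data` as a whole word, ignoring case.
find_stop_word <- function(data, stopWord) {
  # \Q...\E quotes the stop word so regex metacharacters are taken literally.
  pattern <- paste0("\\b\\Q", stopWord, "\\E\\b")
  grepl(pattern, data, ignore.case = TRUE, perl = TRUE)
}

found_whole  <- find_stop_word("The cat sat on the mat", "the")
found_inside <- find_stop_word("other cats", "the")
print(c(found_whole, found_inside))
```

The word boundaries matter: without them, "the" would also match inside "other".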
Method removeStopWord()
Removes the stop word from the data.
Usage
StopWordPipe$removeStopWord(stopWord, data)
Arguments
stopWord: The stop word to remove.
data: The text from which the stop word is removed.
Returns
The data with the stop words removed.
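The removal step can be sketched in base R as follows. This is a hedged illustration rather than bdpar's implementation: `remove_stop_word` is a hypothetical function, and the case-insensitive match and whitespace clean-up are assumptions. The argument order (stopWord first, then data) mirrors the usage shown above.

```r
# Illustrative stand-in (not bdpar's code): deletes every whole-word
# occurrence of `stopWord` from `data`, ignoring case.
remove_stop_word <- function(stopWord, data) {
  # \Q...\E quotes the stop word so regex metacharacters are taken literally.
  pattern <- paste0("\\b\\Q", stopWord, "\\E\\b")
  cleaned <- gsub(pattern, "", data, ignore.case = TRUE, perl = TRUE)
  # Collapse the whitespace left behind and trim both ends.
  trimws(gsub("[[:space:]]+", " ", cleaned))
}

cleaned_text <- remove_stop_word("the", "the cat sat on the mat")
only_stop    <- remove_stop_word("the", "The")
print(cleaned_text)
print(nchar(only_stop))
```

A text made up entirely of stop words ends up empty after removal, which is the situation in which StopWordPipe invalidates the Instance.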
Method getPropertyLanguageName()
Gets the name of the language property.
Usage
StopWordPipe$getPropertyLanguageName()
Returns
The name of the language property.
Method getResourcesStopWordsPath()
Gets the path of the stop words resources.
Usage
StopWordPipe$getResourcesStopWordsPath()
Returns
The path of the stop words resources.
Method setResourcesStopWordsPath()
Sets the path of the stop words resources.
Usage
StopWordPipe$setResourcesStopWordsPath(path)
Arguments
path: A character value. The new value of the path of stop words resources.
Method clone()
The objects of this class are cloneable with this method.
Usage
StopWordPipe$clone(deep = FALSE)
Arguments
deep: Whether to make a deep clone.
See Also
AbbreviationPipe, bdpar.Options,
ContractionPipe, File2Pipe,
FindEmojiPipe, FindEmoticonPipe,
FindHashtagPipe, FindUrlPipe,
FindUserNamePipe, GuessDatePipe,
GuessLanguagePipe, Instance,
InterjectionPipe, MeasureLengthPipe,
GenericPipe, ResourceHandler,
SlangPipe, StoreFileExtPipe,
TargetAssigningPipe, TeeCSVPipe,
ToLowerCasePipe