StopWordPipe {bdpar} | R Documentation
Class to find and/or remove the stop words in the data field of an Instance
Description
The StopWordPipe class detects the stop words present in the data field of each Instance. Identified stop words are stored in the Instance property named by propertyName ("stopWord" by default). If requested, the class also removes the stop words inline.
Details
The StopWordPipe class requires resource files (in JSON format) containing the list of stop words. The name of each resource file must contain the language of the text, as indicated by propertyLanguageName (i.e. xxx.json, where xxx is the value defined in propertyLanguageName). The location of the resource files should be defined in the "resources.stopwords.path" field of the bdpar.Options variable.
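As a rough illustration of the naming convention above, the following base R sketch builds the expected resource file path from a language value. The helper stopWordFile, the directory "resources/stopwords" and the language values "en" and "es" are hypothetical and not part of bdpar; how bdpar itself locates and reads the file is not shown here.

```r
# Hypothetical helper (not part of bdpar): build the path of the JSON
# resource expected for a given language, following the xxx.json convention.
stopWordFile <- function(resourcesPath, language) {
  file.path(resourcesPath, paste0(language, ".json"))
}

# Assumed directory and language values, for illustration only.
resourcesPath <- "resources/stopwords"
stopWordFile(resourcesPath, "en")
stopWordFile(resourcesPath, "es")
```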
Note
StopWordPipe automatically invalidates the Instance whenever the resulting data is empty.
Inherit
This class inherits from GenericPipe and implements the pipe abstract function.
Super class
bdpar::GenericPipe -> StopWordPipe
Methods
Method new()
Creates a StopWordPipe object.
Usage
StopWordPipe$new(
  propertyName = "stopWord",
  propertyLanguageName = "language",
  alwaysBeforeDeps = list("GuessLanguagePipe"),
  notAfterDeps = list("AbbreviationPipe"),
  removeStopWords = TRUE,
  resourcesStopWordsPath = NULL
)
Arguments
propertyName
A character value. Name of the property associated with the GenericPipe.
propertyLanguageName
A character value. Name of the language property.
alwaysBeforeDeps
A list value. The dependencies alwaysBefore (GenericPipes that must be executed before this one).
notAfterDeps
A list value. The dependencies notAfter (GenericPipes that cannot be executed after this one).
removeStopWords
A logical value. Indicates whether the stop words are removed.
resourcesStopWordsPath
A character value. Path of the resource files (in JSON format) containing the stop words.
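A minimal construction sketch, assuming the bdpar package is installed. The directory "resources/stopwords" is hypothetical; the argument and method names are those listed on this page.

```r
library(bdpar)

# Hypothetical directory containing the xxx.json stop word resources.
stopWordsPath <- "resources/stopwords"

# Detect stop words but leave the data untouched (removeStopWords = FALSE).
pipe <- StopWordPipe$new(
  removeStopWords = FALSE,
  resourcesStopWordsPath = stopWordsPath
)

# The accessors documented on this page expose the configured values.
pipe$getPropertyLanguageName()
pipe$getResourcesStopWordsPath()
```

Passing resourcesStopWordsPath explicitly keeps the example independent of the value stored in bdpar.Options.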
Method pipe()
Preprocesses the Instance to obtain/remove the stop words. The stop words found in the data are added to the list of properties of the Instance.
Usage
StopWordPipe$pipe(instance)
Arguments
instance
The Instance to preprocess.
Returns
The Instance with the modifications that have occurred in the pipe.
Method findStopWord()
Checks if the stop word is in the data.
Usage
StopWordPipe$findStopWord(data, stopWord)
Arguments
data
The text in which the stop word is searched.
stopWord
The stop word to find.
Returns
A logical value indicating whether the stop word is in the data.
Method removeStopWord()
Removes the stop word from the data.
Usage
StopWordPipe$removeStopWord(stopWord, data)
Arguments
stopWord
The stop word to remove.
data
The text from which the stop word is removed.
Returns
The data with the stop words removed.
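The behaviour of findStopWord() and removeStopWord() can be approximated in base R as follows. This is a sketch under stated assumptions, not the bdpar implementation: the functions findWord and removeWord are hypothetical, and treating an occurrence as a case-sensitive match on word boundaries (\b) is an assumption.

```r
# Hypothetical stand-ins for findStopWord() and removeStopWord().
# \\Q...\\E quotes the stop word so that regex metacharacters in it are
# matched literally; \\b restricts matches to whole words (assumption).
wordPattern <- function(stopWord) {
  paste0("\\b\\Q", stopWord, "\\E\\b")
}

findWord <- function(data, stopWord) {
  grepl(wordPattern(stopWord), data, perl = TRUE)
}

removeWord <- function(stopWord, data) {
  # Replace each occurrence with a space, then collapse repeated
  # whitespace and trim the ends.
  out <- gsub(wordPattern(stopWord), " ", data, perl = TRUE)
  trimws(gsub("[[:space:]]+", " ", out))
}

findWord("this is the data", "the")    # whole-word occurrence
findWord("there it is", "the")         # "the" only appears inside "there"
removeWord("the", "this is the data")
```

In this sketch, removing the only word of a text leaves an empty string, which is the situation in which the Note above says the Instance is invalidated.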
Method getPropertyLanguageName()
Gets the name of the language property.
Usage
StopWordPipe$getPropertyLanguageName()
Returns
The name of the language property.
Method getResourcesStopWordsPath()
Gets the path of the stop word resources.
Usage
StopWordPipe$getResourcesStopWordsPath()
Returns
The path of the stop word resources.
Method setResourcesStopWordsPath()
Sets the path of the stop word resources.
Usage
StopWordPipe$setResourcesStopWordsPath(path)
Arguments
path
A character value. The new path of the stop word resources.
Method clone()
The objects of this class are cloneable with this method.
Usage
StopWordPipe$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
See Also
AbbreviationPipe, bdpar.Options, ContractionPipe, File2Pipe, FindEmojiPipe, FindEmoticonPipe, FindHashtagPipe, FindUrlPipe, FindUserNamePipe, GuessDatePipe, GuessLanguagePipe, Instance, InterjectionPipe, MeasureLengthPipe, GenericPipe, ResourceHandler, SlangPipe, StoreFileExtPipe, TargetAssigningPipe, TeeCSVPipe, ToLowerCasePipe