Bdpar {bdpar} | R Documentation |
Class to manage the preprocessing of files throughout the flow of pipes
Description
The Bdpar class provides the static variables required to perform the whole data flow process. To this end, Bdpar is in charge of (i) initializing the objects that handle the connections to APIs (Connections) and the JSON resources (ResourceHandler), and (ii) executing the flow of pipes (inherited from the GenericPipeline class) passed as an argument.
Details
If a pipe defined in the workflow needs some type of configuration, it can be defined through the bdpar.Options variable, which has different methods to support the functionality of different pipes.
Static variables
- connections: (Connections) object that handles the connections with YouTube and Twitter.
- resourceHandler: (ResourceHandler) object that handles the JSON resource files.
Methods
Public methods
Method new()
Creates a Bdpar object. Initializes the static variables: connections and resourceHandler.
Usage
Bdpar$new()
Method execute()
Preprocess files through the indicated flow of pipes.
Usage
Bdpar$execute(
  path,
  extractors = ExtractorFactory$new(),
  pipeline = DefaultPipeline$new(),
  cache = TRUE,
  verbose = FALSE,
  summary = FALSE
)
Arguments
path
A character value. The path where the files to be processed are located.
extractors
An ExtractorFactory value. Class which implements the createInstance method to choose which type of Instance is created.
pipeline
A GenericPipeline value. Subclass of GenericPipeline, which implements the execute method. By default, it is the DefaultPipeline pipeline.
cache
(logical) flag indicating if the status of the instances will be stored after each pipe. This avoids re-running previously executed tasks when the order and configuration of the pipes and pipeline are the same as what is stored in the cache.
verbose
(logical) flag indicating whether messages, warnings and errors are printed.
summary
(logical) flag indicating if a summary of the pipeline execution is provided or not.
Details
To parallelize the execution, indicate the number of cores to be used through bdpar.Options$set("numCores", numCores).
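The parallelization setting can be sketched as follows. This is a minimal example, not the package's prescribed method: the choice of leaving one core free is an assumption made here, and parallel::detectCores() comes from the parallel package shipped with R, not from bdpar. The bdpar call is guarded so the snippet also runs where bdpar is not installed.

```r
# Assumption: use all detected cores but one, so the machine stays responsive.
# parallel::detectCores() may return NA on some platforms, hence the fallback.
available <- parallel::detectCores()
numCores <- if (is.na(available)) 1L else max(1L, available - 1L)

# Only touch bdpar.Options when the bdpar package is available.
if (requireNamespace("bdpar", quietly = TRUE)) {
  library(bdpar)
  bdpar.Options$set("numCores", numCores)
}
```

Set the option before calling Bdpar$execute(), since the value is read from bdpar.Options rather than passed as an argument.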
Returns
The list of Instances that have been preprocessed.
Method clone()
The objects of this class are cloneable with this method.
Usage
Bdpar$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
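The effect of the deep argument can be illustrated with general R6 behavior. The sketch below uses two hypothetical classes (Inner and Outer), not Bdpar itself, because this page does not describe how Bdpar stores its fields; it only shows what a shallow and a deep clone do to a field that holds another R6 object.

```r
library(R6)

# Hypothetical classes, used only to illustrate R6 clone semantics.
Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class("Outer", public = list(
  inner = NULL,
  initialize = function() {
    self$inner <- Inner$new()
  }
))

original <- Outer$new()
shallow <- original$clone()           # deep = FALSE: 'inner' is shared
deep <- original$clone(deep = TRUE)   # deep = TRUE: 'inner' is cloned as well

# Modify the nested object through the original.
original$inner$value <- 2

shallow$inner$value  # 2, because it points to the same Inner object
deep$inner$value     # 1, because it holds an independent copy
```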
See Also
bdpar.Options, Connections, DefaultPipeline, DynamicPipeline, GenericPipeline, Instance, ExtractorFactory, ResourceHandler, runPipeline
Examples
## Not run:
#If it is necessary to indicate any configuration, do it through:
#bdpar.Options$set(key, value)
#If the key is not initialized, do it through:
#bdpar.Options$add(key, value)
#If it is necessary to parallelize, do it through:
#bdpar.Options$set("numCores", numCores)
#If it is necessary to change the behavior of the log, do it through:
#bdpar.Options$configureLog(console = TRUE, threshold = "INFO", file = NULL)

#Folder with the files to preprocess
path <- system.file("example",
                    package = "bdpar")

#Object which decides how the instances are created
extractors <- ExtractorFactory$new()

#Object which indicates the pipes' flow
pipeline <- DefaultPipeline$new()

objectBdpar <- Bdpar$new()

#Starting file preprocessing...
objectBdpar$execute(path = path,
                    extractors = extractors,
                    pipeline = pipeline,
                    cache = FALSE,
                    verbose = FALSE,
                    summary = TRUE)
## End(Not run)