| Scheduler {sched} | R Documentation |
Class for scheduling web requests.
Description
Class for scheduling web requests.
Class for scheduling web requests.
Details
The Scheduler class controls the frequency of access to web sites, through
the definiton of access rules (Rule class).
It handles GET and POST requests, as well as file downloading.
It can use a cache system to store request results and avoid resending
identical requests.
Methods
Public methods
Method new()
New instance initializer.
There should be only one Scheduler instance in an application. There is no sense in having two or more instances, since they will ignore each other and break the access frequency rules when they contact the same sites.
Usage
Scheduler$new(
default_rule = Rule$new(),
ssl_verifypeer = TRUE,
nb_max_tries = 10L,
cache_dir = tools::R_user_dir("sched", which = "cache"),
user_agent = NULL,
dwnld_timeout = 3600
)Arguments
default_ruleThe default_rule to use when none has been defined for a site.
ssl_verifypeerIf set to TRUE (default), SSL certificate will be checked, otherwise certificates will be ignored.
nb_max_triesMaximum number of tries when running a request.
cache_dirSet the path to the file system cache. Set to NULL to disable the cache system. The cache system will save downloaded content and reuse it later for identical requests.
user_agentThe application name and contact address to send to the contacted web server.
dwnld_timeoutThe timeout used by
downloadFile()method, in seconds.
Returns
Nothing.
Examples
# Create a scheduler instance with a custom default_rule
scheduler <- sched::Scheduler$new(default_rule=sched::Rule$new(10, 1),
cache_dir = NULL)
Method setRule()
Defines a rule for a site.
Defines a rule for a site. The site is identified by its hostname. Each time a request will be made to this host (i.e.: the URL contains the defined hostname), the scheduling rule will be applied in order to wait (sleep) if nedeed before sending the request.
If a rule already exists for this hostname, it will be replaced.
Usage
Scheduler$setRule(host, n = 3L, lap = 1)
Arguments
hostThe hostname of the site.
nNumber of events during a time lap.
lapDuration of a time lap, in seconds.
Returns
Nothing.
Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a rule with default values
scheduler$setRule('www.ebi.ac.uk')
# Define a rule with custome values
scheduler$setRule('my.other.site', n=10, lap=3)
Method sendRequest()
Sends a request, and retrieves content result.
Usage
Scheduler$sendRequest(request, cache_read = TRUE)
Arguments
requestA
sched::Requestinstance.cache_readIf set to TRUE and the cache system is enabled, the cache system will be searched for the request and the cached result returned. In any case, if the the cache system is enabled, and the request sent, the retrieved content will be stored into the cache.
Returns
The results returned by the contacted server, as a single string value.
Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a scheduling rule of 7 requests every 2 seconds
scheduler$setRule('www.ebi.ac.uk', n=7, lap=2)
# Create a request object
u <- 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity'
url <- sched::URL$new(url=u, params=c(chebiId=15440))
request <- sched::Request$new(url)
# Send the request and get the content result
content <- scheduler$sendRequest(request)
Method downloadFile()
Downloads the content of a URL and save it into the specified destination file.
This method works for any URL, even if it has been written with heavy
files in mind.
Since it uses utils::download.file() which saves the content
directly on disk, the cache system is not used.
Usage
Scheduler$downloadFile(url, dest_file, quiet = FALSE, timeout = NULL)
Arguments
urlThe URL to access, as a sched::URL object.
dest_fileA path to a destination file.
quietThe quiet parameter for
utils::download.file().timeoutThe timeout in seconds. Defaults to value provided in initializer.
Returns
Nothing.
Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Create a temporary directory
tmp_dir <- tempdir()
# Download a file
u <- sched::URL$new(
'https://gitlab.com/cnrgh/databases/r-sched/-/raw/main/README.md',
c(ref_type='heads'))
scheduler$downloadFile(u, file.path(tmp_dir, 'README.md'))
# Remove the temporary directory
unlink(tmp_dir, recursive = TRUE)
Method getUrlString()
Builds a URL string, using a base URL and parameters to be passed.
The provided base URL and parameters are combined into a full URL string.
DEPRECATED. Use the sched::URL class and its method
toString() instead.
Usage
Scheduler$getUrlString(url, params = list())
Arguments
urlA URL string.
paramsA list of URL parameters.
Returns
The full URL string as a single character value.
Examples
# Create a scheduler instance scheduler <- sched::Scheduler$new(cache_dir = NULL) # Create a URL string url.str <- scheduler$getUrlString( 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity', params=c(chebiId=15440))
Method getUrl()
Sends a request and get the result.
DEPRECATED. Use method sendRequest() instead.
Usage
Scheduler$getUrl(
url,
params = list(),
method = c("get", "post"),
header = NULL,
body = NULL,
encoding = NULL
)Arguments
urlA URL string.
paramsA list of URL parameters.
methodThe method to use. Either 'get' or 'post'.
headerThe header to send.
bodyThe body to send.
encodingThe encoding to use.
Returns
The results of the request.
Examples
# Create a scheduler instance scheduler <- sched::Scheduler$new(cache_dir = NULL) # Send request content <- scheduler$getUrl( 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity', params=c(chebiId=15440))
Method deleteRules()
Removes all defined rules, including the ones automatically defined using default_rule.
Usage
Scheduler$deleteRules()
Returns
Nothing.
Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a rule with custome values
scheduler$setRule('my.other.site', n=10, lap=3)
# Delete all defined rules
scheduler$deleteRules()
Method getNbRules()
Gets the number of defined rules, including the ones automatically defined using default_rule.
Usage
Scheduler$getNbRules()
Returns
The number of rules defined.
Examples
# Create a scheduler instance scheduler <- sched::Scheduler$new(cache_dir = NULL) # Get the number of defined rules print(scheduler$getNbRules())
Method setOffline()
Enables or disables offline mode.
If the offline mode is enabled, an error will be raised when the class attemps to send a request. This mode is mainly useful when debugging the usage of the cache system.
Usage
Scheduler$setOffline(offline)
Arguments
offlineSet to TRUE to enable offline mode, and FALSE otherwise.
Returns
Nothing.
Examples
# Create a scheduler instance scheduler <- sched::Scheduler$new(cache_dir = NULL) # Enable offline mode scheduler$setOffline(TRUE)
Method clone()
The objects of this class are cloneable with this method.
Usage
Scheduler$clone(deep = FALSE)
Arguments
deepWhether to make a deep clone.
Examples
# Create a scheduler instance without cache
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a rule with default values
scheduler$setRule('www.ebi.ac.uk')
# Create a request object
u <- 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity'
url <- sched::URL$new(url=u, params=c(chebiId=15440))
request <- sched::Request$new(url)
# Send the request and get the content result
content <- scheduler$sendRequest(request)
## ------------------------------------------------
## Method `Scheduler$new`
## ------------------------------------------------
# Create a scheduler instance with a custom default_rule
scheduler <- sched::Scheduler$new(default_rule=sched::Rule$new(10, 1),
cache_dir = NULL)
## ------------------------------------------------
## Method `Scheduler$setRule`
## ------------------------------------------------
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a rule with default values
scheduler$setRule('www.ebi.ac.uk')
# Define a rule with custome values
scheduler$setRule('my.other.site', n=10, lap=3)
## ------------------------------------------------
## Method `Scheduler$sendRequest`
## ------------------------------------------------
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a scheduling rule of 7 requests every 2 seconds
scheduler$setRule('www.ebi.ac.uk', n=7, lap=2)
# Create a request object
u <- 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity'
url <- sched::URL$new(url=u, params=c(chebiId=15440))
request <- sched::Request$new(url)
# Send the request and get the content result
content <- scheduler$sendRequest(request)
## ------------------------------------------------
## Method `Scheduler$downloadFile`
## ------------------------------------------------
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Create a temporary directory
tmp_dir <- tempdir()
# Download a file
u <- sched::URL$new(
'https://gitlab.com/cnrgh/databases/r-sched/-/raw/main/README.md',
c(ref_type='heads'))
scheduler$downloadFile(u, file.path(tmp_dir, 'README.md'))
# Remove the temporary directory
unlink(tmp_dir, recursive = TRUE)
## ------------------------------------------------
## Method `Scheduler$getUrlString`
## ------------------------------------------------
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Create a URL string
url.str <- scheduler$getUrlString(
'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity',
params=c(chebiId=15440))
## ------------------------------------------------
## Method `Scheduler$getUrl`
## ------------------------------------------------
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Send request
content <- scheduler$getUrl(
'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity',
params=c(chebiId=15440))
## ------------------------------------------------
## Method `Scheduler$deleteRules`
## ------------------------------------------------
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Define a rule with custome values
scheduler$setRule('my.other.site', n=10, lap=3)
# Delete all defined rules
scheduler$deleteRules()
## ------------------------------------------------
## Method `Scheduler$getNbRules`
## ------------------------------------------------
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Get the number of defined rules
print(scheduler$getNbRules())
## ------------------------------------------------
## Method `Scheduler$setOffline`
## ------------------------------------------------
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)
# Enable offline mode
scheduler$setOffline(TRUE)