Scheduler {sched}    R Documentation

Class for scheduling web requests.

Description

Class for scheduling web requests.

Details

The Scheduler class controls how frequently web sites are accessed, through the definition of access rules (see the Rule class). It handles GET and POST requests, as well as file downloads. It can use a cache system to store request results and avoid resending identical requests.
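Deciding when two requests are "identical" is the core of such a cache. The following is a minimal sketch, assuming a request is characterized by its method, full URL and body: it derives a cache file name by hashing these parts with tools::md5sum(). The helper request_cache_key() and the file layout are hypothetical; the actual sched cache may use a different scheme.

```r
# Hypothetical sketch: derive a cache file name from the parts that make two
# requests identical (method, full URL, body). sched's real scheme may differ.
request_cache_key <- function(method, url, body = "") {
  descriptor <- paste(toupper(method), url, body, sep = "\n")
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeBin(charToRaw(descriptor), tmp)  # hash the exact bytes of the descriptor
  unname(tools::md5sum(tmp))            # 32-character hexadecimal digest
}

cache_dir <- file.path(tempdir(), "sched_cache_demo")
u <- "https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity?chebiId=15440"
key <- request_cache_key("get", u)
cache_file <- file.path(cache_dir, paste0(key, ".content"))
```

Two calls with the same method, URL and body map to the same file, so the second one can be answered from disk.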

Methods

Public methods


Method new()

New instance initializer.

There should be only one Scheduler instance in an application. Creating two or more instances is pointless: they do not share state, so each one applies the access rules independently, and together they can exceed the allowed access frequency when they contact the same sites.
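One way to make sure a single instance is shared is to create it lazily and always return the same object. In this minimal sketch, the helpers make_instance_getter() and get_scheduler() are hypothetical, and a plain environment stands in for sched::Scheduler$new() so the code runs without the package.

```r
# Hypothetical helper: lazily create one shared instance and reuse it.
# A plain environment stands in for sched::Scheduler$new() so the sketch
# runs without network access or the sched package.
make_instance_getter <- function(factory) {
  store <- new.env(parent = emptyenv())  # private storage for the instance
  function() {
    if (!exists("instance", envir = store, inherits = FALSE)) {
      assign("instance", factory(), envir = store)  # created on first call only
    }
    get("instance", envir = store, inherits = FALSE)
  }
}

get_scheduler <- make_instance_getter(function() new.env())

s1 <- get_scheduler()
s2 <- get_scheduler()
identical(s1, s2)  # TRUE: both names refer to the same object
```

In an application, the factory would be something like function() sched::Scheduler$new(cache_dir = NULL), and every part of the code would call get_scheduler() instead of creating its own instance.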

Usage
Scheduler$new(
  default_rule = Rule$new(),
  ssl_verifypeer = TRUE,
  nb_max_tries = 10L,
  cache_dir = tools::R_user_dir("sched", which = "cache"),
  user_agent = NULL,
  dwnld_timeout = 3600
)
Arguments
default_rule

The rule to use for a site when no specific rule has been defined for it.

ssl_verifypeer

If set to TRUE (default), the SSL certificates of contacted servers are verified; if set to FALSE, certificate verification is skipped.

nb_max_tries

Maximum number of tries when running a request.

cache_dir

Path to the file-system cache directory. Set to NULL to disable the cache system. When enabled, the cache system saves downloaded content and reuses it later for identical requests.

user_agent

The application name and contact address to send to the contacted web server.

dwnld_timeout

The timeout used by downloadFile() method, in seconds.
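The nb_max_tries parameter bounds how many times a request is attempted. The following minimal sketch shows a bounded retry loop; it assumes a failed attempt is signalled by an R error, and the helper with_retries() is hypothetical. How sched detects failures, and whether it waits between attempts, is not described here.

```r
# Hypothetical sketch of a bounded retry loop (not the sched implementation).
# `send` is any function that performs one attempt and may throw an error.
with_retries <- function(send, nb_max_tries = 10L) {
  stopifnot(nb_max_tries >= 1L)
  last_error <- NULL
  for (attempt in seq_len(nb_max_tries)) {
    result <- tryCatch(send(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success: stop retrying
    }
    last_error <- result  # remember the failure and try again
  }
  stop("All ", nb_max_tries, " attempts failed: ",
       conditionMessage(last_error))
}

# Example: an attempt that fails twice before succeeding
calls <- 0L
flaky <- function() {
  calls <<- calls + 1L
  if (calls < 3L) stop("temporary failure")
  "content"
}
with_retries(flaky, nb_max_tries = 5L)  # returns "content" after 3 calls
```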

Returns

Nothing.

Examples
# Create a scheduler instance with a custom default_rule
scheduler <- sched::Scheduler$new(default_rule=sched::Rule$new(10, 1),
                                  cache_dir = NULL)


Method setRule()

Defines a rule for a site.

The site is identified by its hostname. Each time a request is made to this host (i.e., the URL contains the defined hostname), the scheduling rule is applied: the scheduler waits (sleeps) if needed before sending the request.

If a rule already exists for this hostname, it will be replaced.
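A rule of "n requests per lap seconds" can be enforced with a sliding window: if the last n requests all fall within the past lap seconds, the next one waits until the oldest of them leaves the window. The following is a minimal sketch of that idea; the helpers wait_time() and throttled_send() are hypothetical and do not reproduce sched's internal code.

```r
# Hypothetical sketch of an "n requests per lap seconds" rule; an
# illustration of the idea, not sched's actual implementation.

# Seconds to wait before the next request, given the timestamps (in seconds)
# of previous requests to the same host.
wait_time <- function(times, now, n = 3L, lap = 1) {
  recent <- times[times > now - lap]   # requests inside the current window
  if (length(recent) < n) return(0)    # under the limit: send immediately
  # Wait until the n-th most recent request leaves the window
  nth_recent <- sort(recent, decreasing = TRUE)[n]
  nth_recent + lap - now
}

# Apply the rule before calling `send`, keeping only the last n timestamps.
# `state` is an environment holding a numeric vector `times`.
throttled_send <- function(state, send, n = 3L, lap = 1) {
  w <- wait_time(state$times, as.numeric(Sys.time()), n, lap)
  if (w > 0) Sys.sleep(w)
  state$times <- utils::tail(c(state$times, as.numeric(Sys.time())), n)
  send()
}

# Three requests at 0, 0.2 and 0.4 s fill a window of n = 3 per 1 s,
# so at t = 0.5 s the next request must wait 0.5 s.
wait_time(c(0, 0.2, 0.4), now = 0.5, n = 3L, lap = 1)
```

With the default values n = 3 and lap = 1, no more than three requests are sent to the same host within any one-second window.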

Usage
Scheduler$setRule(host, n = 3L, lap = 1)
Arguments
host

The hostname of the site.

n

Number of requests allowed during one time lap.

lap

Duration of a time lap, in seconds.

Returns

Nothing.

Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)

# Define a rule with default values
scheduler$setRule('www.ebi.ac.uk')

# Define a rule with custom values
scheduler$setRule('my.other.site', n=10, lap=3)


Method sendRequest()

Sends a request and retrieves the resulting content.

Usage
Scheduler$sendRequest(request, cache_read = TRUE)
Arguments
request

A sched::Request instance.

cache_read

If set to TRUE and the cache system is enabled, the cache is searched for the request and, when a match is found, the cached result is returned. In any case, if the cache system is enabled and the request is actually sent, the retrieved content is stored in the cache.
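The cache_read behaviour can be summarized as "read only when allowed, always write after sending". In this minimal sketch of that logic, an environment stands in for the file-system cache (NULL meaning the cache is disabled) and fetch() stands in for actually contacting the server; send_with_cache() is a hypothetical helper, not part of sched.

```r
# Hypothetical sketch of the cache_read logic; an environment stands in for
# the file-system cache (NULL = cache disabled).
send_with_cache <- function(key, fetch, cache = NULL, cache_read = TRUE) {
  if (!is.null(cache) && cache_read &&
      exists(key, envir = cache, inherits = FALSE)) {
    return(get(key, envir = cache, inherits = FALSE))  # cache hit
  }
  content <- fetch()                      # the request is actually sent
  if (!is.null(cache)) {
    assign(key, content, envir = cache)   # always stored when cache is enabled
  }
  content
}

count <- 0L
fetch <- function() { count <<- count + 1L; paste("version", count) }
cache <- new.env()
send_with_cache("k", fetch, cache)                      # sends, stores
send_with_cache("k", fetch, cache)                      # served from cache
send_with_cache("k", fetch, cache, cache_read = FALSE)  # sends again, overwrites
```

Setting cache_read = FALSE is therefore a way to refresh a cached entry: the request is resent and the new content replaces the old one.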

Returns

The results returned by the contacted server, as a single string value.

Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)

# Define a scheduling rule of 7 requests every 2 seconds
scheduler$setRule('www.ebi.ac.uk', n=7, lap=2)

# Create a request object
u <- 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity'
url <- sched::URL$new(url=u, params=c(chebiId=15440))
request <- sched::Request$new(url)

# Send the request and get the content result
content <- scheduler$sendRequest(request)


Method downloadFile()

Downloads the content of a URL and saves it into the specified destination file.

This method works for any URL, although it was written with large files in mind. Since it relies on utils::download.file(), which saves the content directly to disk, the cache system is not used.

Usage
Scheduler$downloadFile(url, dest_file, quiet = FALSE, timeout = NULL)
Arguments
url

The URL to access, as a sched::URL object.

dest_file

A path to a destination file.

quiet

The quiet parameter for utils::download.file().

timeout

The timeout in seconds. If NULL, the dwnld_timeout value given to the initializer is used.
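The following minimal sketch shows how a per-call timeout can fall back to a default and be applied only for the duration of one download. It assumes the timeout is passed through R's timeout option, which utils::download.file() consults; the helper download_with_timeout() and its default_timeout argument are hypothetical stand-ins for downloadFile() and dwnld_timeout.

```r
# Hypothetical sketch: resolve the effective timeout, then run
# utils::download.file() with it. Not the sched implementation.
download_with_timeout <- function(url, dest_file, timeout = NULL,
                                  default_timeout = 3600, quiet = TRUE) {
  # Fall back to the scheduler-wide default when no timeout is given
  if (is.null(timeout)) timeout <- default_timeout
  # download.file() reads its timeout from options("timeout"), so set it
  # temporarily and restore the previous value when the function exits
  old <- options(timeout = timeout)
  on.exit(options(old), add = TRUE)
  status <- utils::download.file(url, dest_file, quiet = quiet)
  invisible(status)  # 0 on success
}
```

Using on.exit() guarantees that the previous option value is restored even if the download fails.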

Returns

Nothing.

Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)

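# Create a dedicated temporary directory (do not use tempdir() itself,
# since the session temporary directory must not be deleted)
tmp_dir <- tempfile('sched_example_')
dir.create(tmp_dir)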

# Download a file
u <- sched::URL$new(
    'https://gitlab.com/cnrgh/databases/r-sched/-/raw/main/README.md',
    c(ref_type='heads'))
scheduler$downloadFile(u, file.path(tmp_dir, 'README.md'))

# Remove the temporary directory
unlink(tmp_dir, recursive = TRUE)


Method getUrlString()

Builds a URL string, using a base URL and parameters to be passed.

The provided base URL and parameters are combined into a full URL string.

DEPRECATED. Use the sched::URL class and its method toString() instead.
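Combining a base URL with parameters amounts to appending a percent-encoded query string. The following is a minimal sketch of that operation using utils::URLencode(); build_url_string() is a hypothetical illustration, and the exact output of sched::URL's toString() (parameter order, encoding) may differ.

```r
# Hypothetical re-implementation of "base URL + parameters" for illustration;
# prefer the sched::URL class and its toString() method in real code.
build_url_string <- function(url, params = list()) {
  params <- unlist(params)
  if (length(params) == 0L) return(url)
  # Percent-encode each value so reserved characters survive in the query
  values <- vapply(as.character(params), utils::URLencode, character(1),
                   reserved = TRUE)
  query <- paste(names(params), values, sep = "=", collapse = "&")
  paste0(url, "?", query)
}

build_url_string(
  'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity',
  params = c(chebiId = 15440))
```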

Usage
Scheduler$getUrlString(url, params = list())
Arguments
url

A URL string.

params

A list of URL parameters.

Returns

The full URL string as a single character value.

Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)

# Create a URL string
url.str <- scheduler$getUrlString(
  'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity',
  params=c(chebiId=15440))


Method getUrl()

Sends a request and gets the result.

DEPRECATED. Use method sendRequest() instead.

Usage
Scheduler$getUrl(
  url,
  params = list(),
  method = c("get", "post"),
  header = NULL,
  body = NULL,
  encoding = NULL
)
Arguments
url

A URL string.

params

A list of URL parameters.

method

The method to use. Either 'get' or 'post'.

header

The header to send.

body

The body to send.

encoding

The encoding to use.

Returns

The results of the request.

Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)

# Send request
content <- scheduler$getUrl(
  'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity',
  params=c(chebiId=15440))


Method deleteRules()

Removes all defined rules, including the ones automatically defined using default_rule.

Usage
Scheduler$deleteRules()
Returns

Nothing.

Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)

# Define a rule with custom values
scheduler$setRule('my.other.site', n=10, lap=3)

# Delete all defined rules
scheduler$deleteRules()


Method getNbRules()

Gets the number of defined rules, including the ones automatically defined using default_rule.
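Rules "automatically defined using default_rule" are created the first time a request targets a host that has no rule yet, which is why they are counted here. The following minimal sketch models this with a per-host registry; new_registry() and url_host() are hypothetical helpers, not sched's internals.

```r
# Hypothetical sketch of a per-host rule registry (not sched's internals).
# Rules are stored in an environment keyed by hostname; a rule built from
# the default is added automatically the first time an unknown host is used.
new_registry <- function(default_n = 3L, default_lap = 1) {
  rules <- new.env(parent = emptyenv())
  list(
    set = function(host, n = 3L, lap = 1) {
      assign(host, list(n = n, lap = lap), envir = rules)  # replaces any existing rule
    },
    get = function(host) {
      if (!exists(host, envir = rules, inherits = FALSE)) {
        assign(host, list(n = default_n, lap = default_lap), envir = rules)
      }
      base::get(host, envir = rules, inherits = FALSE)
    },
    count = function() length(ls(rules, all.names = TRUE)),
    delete_all = function() rm(list = ls(rules, all.names = TRUE), envir = rules)
  )
}

# Extract the hostname from a URL (e.g. "www.ebi.ac.uk")
url_host <- function(url) sub("^[a-zA-Z][a-zA-Z0-9+.-]*://([^/:?#]+).*$", "\\1", url)

reg <- new_registry()
reg$set('my.other.site', n = 10, lap = 3)
reg$get(url_host('https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity'))
reg$count()       # 2: one explicit rule, one created from the default
reg$delete_all()
reg$count()       # 0
```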

Usage
Scheduler$getNbRules()
Returns

The number of rules defined.

Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)

# Get the number of defined rules
print(scheduler$getNbRules())


Method setOffline()

Enables or disables offline mode.

If offline mode is enabled, an error is raised whenever the scheduler attempts to send a request. This mode is mainly useful for debugging the use of the cache system.
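Offline mode helps check that expected results really come from the cache: any request that would reach the network fails loudly. The following minimal sketch assumes cached results are still served while offline; make_sender() and its in-memory cache are hypothetical illustrations, not sched's implementation.

```r
# Hypothetical sketch: an offline flag that makes any attempt to contact a
# server fail, while results already in the cache are still returned.
make_sender <- function(fetch) {
  state <- new.env(parent = emptyenv())
  state$offline <- FALSE
  cache <- new.env(parent = emptyenv())
  list(
    set_offline = function(offline) state$offline <- isTRUE(offline),
    send = function(key) {
      if (exists(key, envir = cache, inherits = FALSE)) {
        return(base::get(key, envir = cache, inherits = FALSE))  # cache hit
      }
      if (state$offline) {
        stop("Offline mode is enabled: refusing to send request '", key, "'.")
      }
      content <- fetch(key)                 # would contact the server
      assign(key, content, envir = cache)
      content
    }
  )
}

s <- make_sender(function(key) paste("content for", key))
s$send("a")          # fetched and cached
s$set_offline(TRUE)
s$send("a")          # still served from the cache
try(s$send("b"))     # error: this request would need the network
```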

Usage
Scheduler$setOffline(offline)
Arguments
offline

Set to TRUE to enable offline mode, and FALSE otherwise.

Returns

Nothing.

Examples
# Create a scheduler instance
scheduler <- sched::Scheduler$new(cache_dir = NULL)

# Enable offline mode
scheduler$setOffline(TRUE)


Method clone()

The objects of this class are cloneable with this method.

Usage
Scheduler$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

# Create a scheduler instance without cache
scheduler <- sched::Scheduler$new(cache_dir = NULL)

# Define a rule with default values
scheduler$setRule('www.ebi.ac.uk')

# Create a request object
u <- 'https://www.ebi.ac.uk/webservices/chebi/2.0/test/getCompleteEntity'
url <- sched::URL$new(url=u, params=c(chebiId=15440))
request <- sched::Request$new(url)

# Send the request and get the content result
content <- scheduler$sendRequest(request)


[Package sched version 1.0.1 Index]