multi_download {curl} | R Documentation |
Advanced download interface
Description
Download multiple files concurrently, with support for resuming large files.
This function is based on multi_run() and hence does not error in case any
of the individual requests fail; you should inspect the return value to find
out which of the downloads were completed successfully.
Usage
multi_download(
  urls,
  destfiles = NULL,
  resume = FALSE,
  progress = TRUE,
  timeout = Inf,
  multiplex = FALSE,
  ...
)
Arguments
urls | vector with files to download |
destfiles | vector (of equal length as urls) with paths of output files, or NULL to use basename of urls |
resume | if the file already exists, resume the download. Note that this may change server responses, see details. |
progress | print download progress information |
timeout | in seconds, passed to multi_run |
multiplex | passed to new_pool |
... | extra handle options passed to each request, see new_handle |
Details
Upon completion of all requests, this function returns a data frame with results.
The success column indicates if a request was successfully completed (regardless
of the HTTP status code). If it failed, e.g. due to a networking issue, the error
message is in the error column. A success value NA indicates that the request
was still in progress when the function was interrupted or reached the elapsed
timeout, and perhaps the download can be resumed if the server supports it.
It is also important to inspect the status_code column to see if any of the
requests were successful but had a non-success HTTP code, and hence the downloaded
file probably contains an error page instead of the requested content.
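The checks described above can be combined into one label per row. The following is only a sketch: classify_downloads() is a hypothetical helper, not part of curl, and the data frame is a hand-made mock that borrows the success, status_code and error column names so the code runs without a network connection. With resume = TRUE a 416 response would also land in "http_error" here, so adjust the rule if you resume downloads.

```r
# Hypothetical helper (not part of curl): label each row of a result data frame
# that has the 'success' and 'status_code' columns described in this page.
classify_downloads <- function(res) {
  # TRUE only for 2xx responses; a missing status code counts as not OK
  ok_http <- !is.na(res$status_code) &
    res$status_code >= 200 & res$status_code < 300
  ifelse(is.na(res$success), "interrupted",      # still in progress when stopped
    ifelse(!res$success, "failed",               # network error, see res$error
      ifelse(ok_http, "ok", "http_error")))      # performed, check the HTTP code
}

# Mock result (made-up values) standing in for the output of multi_download()
res <- data.frame(
  success = c(TRUE, TRUE, FALSE, NA),
  status_code = c(200L, 404L, NA, 206L),
  error = c(NA, NA, "Could not resolve host", NA),
  stringsAsFactors = FALSE
)
res$outcome <- classify_downloads(res)
print(res[, c("success", "status_code", "outcome")])
```

With a real result, the same call would be res$outcome <- classify_downloads(multi_download(urls)).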
Note that when you set resume = TRUE you should expect HTTP-206 or HTTP-416
responses. The latter could indicate that the file was already complete, hence
there was no content left to resume from the server. If you try to resume a file
download but the server does not support this, success is FALSE and the file
will not be touched. In fact, if we request a download to be resumed and the
server responds HTTP 200 instead of HTTP 206, libcurl will error and not
download anything, because this probably means the server did not respect our
range request and is sending us the full file.
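One way to use the resume behaviour is to repeat the call until no request is left in progress. This is a minimal sketch under assumptions: download_with_resume(), its max_tries and downloader parameters, and the likely_complete column are all hypothetical names introduced here. The downloader parameter exists only so the loop can be tried offline with a stub; by default it calls multi_download().

```r
# Hypothetical wrapper (not part of curl): retry with resume = TRUE while any
# row still has success == NA, up to max_tries attempts.
download_with_resume <- function(urls, destfiles, max_tries = 3,
                                 downloader = curl::multi_download, ...) {
  res <- NULL
  for (i in seq_len(max_tries)) {
    res <- downloader(urls, destfiles, resume = TRUE, ...)
    # NA means interrupted or timed out, so another resume attempt may help
    if (!anyNA(res$success)) break
  }
  # Assumption: 206 = resumed, 416 = probably nothing left to fetch,
  # 200 = full download. Anything else is treated as not complete.
  res$likely_complete <- !is.na(res$success) & res$success &
    res$status_code %in% c(200, 206, 416)
  res
}
```

For example, res <- download_with_resume(urls, basename(urls), timeout = 60) would pass timeout through to multi_download(). Rows where success is FALSE are not retried by this loop, since a server without range support will fail the same way again.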
About HTTP/2
Availability of HTTP/2 can increase the performance when making many parallel
requests to a server, because HTTP/2 can multiplex many requests over a single
TCP connection. Support for HTTP/2 depends on the version of libcurl that
your system has and the TLS back-end that is in use; check curl_version.
For clients or servers without HTTP/2, curl makes at most 6 connections per
host over which it distributes the queued downloads.
On Windows and macOS you can switch the active TLS back-end by setting the
environment variable CURL_SSL_BACKEND in your ~/.Renviron file. On Windows you
can switch between SecureChannel (default) and OpenSSL, where only the latter
supports HTTP/2. On macOS you can use either SecureTransport or LibreSSL; the
default varies by macOS version.
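To see what the current session supports, the fields returned by curl_version() can be printed. The snippet is a small sketch that runs only when the curl package is installed. The value shown for CURL_SSL_BACKEND in the comment is an assumption about the accepted spelling, so check it against the curl package documentation for your platform.

```r
# Print the libcurl build details relevant to HTTP/2
if (requireNamespace("curl", quietly = TRUE)) {
  info <- curl::curl_version()
  cat("libcurl version:", info$version, "\n")
  cat("TLS back-end:   ", info$ssl_version, "\n")
  cat("HTTP/2 support: ", info$http2, "\n")
}

# To switch the back-end, a line such as the following would go in ~/.Renviron
# (assumed value; restart R afterwards):
# CURL_SSL_BACKEND=openssl
```

If HTTP/2 is reported as available, passing multiplex = TRUE to multi_download() forwards that setting to new_pool.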
Value
The function returns a data frame with one row for each downloaded file and the following columns:
- success: if the HTTP request was successfully performed, regardless of the response status code. This is FALSE in case of a network error, or in case you tried to resume from a server that did not support this. A value of NA means the download was interrupted while in progress.
- status_code: the HTTP status code from the request. A successful download is usually 200 for full requests or 206 for resumed requests. Anything else could indicate that the downloaded file contains an error page instead of the requested content.
- resumefrom: the file size before the request, in case a download was resumed.
- url: final url (after redirects) of the request.
- destfile: downloaded file on disk.
- error: if success == FALSE this column contains an error message.
- type: the Content-Type response header value.
- modified: the Last-Modified response header value.
- time: total elapsed download time for this file in seconds.
- headers: vector with http response headers for the request.
Examples
## Not run:
# Example: some large files
urls <- sprintf(
  "https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2021-%02d.parquet", 1:12)
res <- multi_download(urls, resume = TRUE) # You can interrupt (ESC) and resume
# Example: revdep checker
# Download all reverse dependencies for the 'curl' package from CRAN:
pkg <- 'curl'
mirror <- 'https://cloud.r-project.org'
db <- available.packages(repos = mirror)
packages <- c(pkg, tools::package_dependencies(pkg, db = db, reverse = TRUE)[[pkg]])
versions <- db[packages,'Version']
urls <- sprintf("%s/src/contrib/%s_%s.tar.gz", mirror, packages, versions)
res <- multi_download(urls)
all.equal(unname(tools::md5sum(res$destfile)), unname(db[packages, 'MD5sum']))
# And then you could use e.g.: tools:::check_packages_in_dir()
# Example: URL checker
pkg_url_checker <- function(dir){
  db <- tools:::url_db_from_package_sources(dir)
  res <- multi_download(db$URL, rep('/dev/null', nrow(db)), nobody=TRUE)
  db$OK <- res$status_code == 200
  db
}
# Use a local package source directory
pkg_url_checker(".")
## End(Not run)