getIndex {argoFloats} | R Documentation |
Get an Index of Available Argo Float Profiles
Description
This function gets an index of available Argo float profiles, typically
for later use as the first argument to getProfiles(). The source for the
index may be (a) a remote data repository, (b) a local repository (see the
keep argument), or (c) a cached RDA file that contains the result
of a previous call to getIndex() (see the age parameter).
Usage
getIndex(
  filename = "core",
  server = argoDefaultServer(),
  destdir = argoDefaultDestdir(),
  age = argoDefaultIndexAge(),
  quiet = FALSE,
  keep = FALSE,
  debug = 0L
)
Arguments
filename |
character value that indicates the file name to be downloaded
from a remote server, or (if |
server |
an indication of the source for |
destdir |
character value indicating the directory in which to store
downloaded files. The default value is to compute this using
|
age |
numeric value indicating how old a downloaded file
must be (in days), for it to be considered out-of-date. The
default, |
quiet |
logical value indicating whether to silence some progress indicators. The default is to show such indicators. |
keep |
logical value indicating whether to retain the
raw index file as downloaded from the server. This is |
debug |
integer value indicating level of debugging. If this
is less than 1, no debugging is done. Otherwise, some functions
will print debugging information. If a function call fails, the
first step should be to rerun the function with |
Details
Using an index from a remote server
The first step is to construct a URL for downloading, based on the
server and filename arguments. That URL will be a string ending in .gz
or .txt, and from this the name of a local file is constructed
by changing the suffix to .rda and prepending the file directory
specified by destdir. If an .rda file of that name already exists,
and is less than age days old, then no downloading takes place. This
caching procedure is a way to save time, because the download can take
from a minute to an hour, depending on the bandwidth of the connection
to the server.
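The caching rule described above can be sketched in base R. This is a minimal illustration, not the package's code: cachePath() and cacheIsFresh() are hypothetical helpers, the URL is assumed to be one of the servers named in this page joined to one of the listed index files, and the sketch replaces only the final suffix, which may differ from the exact file name that getIndex() produces.

```r
# Hypothetical helpers, not part of argoFloats: they mimic the caching
# rule described above using base R only.
cachePath <- function(url, destdir) {
    # Final path component of the URL, e.g. "ar_index_global_prof.txt.gz"
    base <- basename(url)
    # Assumption: only the trailing ".gz" or ".txt" is swapped for ".rda"
    rda <- sub("\\.(gz|txt)$", ".rda", base)
    file.path(destdir, rda)
}

cacheIsFresh <- function(path, age, now = Sys.time()) {
    # A missing file is never considered fresh
    if (!file.exists(path)) {
        return(FALSE)
    }
    # Age of the file in days, computed from its modification time
    days <- as.numeric(difftime(now, file.mtime(path), units = "days"))
    days < age
}

# Assumed URL: a server and an index file that are both listed in this page
url <- "ftp://ftp.ifremer.fr/ifremer/argo/ar_index_global_prof.txt.gz"
path <- cachePath(url, destdir = tempdir())
# A download would be needed only if the cached file is absent or too old
needDownload <- !cacheIsFresh(path, age = 1)
```

Setting age to 0 in this sketch makes every cached file count as out-of-date, since no file can be less than zero days old.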
The resultant .rda file, which is named in the return value of this
function, contains a list named index with the following elements:
- ftpRoot, the FTP root stored in the header of the source file (see next paragraph).
- server, the URL at which the index was found, and from which getProfiles() can construct URLs from which to download the NetCDF files for individual float profiles.
- filename, the argument provided here.
- header, the preliminary lines in the source file that start with the # character.
- data, a data frame containing the items in the source file. The names of these items are determined automatically from "core", "bgcargo" and "synthetic" files.
Some expertise is required in deciding on the value for the
filename argument to getIndex(). As of March 2023, the FTP sites
ftp://usgodae.org/pub/outgoing/argo
and
ftp://ftp.ifremer.fr/ifremer/argo
contain multiple index files, as listed in the left-hand column of the
following table. The middle column lists nicknames
for some of the files. These can be provided as the filename argument,
as alternatives to the full names.
The right-hand column describes the file contents.
Note that the servers also provide files with names similar to those
given in the table, but ending in .txt. These are uncompressed
equivalents of the .gz files that offer no advantage and take
longer to download, so getIndex() is not designed to work with them.
File Name | Nickname | Contents |
ar_greylist.txt | - | Suspicious/malfunctioning floats |
ar_index_global_meta.txt.gz | - | Metadata files |
ar_index_global_prof.txt.gz | "argo" or "core" | Argo data |
ar_index_global_tech.txt.gz | - | Technical files |
ar_index_global_traj.txt.gz | "traj" | Trajectory files |
argo_bio-profile_index.txt.gz | "bgc" or "bgcargo" | Biogeochemical Argo data (without S or T) |
argo_bio-traj_index.txt.gz | "bio-traj" | Bio-trajectory files |
argo_synthetic-profile_index.txt.gz | "synthetic" | Synthetic data, successor to "merge" |
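The nickname column of the table above can be read as a simple lookup. The following base R sketch shows the idea; the nicknames vector and the resolveFilename() helper are hypothetical names introduced for illustration, and the package may represent this mapping differently.

```r
# Nickname-to-file lookup copied from the table above. The vector and the
# helper are illustrative, not the package's internal representation.
nicknames <- c(
    argo = "ar_index_global_prof.txt.gz",
    core = "ar_index_global_prof.txt.gz",
    traj = "ar_index_global_traj.txt.gz",
    bgc = "argo_bio-profile_index.txt.gz",
    bgcargo = "argo_bio-profile_index.txt.gz",
    "bio-traj" = "argo_bio-traj_index.txt.gz",
    synthetic = "argo_synthetic-profile_index.txt.gz"
)

resolveFilename <- function(filename) {
    # Return the full name for a known nickname; pass other values through
    if (filename %in% names(nicknames)) {
        unname(nicknames[filename])
    } else {
        filename
    }
}

resolveFilename("bgc")
resolveFilename("ar_index_global_meta.txt.gz")
```

Files without a nickname, such as ar_index_global_meta.txt.gz, pass through this sketch unchanged, so they would be requested by their full names.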
Using a previously downloaded index
In some situations, it can be desirable to work with a local
index file that has been copied directly from a remote server.
This can be useful if there is a desire to work with the files
in R separately from the argoFloats package, or with Python, etc.
It can also be useful for group work, in which it is important for
all participants to use the same source file.
This need can be handled with getIndex(), by specifying filename
as the full path name to the previously downloaded file, and
at the same time specifying server as NULL. This works both for
the raw files as downloaded from the server (which end in .gz),
and for the R-data-archive files produced by getIndex(),
which end in .rda. Since the .rda files load an order
of magnitude faster than the .gz files, using them is usually
the preferred approach. However, if the .gz files are preferred,
perhaps because part of a software chain uses Python code that
works with such files, then it should be noted that calling
getIndex() with keep=TRUE will save the .gz file in
the destdir directory.
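As a hedged illustration of the local-file approach described above, the following sketch passes a full path as filename and sets server to NULL. The path is hypothetical and must be replaced with the location of an actual copy; the call is guarded so that nothing runs unless the argoFloats package is installed and the file exists.

```r
# Hypothetical path: replace with the location of your own copy of an index
# file, either the raw .gz file or an .rda file written by getIndex().
localIndex <- "~/data/argo/ar_index_global_prof.txt.gz"

# Guarded call: skipped if the package or the file is not available
if (requireNamespace("argoFloats", quietly = TRUE) && file.exists(localIndex)) {
    # server = NULL indicates that filename is a local file
    index <- argoFloats::getIndex(filename = localIndex, server = NULL)
}
```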
Value
An object of class argoFloats with type="index", which
is suitable as the first argument of getProfiles().
Author(s)
Dan Kelley and Jaimie Harbin
References
Kelley, D. E., Harbin, J., & Richards, C. (2021). argoFloats: An R package for analyzing Argo data. Frontiers in Marine Science, 8, 635922. doi:10.3389/fmars.2021.635922