orderly_deduplicate {orderly} | R Documentation |
Deduplicate an orderly archive
Description
Deduplicate an orderly archive. Deduplicating an orderly archive will replace all files that have the same content with "hard links". This requires hard link support in the underlying operating system, which is available on all Unix-like systems (e.g. macOS and Linux) and on Windows since Vista. However, on Windows this might require somewhat elevated privileges. If you use this feature, it is very important that you treat your orderly archive as read-only (though you should be doing so anyway), because changing one copy of a linked file changes all the other instances of it: the files are literally the same file.
Usage
orderly_deduplicate(root = NULL, locate = TRUE, dry_run = TRUE, quiet = FALSE)
Arguments
root |
The path to an orderly root directory, or |
locate |
Logical, indicating if the configuration should be
searched for. If |
dry_run |
Logical, indicating if the deduplication should be planned but not run |
quiet |
Logical, indicating if the status should not be printed |
Details
This function will alter your orderly archive. Ordinarily this is not something that should be done, so we try to be careful. In order for this to work, it is very important to treat your orderly archive as read-only generally. If your canonical orderly archive is behind OrderlyWeb this will almost certainly be the case already.
With "hard linking", two files with the same content can be updated so that both files point at the same physical bit of data. This is useful because, if the file is large, only one copy needs to be stored. However, it also means that a change made to one copy of the file is immediately reflected in the other, and there is nothing to indicate that the files are linked!
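The behaviour described above can be seen with base R alone. The following is a minimal sketch, independent of orderly, that assumes the temporary directory is on a filesystem with hard link support; the file names `a.txt` and `b.txt` are purely illustrative.

```r
# Create a file and a hard link to it in a temporary directory
dir <- tempfile()
dir.create(dir)
a <- file.path(dir, "a.txt")
b <- file.path(dir, "b.txt")
writeLines("original", a)
linked <- file.link(a, b)  # TRUE if the hard link was created

# Overwrite the contents of 'a' only; 'b' is the same file on disk,
# so reading 'b' returns the new contents as well
writeLines("modified", a)
content_b <- readLines(b)
print(content_b)

unlink(dir, recursive = TRUE)
```

Nothing in a directory listing distinguishes `b.txt` from an ordinary copy, which is why the archive should be treated as read-only after deduplication.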
This approach is worth exploring if you have large files that are outputs of one report and inputs to another, or large inputs repeatedly used in different reports, or outputs that end up being the same in multiple reports. If you run the deduplication with dry_run = TRUE, an indication of the savings will be printed.
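To get a feel for where such savings come from, duplicated content can be found by hashing files. The sketch below is not orderly's implementation; it is an illustration using `tools::md5sum` on a made-up directory layout (`r1`, `r2` and the file names are hypothetical), counting every file after the first with a given hash as space that linking could reclaim.

```r
# Build a small directory tree with two identical files and one distinct file
root <- tempfile()
dir.create(file.path(root, "r1"), recursive = TRUE)
dir.create(file.path(root, "r2"))
writeLines(rep("x", 100), file.path(root, "r1", "data.csv"))
writeLines(rep("x", 100), file.path(root, "r2", "data.csv"))
writeLines("y", file.path(root, "r2", "other.csv"))

# Hash every file and record its size in bytes
files <- list.files(root, recursive = TRUE, full.names = TRUE)
hash <- unname(tools::md5sum(files))
size <- file.size(files)

# Every file after the first with a given hash is a candidate for linking
is_dup <- duplicated(hash)
saving <- sum(size[is_dup])
message(sum(is_dup), " duplicated file(s), ", saving, " bytes could be saved")
```

Here the saving equals the size of one `data.csv`, since the second copy could be replaced by a link to the first.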
Value
Invisibly, information about the duplication status of the archive before deduplication was run.
Examples
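# Create an example archive and run the same report twice, so that
# the archive contains files with identical content
path <- orderly::orderly_example("demo")
id1 <- orderly::orderly_run("minimal", root = path)
id2 <- orderly::orderly_run("minimal", root = path)
orderly::orderly_commit(id1, root = path)
orderly::orderly_commit(id2, root = path)
# Plan the deduplication without altering the archive; any error is
# caught and discarded
tryCatch(
  orderly::orderly_deduplicate(path, dry_run = TRUE),
  error = function(e) NULL)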