memCompress {base} | R Documentation |
In-memory Compression and Decompression
Description
In-memory compression or decompression for raw vectors.
Usage
memCompress(from, type = c("gzip", "bzip2", "xz", "none"))
memDecompress(from,
type = c("unknown", "gzip", "bzip2", "xz", "none"),
asChar = FALSE)
Arguments
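from |
a raw vector. For memCompress, a character vector will be converted to a raw vector with character strings separated by "\n". |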
type |
character string, the type of compression. May be abbreviated to a single letter, defaults to the first of the alternatives. |
asChar |
logical: should the result be converted to a character string? NB: character strings have a limit of 2^31 - 1 bytes, so raw vectors should be used for large inputs. |
Details
type = "none" passes the input through unchanged, but may be useful if type is a variable.
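As a minimal sketch of why a pass-through type is handy, the helper pack() below is hypothetical (it is not part of base R) and simply forwards a type chosen at run time, so the caller needs no special case for uncompressed data:

```r
## Hypothetical helper: the compression type is decided by the caller.
pack <- function(x, type = "none") memCompress(x, type = type)

x <- charToRaw("some payload")
for (type in c("none", "gzip", "bzip2", "xz")) {
    z <- pack(x, type)
    ## Decompressing with the same type recovers the original bytes.
    stopifnot(identical(memDecompress(z, type = type), x))
}
## With type = "none" the bytes come back untouched.
identical(pack(x, "none"), x)
```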
type = "unknown" attempts to detect the type of compression applied (if any): this will always succeed for bzip2 compression, and will succeed for other forms if there is a suitable header. If no type of compression is detected this is the same as type = "none" but a warning is given.
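The detection described above can be illustrated roughly as follows. The sketch uses bzip2 for the positive case, since that is the form documented as always detectable, and plain text for the fallback case. The variable names are illustrative only.

```r
x <- charToRaw(paste(rep("detect me", 50), collapse = " "))

## Compressed input: the type is worked out from the data.
z <- memCompress(x, "bzip2")
identical(memDecompress(z, type = "unknown"), x)

## Uncompressed input: treated as type = "none", with a warning.
warned <- FALSE
res <- withCallingHandlers(
    memDecompress(x, type = "unknown"),
    warning = function(w) {
        warned <<- TRUE               # record that a warning was signalled
        invokeRestart("muffleWarning")
    })
identical(res, x)
warned
```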
gzip compression uses whatever is the default compression level of the underlying library (usually 6). It supports the RFC 1950 format, sometimes known as ‘zlib’ format, for both compression and decompression, and RFC 1952, the ‘gzip’ format (which wraps the ‘zlib’ format with a header and footer), for decompression only.
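One way to see that the output is in RFC 1950 format is to inspect the two header bytes. The checks below follow the rules stated in RFC 1950 (compression method 8 in the low four bits of the first byte, and the two bytes read as a big-endian 16-bit number being a multiple of 31). This is a sketch for illustration, not a validator.

```r
z <- memCompress(charToRaw("hello, zlib"), "gzip")

cmf <- as.integer(z[1])   # "compression method and flags" byte
flg <- as.integer(z[2])   # "flags" byte

## Low nibble of CMF is the compression method: 8 means deflate.
bitwAnd(cmf, 15L) == 8L
## RFC 1950 requires CMF * 256 + FLG to be a multiple of 31.
(cmf * 256L + flg) %% 31L == 0L
```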
bzip2 compression always adds a header ("BZh"). The underlying library only supports in-memory (de)compression of up to 2^31 - 1 elements. Compression is equivalent to bzip2 -9 (the default).
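The header mentioned above can be checked directly on the compressed bytes. In the bzip2 format the byte after "BZh" is a digit giving the block size, which the second line prints without asserting a value.

```r
z <- memCompress(charToRaw("hello, bzip2"), "bzip2")

rawToChar(z[1:3])   # the "BZh" signature
rawToChar(z[4])     # block-size digit that follows the signature
```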
Compressing with type = "xz" is equivalent to compressing a file with xz -9e (including adding the ‘magic’ header): decompression should cope with the contents of any file compressed by xz version 4.999 and later, as well as by some versions of lzma. There are other versions, in particular ‘raw’ streams, that are not currently handled.
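The ‘magic’ header can be compared against the six-byte signature defined by the .xz file format (FD 37 7A 58 5A 00). The constant xz_magic below is written out by hand for this sketch and is not something provided by R.

```r
## Six-byte signature from the .xz file format (assumed constant, typed by hand).
xz_magic <- as.raw(c(0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00))

z <- memCompress(charToRaw("hello, xz"), "xz")
identical(z[1:6], xz_magic)
```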
All the types of compression can expand the input: for "gzip" and "bzip2" the maximum expansion is known and so memCompress can always allocate sufficient space. For "xz" it is possible (but extremely unlikely) that compression will fail if the output would have been too large.
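Expansion is easiest to see on a very short input, where the fixed headers and trailers outweigh any saving. The sketch below compresses a single byte with each type and compares sizes. The exact sizes depend on the libraries in use, so none are quoted here.

```r
x <- charToRaw("a")   # one byte of input

## Compressed size for each type, as a named integer vector.
sizes <- vapply(c("gzip", "bzip2", "xz"),
                function(type) length(memCompress(x, type)),
                integer(1))
sizes
all(sizes > length(x))
```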
Value
A raw vector or a character string (if asChar = TRUE).
libdeflate
Support for the libdeflate library was added for R 4.4.0. It uses different code for the RFC 1950 ‘zlib’ format (and RFC 1952 for decompression), expected to be substantially faster than using the reference (or system) zlib library. It is used for type = "gzip" if available.
The headers and sources can be downloaded from https://github.com/ebiggers/libdeflate and pre-built versions are available for most Linux distributions. It is used for binary Windows distributions.
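To see which compression libraries a given build reports, extSoftVersion() can be queried. The sketch assumes the relevant entries are named "zlib", "libdeflate", "bzlib" and "xz", and uses intersect() so that it still runs on versions of R that predate the "libdeflate" entry.

```r
v <- extSoftVersion()

## Keep only the compression-related entries present in this build.
v[intersect(c("zlib", "libdeflate", "bzlib", "xz"), names(v))]
```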
See Also
extSoftVersion for the versions of the zlib or libdeflate, bzip2 and xz libraries in use.
https://en.wikipedia.org/wiki/Data_compression for background on data compression, https://zlib.net/, https://en.wikipedia.org/wiki/Gzip, http://www.bzip.org/, https://en.wikipedia.org/wiki/Bzip2, and https://en.wikipedia.org/wiki/XZ_Utils for references about the particular schemes used.
Examples
txt <- readLines(file.path(R.home("doc"), "COPYING"))
sum(nchar(txt))
txt.gz <- memCompress(txt, "g") # "gzip", the default
length(txt.gz)
txt2 <- strsplit(memDecompress(txt.gz, "g", asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt2))
## as from R 4.4.0 this is detected if not specified.
txt2b <- strsplit(memDecompress(txt.gz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt2b, txt2))
txt.bz2 <- memCompress(txt, "b")
length(txt.bz2)
## can auto-detect bzip2:
txt3 <- strsplit(memDecompress(txt.bz2, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))
## xz compression is only worthwhile for large objects
txt.xz <- memCompress(txt, "x")
length(txt.xz)
txt3 <- strsplit(memDecompress(txt.xz, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))
## test decompressing a gzip-ed file
tf <- tempfile(fileext = ".gz")
con <- gzfile(tf, "w")
writeLines(txt, con)
close(con)
(nf <- file.size(tf))
# if (nzchar(Sys.which("file"))) system2("file", tf)
foo <- readBin(tf, "raw", n = nf)
unlink(tf)
## will detect the gzip header and choose type = "gzip"
txt3 <- strsplit(memDecompress(foo, asChar = TRUE), "\n")[[1]]
stopifnot(identical(txt, txt3))