C_types {mmap} | R Documentation
Virtual R Types On Disk.
Description
These functions describe the types of raw binary data stored on disk.
Usage
char(length = 0, nul = TRUE)
uchar(length = 0)
logi8(length = 0)
logi32(length = 0)
int8(length = 0)
uint8(length = 0)
int16(length = 0)
uint16(length = 0)
int24(length = 0)
uint24(length = 0)
int32(length = 0)
int64(length = 0)
real32(length = 0)
real64(length = 0)
cplx(length = 0)
cstring(length = 0, na.strings = "NA")
as.Ctype(x)
is.Ctype(x)
cstring.MaxWidth()
sizeofCtypes()
Arguments
length | Desired length. Not used when passed to mode= in an mmap call.
x | R object to coerce or test.
nul | Are characters delimited by a nul byte?
na.strings | String to convert to R's NA. See Details for the current implementation.
Details
R has very limited storage types. There is one type of integer and one type of float (double). Storage to disk often can be made more efficient by reducing the precision of the data. These functions provide for a sort of virtual mapping from disk to native R type, for use with mmap-ed files.
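The storage saving can be seen with base R alone. The following is a minimal sketch, using only writeBin and file.size (no mmap call is involved), that writes the same 100 integers at three element widths and compares the resulting file sizes. The variable names are illustrative.

```r
# Write the same values at 1, 2, and 4 bytes per element and compare file sizes.
x <- 1:100                       # all values fit in a signed 1-byte integer
widths <- c(1L, 2L, 4L)          # bytes per element on disk
sizes <- vapply(widths, function(w) {
  f <- tempfile()
  on.exit(unlink(f))             # remove the temporary file when done
  writeBin(x, f, size = w)
  file.size(f)
}, numeric(1))
names(sizes) <- paste0(widths, "-byte")
sizes
```

The narrower widths are only safe when every value fits in the chosen type, which is why range checking is left to the user.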
When a memory mapping is created, a conversion method is declared both for extracting values from disk and for replacing elements on disk. The preceding functions are used in the internal compiled code to handle the conversion.
It is the user's responsibility to ensure that data fits within the prescribed types. All fixed-width types support extraction, replacement, and boolean Ops (e.g. ==). See below for a note on cstring layout.
cstring reads nul-terminated strings from binary C-style arrays. To minimize memory allocation, two additional steps are carried out. First, when a memory map is initiated, the length (N) of the character array is calculated. Second, the calculation of word offsets to facilitate access is deferred until the first [ request or Ops request. This offset calculation requires the creation of an internal index made up of short integers, representing the length of each character element. The width of a single element is therefore limited; on most platforms the limit is at least 65534 (the largest value a short index entry can hold, less 1 for the nul byte), and it can be found via cstring.MaxWidth. This index will consume sizeof(short) * N memory, allocated outside of R.
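The idea of a per-element length index can be illustrated in plain R. This is only a sketch of the concept, not the package's compiled implementation: it reads the raw bytes of a file written by writeBin, locates the terminating nul bytes, and derives the count N, the start offset, and the width of each string. All variable names are illustrative.

```r
# Illustration only: derive per-element widths and start offsets of
# nul-terminated strings from the raw bytes of a file, using base R.
f <- tempfile()
writeBin(c("this", "is", "a", "sentence"), f)   # each string gets a trailing nul
bytes <- readBin(f, what = "raw", n = file.size(f))
nul_pos <- which(bytes == as.raw(0))            # position of each terminating nul
n <- length(nul_pos)                            # N, the number of strings
starts <- c(1L, head(nul_pos, -1L) + 1L)        # 1-based start of each string
widths <- nul_pos - starts                      # characters per string, nul excluded
# With the index in hand, a single element can be extracted without a rescan.
third <- rawToChar(bytes[starts[3]:(starts[3] + widths[3] - 1L)])
unlink(f)
```

Storing only the widths, as the text describes, keeps the index small, since the start offsets can be recovered by accumulating the widths.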
At present na.strings="NA" is ignored and all occurrences of the (binary) string ‘NA’ are converted to NA_character_ in R. This is also used by the mmap is.na function.
Value
An R typed vector of length ‘length’ with a virtual type and class ‘Ctype’. Additional information about the number of bytes and whether the virtual type is signed is also contained.
Warning
There is no attempt to store or read metadata with respect to the extracted or replaced data. This is simply a low-level interface to facilitate data reading and writing.
Note
R vectors may be used to create files on disk matching the specified type using the function writeBin with the appropriate size argument. See Also.
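Because no metadata is stored, the same bytes can be read back as different values depending on the type chosen. The sketch below uses base R's readBin as a stand-in for a mapping (it does not call mmap) to show the signed and unsigned readings of one-byte data, which corresponds to the distinction between int8 and uint8. The variable names are illustrative.

```r
# Write four 1-byte values, then read the same bytes as signed and as unsigned.
f <- tempfile()
writeBin(c(0L, 127L, 128L, 255L), f, size = 1L)
as_signed   <- readBin(f, what = "integer", n = 4L, size = 1L, signed = TRUE)
as_unsigned <- readBin(f, what = "integer", n = 4L, size = 1L, signed = FALSE)
unlink(f)
as_signed     # 128 and 255 wrap to negative values when read as signed
as_unsigned   # the original values are recovered when read as unsigned
```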
Author(s)
Jeffrey A. Ryan
References
https://en.wikipedia.org/wiki/C_variable_types_and_declarations
https://cran.r-project.org/doc/manuals/R-exts.html
See Also
Examples
library(mmap)
tmp <- tempfile()
# write 1 byte signed integers (int8 range is -128:127; -127:127 is written here)
writeBin(-127:127L, tmp, size=1L)
file.info(tmp)$size
one_byte <- mmap(tmp, int8())
one_byte[]
munmap(one_byte)
# write a 1 byte unsigned integer 0:255
writeBin(0:255L, tmp, size=1L)
file.info(tmp)$size
one_byte <- mmap(tmp, uint8())
one_byte[]
munmap(one_byte)
# write a 2 byte integer -32768:32767
writeBin(c(-32768L,32767L), tmp, size=2L)
file.info(tmp)$size
two_byte <- mmap(tmp, int16())
two_byte[]
munmap(two_byte)
# write a 2 byte unsigned integer 0:65535
writeBin(c(0L,65535L), tmp, size=2L)
two_byte <- mmap(tmp, uint16())
two_byte[]
# replacement methods convert automatically (watch precision!!)
two_byte[1] <- 50000
two_byte[]
# values outside of range (above 65535 for uint16) will be wrong
two_byte[1] <- 65535 + 1
two_byte[]
munmap(two_byte)
# write a 4 byte integer standard R type
writeBin(1:10L, tmp, size=4L)
four_byte <- mmap(tmp, int32())
four_byte[]
munmap(four_byte)
# write 32 bit integers as 64 bit longs (where supported)
int64() # note it is a double in R, but described as int64
writeBin(1:10L, tmp, size=8L)
eight_byte <- mmap(tmp, int64())
storage.mode(eight_byte[]) # using R doubles to preserve most long values
eight_byte[5] <- 2^40 # written as a long; R doubles hold integers exactly up to 2^53
eight_byte[5]
munmap(eight_byte)
cstring()
cstring.MaxWidth()
writeBin(c("this","is","a","sentence"), tmp)
strings <- mmap(tmp, cstring())
strings[1:2]
strings[]
munmap(strings)
unlink(tmp)