largeVectors {XRJulia} | R Documentation |
Internal Computations for Large Vectors
Description
Internal Computations for Large Vectors
Sending Large Vectors between R and Julia
Large vectors will be slow to transfer as JSON, and may fail in Julia. Internal computations have been added to transfer vectors of types real, integer, logical and character by more direct computations when they are large. The computations and their implementation are described here.
R and Julia both have the concept of numeric (floating point) and integer arrays whose elements have
a consistent type, and both implement these (following Fortran) as contiguous blocks in memory,
augmented by length or dimension information.
They also both have a mechanism for arrays of character strings: class "character" in R and
array type Array{String, 1} in Julia.
Julia has arrays for boolean data; R stores the corresponding logical vectors as integers.
JSON has no such concepts, so interface evaluators using the standard JSON form provided by 'XR' must send such data as a JSON list. This will become inefficient for very large data from these classes. Users have reported failure by Julia to parse the corresponding JSON.
The 'XRJulia' package (as of version 0.7.9) implements special code to send vectors to Julia, by
writing an intermediate file that Julia reads. The actual text sent to Julia is a call to the
relevant Julia function. The code is triggered within the methods for the asServerObject
function, so vectors should be transferred this way whether on their own or as part of a larger structure,
such as an array or the column of a data frame.
Similarly, large arrays to be retrieved in R by the Get() method or the optional argument .get = TRUE
will be written to an intermediate file by Julia and read by R.
As vectors become large, direct transfer becomes much faster. On a not-very-powerful laptop,
vectors of length 10^7 transfer in an elapsed time of a few seconds. Character vectors are slightly
slower than numeric, as explained below, but in all cases it would be hard to do much computation with
the data that did not swamp the cost of transfer. That said, as always it's more sensible to transfer
data once and then use the corresponding proxy object in later calls.
Details
For all vectors, the method uses binary writes and reads, which are defined
in both R and Julia. No special computations are needed for numeric, integer, complex and raw vectors.
For these, the R binary representation corresponds to array types in Julia.
The special pseudo-value NA is defined for vectors in R, but no corresponding concept exists
in Julia. For numeric and complex vectors, the floating-point pattern NaN is used.
For all other vectors, a warning is issued and either a numeric object or a special character string is used instead.
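The binary round trip for a numeric vector can be illustrated outside R and Julia. The following is a minimal Python sketch, not the package's code: the function names are hypothetical, the file is written as native-endian 8-byte doubles, and None is used here as a stand-in for a missing value that becomes NaN on write.

```python
import math
import os
import struct
import tempfile


def write_doubles(path, values):
    """Write values as raw native-endian 8-byte doubles.

    None plays the role of a missing value and is written as NaN
    (an assumption made for this illustration).
    """
    data = [math.nan if v is None else float(v) for v in values]
    with open(path, "wb") as f:
        f.write(struct.pack(f"={len(data)}d", *data))


def read_doubles(path):
    """Read the whole file back as a list of doubles."""
    with open(path, "rb") as f:
        raw = f.read()
    return list(struct.unpack(f"={len(raw) // 8}d", raw))


# Round trip through a temporary intermediate file, removed afterwards.
fd, path = tempfile.mkstemp(prefix="Julia")
os.close(fd)
try:
    write_doubles(path, [1.5, None, -2.0])
    result = read_doubles(path)
finally:
    os.remove(path)
```

Once the value is written as NaN, the reader has no way to tell that it was originally missing, which is the loss of information described above.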
For logicals, the internal representation in R uses integers. When data is sent from R, the Julia code casts the integer array to a boolean array. On the return side, the Julia boolean array is converted to an integer array before writing.
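The integer-to-boolean cast can be sketched as follows. This is a Python illustration under stated assumptions, not the package's implementation: the function names are hypothetical, each logical is assumed to occupy a 4-byte integer as in R, and missing values are not handled.

```python
import struct


def logical_to_bytes(flags):
    """Encode booleans as 4-byte integers (1 or 0), mirroring R's storage of logicals."""
    return struct.pack(f"={len(flags)}i", *(1 if b else 0 for b in flags))


def bytes_to_logical(raw):
    """Decode 4-byte integers and cast each one to a boolean."""
    ints = struct.unpack(f"={len(raw) // 4}i", raw)
    return [i != 0 for i in ints]


raw = logical_to_bytes([True, False, True])
flags = bytes_to_logical(raw)
```

Three logical values occupy twelve bytes in this layout, which is why the receiving side has to read integers first and convert afterwards.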
Character vectors take a little more work, partly because of a quirk in binary writes
for string arrays in Julia. Where R character vectors can be written in binary form and then read
back in, writing a String array in Julia omits the end-of-string character,
effectively writing a single string, from which the array cannot
be recovered. To receive the entire vector from R, the Julia side must therefore
split the single string resulting from the R binary write
by matching the end-of-string character explicitly.
For sending back to R, the Julia code
appends an end-of-string character to each string before writing the array to a file. This produces the
R format for a binary read of a character vector.
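The two directions for character vectors can be sketched at the byte level. This Python illustration is not the package's code: the function names are hypothetical, and it assumes UTF-8 encoding, a zero byte as the end-of-string character, and strings that contain no embedded zero bytes.

```python
def pack_strings(strings):
    """Append an end-of-string byte to every string and concatenate the results."""
    return b"".join(s.encode("utf-8") + b"\x00" for s in strings)


def unpack_strings(raw):
    """Recover the individual strings by splitting on the end-of-string byte."""
    if not raw:
        return []
    parts = raw.split(b"\x00")
    # Every string ends with a terminator, so the split yields one
    # trailing empty piece that is not part of the data.
    return [p.decode("utf-8") for p in parts[:-1]]


packed = pack_strings(["a", "", "bc"])
strings = unpack_strings(packed)
```

Without the terminators, the bytes for "a", "" and "bc" would run together as "abc", and the boundaries, including the empty string, could not be recovered.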
Two fields in the evaluator object control details.
A large object is defined as a vector of length greater than the integer field largeObject.
Julia creates intermediate files for sending large arrays to R by appending sequential numbers to a
character field fileBase. Both fields have defaults; the default for fileBase is obtained from
tempfile() with pattern "Julia". Note that all the files are removed at the end of
the evaluation of the expression sending or getting the relevant objects.
Since these fields must be known to the Julia evaluator, they should not be set directly; doing so will
have no effect. Instead, call the function juliaOptions() with these parameter names.
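The roles of the two fields can be sketched as follows. This is a hypothetical Python illustration rather than the evaluator's code: the class name, method names and the threshold value of 3 are all assumptions chosen for the example.

```python
import itertools
import os
import tempfile


class TransferFiles:
    """Tracks a size threshold and numbered intermediate files (illustrative only)."""

    def __init__(self, file_base, large_object):
        self.file_base = file_base        # plays the role of fileBase
        self.large_object = large_object  # plays the role of largeObject
        self._counter = itertools.count(1)
        self._created = []

    def is_large(self, vector):
        # "Large" means a length strictly greater than the threshold.
        return len(vector) > self.large_object

    def next_path(self):
        # Append the next sequential number to the base name.
        path = f"{self.file_base}{next(self._counter)}"
        self._created.append(path)
        return path

    def cleanup(self):
        # Remove every intermediate file handed out so far.
        for path in self._created:
            if os.path.exists(path):
                os.remove(path)
        self._created.clear()


workdir = tempfile.mkdtemp()
files = TransferFiles(os.path.join(workdir, "Julia"), large_object=3)
small, big = [1, 2, 3], [1, 2, 3, 4]
paths = [files.next_path(), files.next_path()]
for p in paths:
    open(p, "wb").close()  # stand-in for writing the vector data
files.cleanup()
os.rmdir(workdir)
```

A vector whose length equals the threshold is not treated as large here, following the "greater than" wording above.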