DFS {hive} | R Documentation
Hadoop Distributed File System
Description
Functions providing high-level access to the Hadoop Distributed File System (HDFS).
Usage
DFS_cat( file, con = stdout(), henv = hive() )
DFS_delete( file, recursive = FALSE, henv = hive() )
DFS_dir_create( path, henv = hive() )
DFS_dir_exists( path, henv = hive() )
DFS_dir_remove( path, recursive = TRUE, henv = hive() )
DFS_file_exists( file, henv = hive() )
DFS_get_object( file, henv = hive() )
DFS_read_lines( file, n = -1L, henv = hive() )
DFS_rename( from, to, henv = hive() )
DFS_list( path = ".", henv = hive() )
DFS_tail( file, n = 6L, size = 1024L, henv = hive() )
DFS_put( files, path = ".", henv = hive() )
DFS_put_object( obj, file, henv = hive() )
DFS_write_lines( text, file, henv = hive() )
Arguments
henv: An object containing the local Hadoop configuration.

file: a character string representing a file on the DFS.

files: a character vector naming files located on the local file system to be copied to the DFS.

n: an integer specifying the number of lines to read.

obj: an R object to be serialized to/from the DFS.

path: a character string representing a full path name in the DFS (without the leading hdfs:// URI scheme).

recursive: logical. Should elements of the path other than the last be deleted recursively?

size: an integer specifying the number of bytes to be read from the end of the file. It must be sufficiently large, otherwise fewer than n lines may be returned.

text: a (vector of) character string(s) to be written to the DFS.

con: A connection to be used for printing the output provided by DFS_cat.

from: a character string representing a file or directory on the DFS to be renamed.

to: a character string representing the new file name on the DFS.
Details
The Hadoop Distributed File System (HDFS) is typically part of a Hadoop cluster or can be used as a stand-alone general purpose distributed file system (DFS). Several high-level functions provide easy access to distributed storage.
DFS_cat is useful for producing output in user-defined functions. It reads from files on the DFS and typically prints the output to the standard output. Its behaviour is similar to the base function cat.
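For instance, the contents of a file on the DFS can be printed to the console or redirected to any writable connection. The following sketch assumes a running Hadoop cluster and a hypothetical DFS file "/tmp/test/hdfs.txt":

```r
## Not run:
## Print a DFS file to standard output (the default connection).
DFS_cat( "/tmp/test/hdfs.txt" )

## Redirect the output to a local file via a connection.
out <- file( "local_copy.txt", "w" )
DFS_cat( "/tmp/test/hdfs.txt", con = out )
close( out )
## End(Not run)
```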
DFS_dir_create creates directories with the given path names if they do not already exist. Its behaviour is similar to the base function dir.create.
DFS_dir_exists and DFS_file_exists return a logical vector indicating whether the directory or file, respectively, named by its argument exists. See also function file.exists.
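A typical pattern is to test for existence before creating or reading. This sketch assumes a running Hadoop cluster; the paths are hypothetical:

```r
## Not run:
## Create the directory only if it is not already there.
if( !DFS_dir_exists("/tmp/test") )
    DFS_dir_create( "/tmp/test" )

## Check for a file before attempting to read it.
if( DFS_file_exists("/tmp/test/hdfs.txt") )
    DFS_read_lines( "/tmp/test/hdfs.txt" )
## End(Not run)
```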
DFS_dir_remove attempts to remove the directory named in its argument and, if recursive is set to TRUE, also attempts to remove subdirectories in a recursive manner.
DFS_list produces a character vector of the names of files in the directory named by its argument.
DFS_read_lines is a reader for (plain text) files stored on the DFS. It returns a vector of character strings representing lines in the (text) file. If n is given as an argument, it reads at most that many lines from the given file. Its behaviour is similar to the base function readLines.
DFS_put copies files named by its argument to a given path in the DFS.
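Copying local files to the DFS might look as follows; this is a sketch assuming a running Hadoop cluster, and the local file names are made up for illustration:

```r
## Not run:
## Copy two hypothetical local files into a DFS directory and
## verify the result via a directory listing.
DFS_put( c("data1.csv", "data2.csv"), path = "/tmp/test" )
DFS_list( "/tmp/test" )
## End(Not run)
```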
DFS_put_object serializes an R object to the DFS.
DFS_write_lines writes a given vector of character strings to a file stored on the DFS. Its behaviour is similar to the base function writeLines.
Value
DFS_delete(), DFS_dir_create(), and DFS_dir_remove() return a logical value indicating whether the operation succeeded for the given argument.
DFS_dir_exists() and DFS_file_exists() return TRUE if the named directories or files exist in the HDFS.
DFS_get_object() returns the deserialized object stored in a file on the HDFS.
DFS_list() returns a character vector representing the directory listing of the corresponding path on the HDFS.
DFS_read_lines() returns a character vector whose length equals the number of lines read.
DFS_tail() returns a character vector containing at most the last n lines of a file on the HDFS.
Author(s)
Stefan Theussl
References
The Hadoop Distributed File System (https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html).
Examples
## Do we have access to the root directory of the DFS?
## Not run: DFS_dir_exists("/")
## Some self-explanatory DFS interaction
## Not run:
DFS_list( "/" )
DFS_dir_create( "/tmp/test" )
DFS_write_lines( c("Hello HDFS", "Bye Bye HDFS"), "/tmp/test/hdfs.txt" )
DFS_list( "/tmp/test" )
DFS_read_lines( "/tmp/test/hdfs.txt" )
## End(Not run)
## Serialize an R object to the HDFS
## Not run:
foo <- function()
"You got me serialized."
sro <- "/tmp/test/foo.sro"
DFS_put_object(foo, sro)
DFS_get_object( sro )()
## End(Not run)
## finally (recursively) remove the created directory
## Not run: DFS_dir_remove( "/tmp/test" )
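The remaining functions can be exercised in a similar fashion. This sketch assumes a running Hadoop cluster; the paths are hypothetical:

```r
## Tail, rename, and delete a file on the DFS
## Not run:
DFS_write_lines( c("a", "b", "c"), "/tmp/tail.txt" )
DFS_tail( "/tmp/tail.txt", n = 2L )
DFS_rename( "/tmp/tail.txt", "/tmp/tail-renamed.txt" )
DFS_delete( "/tmp/tail-renamed.txt" )
## End(Not run)
```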