| Source {tm} | R Documentation |
Sources
Description
Creating and accessing sources.
Usage
SimpleSource(encoding = "",
             length = 0,
             position = 0,
             reader = readPlain,
             ...,
             class)
getSources()
## S3 method for class 'SimpleSource'
close(con, ...)
## S3 method for class 'SimpleSource'
eoi(x)
## S3 method for class 'DataframeSource'
getMeta(x)
## S3 method for class 'DataframeSource'
getElem(x)
## S3 method for class 'DirSource'
getElem(x)
## S3 method for class 'URISource'
getElem(x)
## S3 method for class 'VectorSource'
getElem(x)
## S3 method for class 'XMLSource'
getElem(x)
## S3 method for class 'SimpleSource'
length(x)
## S3 method for class 'SimpleSource'
open(con, ...)
## S3 method for class 'DataframeSource'
pGetElem(x)
## S3 method for class 'DirSource'
pGetElem(x)
## S3 method for class 'URISource'
pGetElem(x)
## S3 method for class 'VectorSource'
pGetElem(x)
## S3 method for class 'SimpleSource'
reader(x)
## S3 method for class 'SimpleSource'
stepNext(x)
Arguments
x |
A Source. |
con |
A Source. |
encoding |
a character giving the encoding of the elements delivered by the source. |
length |
a non-negative integer denoting the number of elements delivered
by the source. If the length is unknown in advance, set it to 0. |
position |
a numeric indicating the current position in the source. |
reader |
a reader function (generator). |
... |
For SimpleSource, tag-value pairs for storing additional information; not used otherwise. |
class |
a character vector giving additional classes to be used for the created source. |
Details
Sources abstract input locations, like a directory, a connection, or
simply an R vector, in order to acquire content in a uniform way. In packages
which employ the infrastructure provided by package tm, such sources are
represented via the virtual S3 class Source: such packages then provide
S3 source classes extending the virtual base class (such as
DirSource provided by package tm itself).
All extension classes must provide implementations for the functions
close, eoi, getElem, length, open,
reader, and stepNext. For parallel element access the
(optional) function pGetElem must be provided as well. If
document level metadata is available, the (optional) function getMeta
must be implemented.
The functions open and close open and close the source,
respectively. eoi indicates end of input. getElem fetches the
element at the current position, whereas pGetElem retrieves all
elements in parallel at once. The function length gives the number of
elements. reader returns a default reader for processing elements.
stepNext increases the position in the source to acquire the next
element.
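The access protocol described above can be sketched as a simple loop. This is a minimal illustration, assuming package tm is installed and using VectorSource as the example source; the variable names src and contents are illustrative only.

```r
library(tm)

# Example source over a plain character vector.
src <- VectorSource(c("alpha", "beta", "gamma"))

src <- open(src)                  # open the source before reading
contents <- character()

while (!eoi(src)) {               # stop once the end of input is reached
    src <- stepNext(src)          # advance to the next position
    elem <- getElem(src)          # named list with components content and uri
    contents <- c(contents, elem$content)
}

src <- close(src)                 # close the source when done
```

After the loop, contents holds the three elements in their original order. Note that stepNext returns the updated source, so its result has to be assigned back.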
The function SimpleSource provides a simple reference implementation
and can be used when creating custom sources.
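As a rough sketch of how a custom source might build on SimpleSource, the following defines a hypothetical class UpperSource that delivers each element in upper case. The class name, its constructor, and the content component are assumptions made for illustration; they are not part of package tm. Only getElem is implemented here, and the remaining functions are inherited from SimpleSource.

```r
library(tm)

# Hypothetical constructor: the character vector is passed through '...'
# and stored in a component named 'content' (a name chosen for this sketch).
UpperSource <- function(x)
    SimpleSource(length = length(x), content = x, class = "UpperSource")

# The only method this sketch adds: deliver the current element in upper case.
getElem.UpperSource <- function(x)
    list(content = toupper(x$content[x$position]), uri = NULL)

src <- UpperSource(c("first document", "second document"))
class(src)               # custom class first, then SimpleSource and Source

src <- stepNext(src)     # move to the first element
first <- getElem(src)    # dispatches to getElem.UpperSource
```

A source meant for real use would usually also provide a meaningful uri and, where parallel access makes sense, a pGetElem method.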
Value
For SimpleSource, an object inheriting from the classes given in the
class argument, from SimpleSource, and from Source.
For getSources, a character vector with sources provided by package
tm.
open and close return the opened and closed source,
respectively.
For eoi, a logical indicating if the end of input of the source is
reached.
For getElem, a named list with the components content, holding the
document, and uri, giving a uniform resource identifier (e.g., a file
path or URL; NULL if not applicable or unavailable). For
pGetElem, a list of such named lists.
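The shape of these return values can be inspected directly. The snippet below is a small sketch, assuming package tm is installed; it uses VectorSource, for which no resource identifier is available.

```r
library(tm)

src <- VectorSource(c("one", "two"))

# Single element: advance to position 1, then fetch it.
elem <- getElem(stepNext(src))
names(elem)              # component names of the returned list

# All elements at once: a list with one such named list per element.
elems <- pGetElem(src)
length(elems)
```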
For length, an integer for the number of elements.
For reader, a function for the default reader.
See Also
DataframeSource, DirSource,
URISource, VectorSource, and
XMLSource.