SolrFrame-class {rsolr} | R Documentation |
SolrFrame
Description
The SolrFrame object makes Solr data accessible through a
data.frame-like interface. This is the typical way an R user accesses
data from a Solr core. Many of its methods are shared with SolrList,
which has very similar behavior.
Details
A SolrFrame should behave more or less analogously to a data
frame. It provides the same basic accessors (nrow, ncol, length,
rownames, colnames, [, [<-, [[, [[<-, $, $<-, head, tail, etc.) and
can be coerced to an actual data frame via as.data.frame. Supported
types of data manipulation include subset, transform, sort, xtabs,
aggregate, unique, summary, etc.
Mapping a collection of documents to a tabular data structure is not quite natural, as the document collection is ragged: a given document can have an arbitrary set of fields, out of a set that is essentially infinite. Unlike some other document stores, however, Solr constrains the type of every field through a schema. The schema achieves flexibility through “dynamic” fields. The name of a dynamic field is a wildcard pattern, and any document field that matches the pattern is expected to obey the declared type and other constraints.
When determining its set of columns, SolrFrame takes every actual
field present in the collection, and (by default) adds all
non-dynamic (static) fields, in the order specified by the
schema. Note that it is very likely that many columns will consist
entirely or almost entirely of NAs.
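The column rule above can be pictured with plain R lists standing in for Solr documents. This is only a base R sketch and does not call rsolr: the docs list, the static_fields vector, and the to_table helper are hypothetical names invented for illustration, and the column order is simplified (static fields first, then the remaining fields in order of appearance) rather than taken from a real schema.

```r
# Hypothetical stand-ins for three Solr documents with differing fields.
docs <- list(
  list(id = "a", title = "First", price_f = 1.5),
  list(id = "b", title = "Second"),
  list(id = "c", rating_i = 4L)
)

# Assumed static (non-dynamic) fields declared by an imaginary schema.
static_fields <- c("id", "title", "author")

# Columns: all static fields, plus every field present in some document.
present <- unique(unlist(lapply(docs, names)))
columns <- union(static_fields, present)

# Fill absent fields with NA so every document becomes one complete row.
to_table <- function(docs, columns) {
  rows <- lapply(docs, function(doc) {
    row <- lapply(columns, function(f) if (is.null(doc[[f]])) NA else doc[[f]])
    names(row) <- columns
    as.data.frame(row, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

tab <- to_table(docs, columns)
print(tab)

# Share of NAs per column; the static field no document uses is all NA.
colMeans(is.na(tab))
```

Even with three small documents, most columns are mostly NA, which is the effect the paragraph above warns about.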
If a collection is extremely ragged, where few fields are shared
between documents, it may make more sense to treat the data as a list,
through SolrList, which shares almost all of the functionality of
SolrFrame but in a different shape.
The rownames are taken from the field declared in the schema to
represent the unique document key. Schemas are not strictly required
to declare such a field, so if there is no unique key, the rownames
are NULL.
Field restrictions passed to e.g. [ or subset(fields=) may be
specified by name or by wildcard pattern (glob). Similarly, a row
index passed to [ must be either a character vector of identifiers
(of length <= 1024; NAs are not supported, and this requires a unique
key in the schema) or a SolrPromise/SolrExpression. Note that if the
promise or expression evaluates to NAs, the corresponding rows are
excluded from the result, as with subset. Using a SolrPromise or
SolrExpression is recommended, as filtering happens at the database.
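Two points from the paragraph above can be tried in base R without a Solr instance. The sketch below is not rsolr code: the fields vector and the match_fields helper are hypothetical, and utils::glob2rx only approximates glob semantics, which may differ in detail from how rsolr resolves patterns. The second part shows the base subset behavior that the text refers to, where rows whose condition is NA are dropped.

```r
# Hypothetical field names, as might appear in a Solr core.
fields <- c("id", "title", "price_f", "weight_f", "rating_i")

# Hypothetical helper: select fields by exact name or by glob pattern.
match_fields <- function(patterns, fields) {
  hits <- lapply(patterns, function(p) {
    # glob2rx() translates a glob such as "*_f" into an anchored regex
    grep(utils::glob2rx(p), fields, value = TRUE)
  })
  unique(unlist(hits))
}

selected <- match_fields(c("id", "*_f"), fields)
print(selected)

# Base subset() drops rows where the condition evaluates to NA.
df <- data.frame(x = c(1, NA, 3))
kept <- subset(df, x > 2)
print(kept)
```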
A special feature of SolrFrame, vs. an ordinary data frame, is that
it can be grouped into a GroupedSolrFrame, where every column is
modeled as a list, split by some combination of grouping factors. This
is useful for aggregation and supports the implementation of the
aggregate method, which is the recommended high-level interface.
Another interesting feature is laziness. One can defer a SolrFrame,
so that all column retrieval, e.g., via $ or eval, returns a
SolrPromise object. Many operations on promises are deferred, until
they are finally fulfilled by being shown or through explicit coercion
to an R vector.
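The idea of deferring operations until a value is needed can be sketched in base R. This is a toy model only and says nothing about how SolrPromise is implemented: the toy_promise class and the make_promise and toy_fulfill helpers are hypothetical names, and only binary operators are handled.

```r
# Hypothetical minimal promise: an unevaluated expression plus its data.
make_promise <- function(expr, data) {
  structure(list(expr = expr, data = data), class = "toy_promise")
}

# Binary operators on a promise build a larger expression; nothing is
# evaluated here. Unary operators are not handled in this sketch.
Ops.toy_promise <- function(e1, e2) {
  lhs <- if (inherits(e1, "toy_promise")) e1$expr else e1
  rhs <- if (inherits(e2, "toy_promise")) e2$expr else e2
  data <- if (inherits(e1, "toy_promise")) e1$data else e2$data
  make_promise(call(.Generic, lhs, rhs), data)
}

# Evaluation happens only when the promise is explicitly fulfilled.
toy_fulfill <- function(p) eval(p$expr, p$data)

mpg <- make_promise(quote(mpg), mtcars)
p <- mpg * 2 > 40          # still a promise, no computation yet
expr_text <- deparse(p$expr)
print(expr_text)
result <- toy_fulfill(p)   # now the expression is evaluated
print(sum(result))
```

In this toy the expression is evaluated locally against a data frame; the text above describes promises being fulfilled when shown or coerced to an R vector.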
A note for developers: SolrList and SolrFrame share common
functionality through the base Solr class. Much of the functionality
mentioned here is actually implemented as methods on the Solr class.
Accessors
These are some accessors that SolrFrame adds on top of the
basic data frame accessors. Most of these are for advanced use only.
- ndoc(x): Gets the number of documents (rows); serves as an abstraction over SolrFrame and SolrList
- nfield(x): Gets the number of fields (columns); serves as an abstraction over SolrFrame and SolrList
- ids(x): Gets the document unique identifiers (may be NULL, treated as rownames); serves as an abstraction over SolrFrame and SolrList
- fieldNames(x, includeStatic=TRUE, ...): Gets the name of each field represented by any document in the Solr core, with ... being passed down to fieldNames on SolrCore. Fields must be indexed to be reported, with the exception that when includeStatic is TRUE, we ensure all static (non-dynamic) fields are present in the return value. Names are returned in an order consistent with the order in the schema. Note that two different “instances” of the same dynamic field do not have a specified order in the schema, so we use the index order (lexicographical) for those cases.
- core(x): Gets the SolrCore wrapped by x
- query(x): Gets the query that is being constructed by x
Extended API
Most of the typical data frame accessors and data manipulation
functions will work analogously on SolrFrame (see Details). Below, we
list some of the non-standard methods that might be seen as an
extension of the data frame API.
- aggregate(x, data, FUN, ..., subset, na.action, simplify = TRUE, count = FALSE): If x is a formula, aggregates data, grouping by x, by either applying FUN, or evaluating an aggregating expression in ..., on each group. If count is TRUE, a “count” column is added with the number of elements in each group. The rest of the arguments behave like those for the base aggregate.
  There are two main modes: aggregating with FUN, or, as an extension to the base aggregate, aggregating with expressions in ..., similar to the interface for transform. If FUN is specified, then behavior is much like the original, except one can omit the LHS on the formula, in which case the entire frame is passed to FUN. In the second mode, there is a column in the result for each argument in ..., and there must not be an LHS on the formula.
  See the documentation for the underlying facet function for details on what is supported on the formula RHS.
  For global aggregation, simply pass the SolrFrame as x, in which case the data argument does not exist.
  Note that the function or expressions are only conceptually evaluated on each group. In reality, the computations occur on grouped columns/promises, which are modeled as lists. Thus, there is potential for conflict, in particular with length, which returns the number of groups, instead of operating group-wise. One should use the abstraction ndoc instead of length, since ndoc always returns document counts, and thus will return the size of each group.
- rename(x, ...): Renames the columns of x, where the names and character values of ... indicate the mapping (newname = oldname).
- group(x, by): Returns a GroupedSolrFrame that is grouped by the factors in by, typically a formula. To get back to x, call ungroup(x).
- grouping(x): Just returns NULL, since a SolrFrame is not grouped (unless extended to be groupable).
- defer(x): Returns a SolrFrame that yields SolrPromise objects instead of vectors whenever a field is retrieved
- searchDocs(x, q): Performs a conventional document search using the query string q. The main difference to filtering is that (by default) Solr will order the result by score, i.e., how well each document matches the query.
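Since aggregate with FUN is described above as behaving much like the base version, the base R call gives a feel for the formula interface. The sketch below uses only base R on mtcars and does not touch rsolr. The second half uses split to imitate grouped columns modeled as a list, which shows why length counts groups rather than group members; the comparison with ndoc is an analogy, not a demonstration of rsolr itself.

```r
# Base R aggregate() with a formula: mean mpg for each cylinder count.
by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(by_cyl)

# A grouped column imitated as a list with one element per group.
grouped_mpg <- split(mtcars$mpg, mtcars$cyl)

# length() sees the list, so it reports the number of groups ...
n_groups <- length(grouped_mpg)
print(n_groups)

# ... while a group-wise size needs lengths() (the role ndoc plays above).
group_sizes <- lengths(grouped_mpg)
print(group_sizes)
```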
Constructor
- SolrFrame(uri): Constructs a new SolrFrame instance, representing a Solr core located at uri, which should be a string or a RestUri object. The ... are passed to the SolrQuery constructor.
Evaluation
- eval(expr, envir, enclos): Evaluates expr in the SolrFrame envir, using enclos as the enclosing environment. The expr can be an R language object or a SolrExpression, either of which is lazily evaluated if defer has been called on envir.
Coercion
- as.data.frame(x, row.names=NULL, optional=FALSE, fill=TRUE): Downloads the data into an actual data.frame, specifically an instance of DocDataFrame. If fill is FALSE, only the fields represented in at least one document are added as columns.
- as.list(x): Essentially as.list(as.data.frame(x)), except returns a list of promises if x is deferred.
Author(s)
Michael Lawrence
See Also
SolrList for representing a Solr collection as a list instead of a
table
Examples
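## derive a Solr schema from a data frame and start a test Solr instance
schema <- deriveSolrSchema(mtcars)
solr <- TestSolr(schema)
## connect to the core and load the data
sr <- SolrFrame(solr$uri)
sr[] <- mtcars
## inspect the result through the data frame accessors
dim(sr)
head(sr)
## filter rows by a condition on two fields
subset(sr, mpg > 20 & cyl == 4)
## shut down the test instance
solr$kill()
## see the vignette for more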