algebra {relations} | R Documentation |
Relational Algebra
Description
Various “relational algebra”-like operations.
Usage
relation_projection(x, margin = NULL)
relation_selection(x, subset)
relation_cartesian(x, y, ...)
relation_complement(x, y)
relation_intersection(x, y, ...)
relation_union(x, y, ...)
relation_symdiff(x, y)
relation_division(x, y)
relation_remainder(x, y)
relation_join(x, y, ...)
relation_semijoin(x, y, ...)
relation_antijoin(x, y, ...)
Arguments
x , y |
Relation objects. |
margin |
Either a character vector of domain names, or an integer vector of domain indices. |
subset |
Expression resulting in a logical vector of length equal to the number of tuples in the graph. |
... |
Relation objects for |
Details
These functions provide functionality similar to the corresponding operations defined in relational algebra theory as introduced by Codd (1970). Note, however, that domains in database relations, unlike the concept of relations we use here, are unordered. In fact, a database relation (“table”) is defined as a set of elements called “tuples”, where the “tuple” components are named, but unordered. So in fact, a “tuple” in this sense is a set of mappings from the attribute names into the union of the attribute domains.
The projection of a relation on a specified margin (i.e., a vector of domain names or indices) is the relation obtained when all tuples are restricted to this margin. As a consequence, duplicate tuples are removed.
The selection of a relation is the relation obtained by taking a subset of the relation graph, defined by some logical expression.
The Cartesian product of two relations is obtained by basically building the Cartesian product of all graph elements, but combining the resulting pairs into single tuples.
The union of two relations simply combines the graph elements
of both relations; the complement of two relations R
and
S
removes the tuples of S
from R
.
The intersection (symmetric difference) of two relations is the relation with all tuples they have (do not have) in common.
The division of relation R
by relation S
is the
reversed Cartesian product. The result is a relation with the domain
unique to R
and containing the maximum number of tuples which,
multiplied by S
, are contained in R
. The remainder
of this operation is the complement of R
and the division of
R
by S
. Note that for both operations, the domain of
S
must be contained in the domain of R
.
The (natural) join of two relations is their Cartesian product,
restricted to the subset where the elements of the common attributes
do match. The left/right/full outer join of two relations
R
and S
is
the union of R
/S
/R
and S
,
and the inner join of R
and S
.
The implementation uses merge()
, and so the
left/right/full outer joins are obtained by setting
all.x
/all.y
/all
to TRUE
in
relation_join()
.
The domains to be matched are specified using by
.
The left (right) semijoin of two relations R
and S
is the join of these, projected to the attributes of R
(S
). Thus, it yields all tuples of R
(S
) participating in the join of R
and S
.
The left (right) antijoin of two relations R
and S
is the complement of R
(S
) and the join of both,
projected to the attributes of R
(S
).
Thus, it yields all tuples of R
(S
) not participating in the join of R
and S
.
The operators %><%
, %=><%
, %><=%
,
%=><=%
, %|><%
, %><|%
, %|><|%
,
%|>%
, %<|%
, and %U%
can be used for the
Cartesian product, left outer join, right outer join, full outer join,
left semi-join, right semi-join, join, left antijoin, right antijoin,
and union, respectively.
References
E. F. Codd (1970), A relational model of data for large shared data banks. Communications of the ACM, 13/6, 377–387. doi:10.1145/362384.362685.
See Also
relation()
Examples
## projection
Person <-
data.frame(Name = c("Harry", "Sally", "George", "Helena", "Peter"),
Age = c(34, 28, 29, 54, 34),
Weight = c(80, 64, 70, 54, 80),
stringsAsFactors = FALSE)
Person <- as.relation(Person)
relation_table(Person)
relation_table(relation_projection(Person, c("Age", "Weight")))
## selection
relation_table(R1 <- relation_selection(Person, Age < 29))
relation_table(R2 <- relation_selection(Person, Age >= 34))
relation_table(R3 <- relation_selection(Person, Age == Weight))
## union
relation_table(R1 %U% R2)
## works only for the same domains:
relation_table(R2 | R3)
## complement
relation_table(Person - R2)
## intersection
relation_table(relation_intersection(R2, R3))
## works only for the same domains:
relation_table(R2 & R3)
## symmetric difference
relation_table(relation_symdiff(R2, R3))
## Cartesian product
Employee <-
data.frame(Name =
c("Harry", "Sally", "George", "Harriet", "John"),
EmpId = c(3415, 2241, 3401, 2202, 3999),
DeptName =
c("Finance", "Sales", "Finance", "Sales", "N.N."),
stringsAsFactors = FALSE)
Employee <- as.relation(Employee)
relation_table(Employee)
Dept <- data.frame(DeptName = c("Finance", "Sales", "Production"),
Manager = c("George", "Harriet", "Charles"),
stringsAsFactors = FALSE)
Dept <- as.relation(Dept)
relation_table(Dept)
relation_table(Employee %><% Dept)
## Natural join
relation_table(Employee %|><|% Dept)
## left (outer) join
relation_table(Employee %=><% Dept)
## right (outer) join
relation_table(Employee %><=% Dept)
## full outer join
relation_table(Employee %=><=% Dept)
## antijoin
relation_table(Employee %|>% Dept)
relation_table(Employee %<|% Dept)
## semijoin
relation_table(Employee %|><% Dept)
relation_table(Employee %><|% Dept)
## division
Completed <-
data.frame(Student = c("Fred", "Fred", "Fred", "Eugene",
"Eugene", "Sara", "Sara"),
Task = c("Database1", "Database2", "Compiler1",
"Database1", "Compiler1", "Database1",
"Database2"),
stringsAsFactors = FALSE)
Completed <- as.relation(Completed)
relation_table(Completed)
DBProject <- data.frame(Task = c("Database1", "Database2"),
stringsAsFactors = FALSE)
DBProject <- as.relation(DBProject)
relation_table(DBProject)
relation_table(Completed %/% DBProject)
## division remainder
relation_table(Completed %% DBProject)