perform_join {ERDbuilder}R Documentation

Use inner join (unless the other is specified)

Description

The perform_join function uses an inner join unless the user specifies the join type.

Usage

perform_join(erd_object, tables_to_join, specified_joins = NULL)

Arguments

erd_object

An object of class "ERD", which encapsulates the data frames and the relationships between them. This object is generated using the create_erd function.

tables_to_join

A character vector listing the names of tables to join. The first table in this list serves as the main table to which subsequent tables are joined. The tables are joined in the order specified and utilize the relationships defined with the first table.

specified_joins

An optional named list where each element's name corresponds to a table in tables_to_join and the value specifies the type of join to perform with that table. The default value is NULL, which activates automated mode (which uses inner joins).

Details

This orchestrates the joining of multiple tables based on a specified Entity-Relationship Diagram (ERD) object. This function extracts the relationships and join criteria defined within the ERD object and executes the appropriate join operations using R's dplyr package.

The function can operate in two modes: automated and user-specified joins. In automated mode, join types are determined by the relationship symbols in the ERD object. In user-specified mode, the types of joins are explicitly provided by the user.

Implementation Details:

- Join Variables: For each pair of tables to be joined, the function extracts the relevant join variables from the ERD object.

- Join Type: Depending on the relationship symbol associated with each pair of tables, the function decides whether to perform an inner join or a left join. This decision is implemented by dynamically invoking the corresponding dplyr function (inner_join or left_join).

- Aggregation: The function uses do.call to dynamically execute the appropriate join operation, accumulating the result in the main_table variable, which is ultimately returned.

Notes:

- The function iteratively applies the join operations, using the first table in tables_to_join as the main table.

- The join operations are performed in the order specified in tables_to_join.

- When specified_joins is NULL, the function operates in automated mode, determining the type of join based on relationship symbols.

- The names in specified_joins should match the table names in tables_to_join for user-specified mode to function correctly.

Value

A data frame resulting from the join operations conducted between the specified tables, consistent with the relationships indicated in the ERD object. Additionally, the types of joins used are printed to the console.

Examples



# Load Packages -----------------------------------------------------------

library(ERDbuilder)
library(dplyr)

# Define entities ---------------------------------------------------------

students_tbl <- data.frame(
  st_id = c("hu1", "de2", "lo3"),
  dep_id = c("water", "evil", "values"),
  student = c("Huey", "Dewey", "Louie"),
  email = c("hubert.duck", "dewfort.duck", "llewellyn.duck"),
  dob = c("04-15", "04-15", "04-15")
)

courses_tbl <- data.frame(
  crs_id = c("water101", "evil205", "water202"),
  fac_id = c("02do", "03pe", "04mi"),
  dep_id = c("water", "evil", "water"),
  course = c("Swimming", "Human-chasing", "Dives")
)

enrollment_tbl <- data.frame(
  crs_id = c("water101", "evil205", "evil205", "water202"),
  st_id = c("hu1", "hu1", "de2", "de2"),
  final_grade = c("B", "A", "A", "F")
)

department_tbl <- data.frame(
  dep_id = c("water", "evil", "values"),
  department = c("Water activities", "Evil procurement", "Good values")
)

faculty_tbl <- data.frame(
  faculty_name = c("Scrooge McDuck", "Donald", "Pete", "Mickey"),
  title = c("Emeritus", "Full", "Assistant", "Full"),
  fac_id = c("01sc", "02do", "03pe", "04mi"),
  dep_id = c("water", "water", "evil", "values")
)

head(students_tbl)
head(courses_tbl)
head(enrollment_tbl)
head(department_tbl)
head(faculty_tbl)

## Define relationships----------------------------------------
relationships <- list(
  courses = list(
    enrollment = list(crs_id = "crs_id", relationship = c("||", "|<")),
    department = list(dep_id = "dep_id", relationship = c(">|", "||")),
    faculty = list(fac_id = "fac_id", relationship = c(">0", "||"))
  ),
  enrollment = list(
    students = list(st_id = "st_id", relationship = c(">0", "||")
    )
  ),
  students = list(
    department = list(dep_id = "dep_id", relationship = c(">|", "||"))
  ),
  faculty = list(
    department = list(dep_id = "dep_id", relationship = c(">|", "||"))
  )
)

## Create ERD object
erd_object <-
  create_erd(
    list(
      students = students_tbl,
      courses = courses_tbl,
      enrollment = enrollment_tbl,
      department = department_tbl,
      faculty = faculty_tbl
    ),
    relationships)

## Render ERD -----------------------------------------------------------
render_erd(erd_object, label_distance = 0, label_angle = 15, n = 20)


# Perform joins -----------------------------------------------------------

perform_join(erd_object, c("courses", "enrollment", "department"))

[Package ERDbuilder version 1.0.0 Index]