define_fact {starschemar}R Documentation

Define facts in a dimensional_model object

Description

To define facts in a dimensional_model object, the essential data is a name and a set of measurements that can be empty (does not have explicit measurements). Associated with each measurement, an aggregation function is required, which by default is SUM.

Usage

define_fact(
  st,
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = "nrow_agg"
)

## S3 method for class 'dimensional_model'
define_fact(
  st,
  name = NULL,
  measures = NULL,
  agg_functions = NULL,
  nrow_agg = "nrow_agg"
)

Arguments

st

A dimensional_model object.

name

A string, name of the fact.

measures

A vector of measure names.

agg_functions

A vector of aggregation function names. If none is indicated, the default is SUM. Additionally they can be MAX or MIN.

nrow_agg

A string, measurement name for the number of rows aggregated.

Details

To get a star schema (a star_schema object) we need a flat table (implemented through a tibble) and a dimensional_model object. The definition of facts in the dimensional_model object is made from the flat table column names. Using the dput function we can list the column names of the flat table so that we do not have to type their names.

Associated with each measurement there is an aggregation function that can be SUM, MAX or MIN. Mean is not considered among the possible aggregation functions: The reason is that calculating the mean by considering subsets of data does not necessarily yield the mean of the total data.

An additional measurement corresponding to the number of aggregated rows is always added which, together with SUM, allows us to obtain the mean if needed.

Value

A dimensional_model object.

See Also

Other star definition functions: define_dimension(), dimensional_model()

Examples


# dput(colnames(mrs_age))
#
# c(
#   "Reception Year",
#   "Reception Week",
#   "Reception Date",
#   "Data Availability Year",
#   "Data Availability Week",
#   "Data Availability Date",
#   "Year",
#   "WEEK",
#   "Week Ending Date",
#   "REGION",
#   "State",
#   "City",
#   "Age Range",
#   "Deaths"
# )

dm <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths"),
    agg_functions = c("SUM"),
    nrow_agg = "nrow_agg"
  )

dm <- dimensional_model() |>
  define_fact(
    name = "mrs_age",
    measures = c("Deaths")
  )

dm <- dimensional_model() |>
  define_fact(name = "Factless fact")


[Package starschemar version 1.2.5 Index]