semantic_enrichment {eHDPrep}    R Documentation

Semantic enrichment

Description

Enriches a dataset with additional (meta-)variables derived from the semantic commonalities between variables (columns).

Usage

semantic_enrichment(
  data,
  ontology,
  mapping_file,
  mode = "in",
  root,
  label_attr = "name",
  ...
)

Arguments

data

Required. Numeric data frame or matrix containing variables present in the mapping file.

ontology

Required. One of:

  • Path to ontology edge table in .csv format (String)

  • Edge table in data frame format

  • Graph containing the chosen ontology - must be in tidygraph format or coercible to this format.

mapping_file

Required. Path to a .csv file, or a data frame, containing mapping information. It should contain exactly two columns. The first column should contain column names present in 'data'. The second column should contain the names of the corresponding entities in the ontology object.
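
As a minimal sketch of the two-column layout described above, the following base R snippet builds a toy mapping and checks it against a toy dataset. All object, column, and entity names here are hypothetical and chosen for illustration only; they are not taken from the package.

```r
# Hypothetical data and mapping; all names are illustrative only
data <- data.frame(tumoursize = c(12, 30), t_stage = c(1, 3))

mapping <- data.frame(
  variable    = c("tumoursize", "t_stage"),                    # column names in `data`
  onto_entity = c("property_of_tumour", "property_of_tumour")  # ontology entity names
)

# The mapping is expected to have exactly two columns
stopifnot(ncol(mapping) == 2)

# Columns in the data that have no mapping, and mapped names absent from the data
unmapped        <- setdiff(names(data), mapping[[1]])
missing_in_data <- setdiff(mapping[[1]], names(data))
```

Both `unmapped` and `missing_in_data` are empty here, which is the situation the argument description asks for.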

mode

Character constant specifying the directionality of the edges. One of: "in" or "out".

root

Required. Name of root node identifier in column 1 to calculate node depth from.
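
To illustrate what calculating node depth from a root means, here is a rough base R sketch using a toy edge table. It assumes that each edge points from a child to its parent; the edge table, its column names, and the helper `node_depth()` are hypothetical, and this is not the package's implementation.

```r
# Hypothetical edge table: each row is an edge from a child to its parent
edges <- data.frame(
  from = c("tumour", "patient", "tumoursize", "t_stage"),
  to   = c("root", "root", "tumour", "tumour")
)

# Breadth-first walk from the root, following edges in reverse (parent to child)
node_depth <- function(edges, root) {
  depth <- c(0)
  names(depth) <- root
  frontier <- root
  while (length(frontier) > 0) {
    # Children of the current level that have not been visited yet
    children <- edges$from[edges$to %in% frontier]
    children <- setdiff(children, names(depth))
    if (length(children) > 0) {
      d <- rep(max(depth) + 1, length(children))
      names(d) <- children
      depth <- c(depth, d)
    }
    frontier <- children
  }
  depth
}

depths <- node_depth(edges, "root")
```

In this toy ontology the root has depth 0, "tumour" and "patient" have depth 1, and the two leaf variables have depth 2.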

label_attr

Node attribute containing labels used for column names when creating metavariable aggregations. Default: "name"

...

Additional arguments passed to read_csv when reading 'mapping_file'.

Details

Semantic enrichment generates meta-variables from the aggregation of data variables (columns) via their most informative common ancestor. Meta-variables are labelled using the syntax: MV_[label_attr]_[Aggregation function]. The data variables are aggregated row-wise by their maximum, minimum, mean, sum, and product. Meta-variables with zero entropy (no information) are not appended to the data. See the "Semantic Enrichment" section in the vignette of 'eHDPrep' for more information: vignette("Introduction_to_eHDPrep", package = "eHDPrep")
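
The aggregation and filtering steps described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the toy variables, the label "Tumour", the exact spelling of the aggregation suffixes, and the helper `shannon_entropy()` are all hypothetical.

```r
# Hypothetical toy data: three variables assumed to share a common ancestor
# labelled "Tumour" in the ontology
vars <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(0, 0, 0, 0),
  c = c(2, 2, 2, 2)
)

label <- "Tumour"
agg_funs <- list(max = max, min = min, mean = mean, sum = sum, prod = prod)

# Row-wise aggregation: one meta-variable per aggregation function
mvs <- as.data.frame(lapply(agg_funs, function(f) apply(vars, 1, f)))
names(mvs) <- paste("MV", label, names(agg_funs), sep = "_")

# Shannon entropy (in bits) of a vector's empirical distribution
shannon_entropy <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log2(p))
}

# Drop meta-variables that carry no information (zero entropy)
keep <- vapply(mvs, function(x) shannon_entropy(x) > 0, logical(1))
mvs <- mvs[, keep, drop = FALSE]
```

Because `b` is always zero, the row-wise minimum and product are constant, so those two meta-variables have zero entropy and are dropped; the maximum, mean, and sum remain.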

Value

Semantically enriched dataset

Note

A warning may be shown regarding the '.add' argument being deprecated. This is believed to be an issue with 'tidygraph' which may be resolved in a future release: <https://github.com/thomasp85/tidygraph/issues/131>. Another warning may be shown regarding the 'neimode' argument being deprecated. This is also believed to be an issue with 'tidygraph' which may be resolved in a future release: <https://github.com/thomasp85/tidygraph/issues/156>. Neither warning is believed to affect the functionality of 'eHDPrep'.

See Also

Other high level functionality: apply_quality_ctrl(), assess_quality(), review_quality_ctrl()

Examples

require(magrittr)
require(dplyr)
data(example_ontology)
data(example_mapping_file)
data(example_data)

# define datatypes
tibble::tribble(~"var", ~"datatype",
"patient_id", "id",
"tumoursize", "numeric",
"t_stage", "ordinal_tstage",
"n_stage", "ordinal_nstage",
"diabetes_merged", "character",
"hypertension", "factor",
"rural_urban", "factor",
"marital_status", "factor",
"SNP_a", "genotype",
"SNP_b", "genotype",
"free_text", "freetext") -> data_types

# create post-QC data
example_data %>%
  merge_cols(diabetes_type, diabetes, "diabetes_merged", rm_in_vars = TRUE) %>%
  apply_quality_ctrl(patient_id, data_types,
                     bin_cats = c("No" = "Yes", "rural" = "urban"),
                     to_numeric_matrix = TRUE) %>%
  suppressMessages() ->
  post_qc_data

# minimal example on the first four columns of the example data:
semantic_enrichment(post_qc_data[1:10,1:4],
                    dplyr::slice(example_ontology, 1:7,24),
                    example_mapping_file[1:3,], root = "root") -> res
# see Note section of documentation for information on possible warnings.

# summary of result:
tibble::glimpse(res)


# full example:
res <- semantic_enrichment(post_qc_data, example_ontology,
                           example_mapping_file, root = "root")
# see Note section of documentation for information on possible warnings.


[Package eHDPrep version 1.3.2 Index]