semantic_enrichment {eHDPrep} | R Documentation |
Semantic enrichment
Description
Enriches a dataset with additional (meta-)variables derived from the semantic commonalities between variables (columns).
Usage
semantic_enrichment(
data,
ontology,
mapping_file,
mode = "in",
root,
label_attr = "name",
...
)
Arguments
data |
Required. Numeric data frame or matrix containing variables present in the mapping file. |
ontology |
Required. One of:
. |
mapping_file |
Required. Path to a CSV file, or a data frame, containing mapping information. Should contain exactly two columns. The first column should contain column names present in the data frame. The second column should contain the names of entities present in the ontology object. |
mode |
Character constant specifying the directionality of the edges. One of: "in" or "out". |
root |
Required. Name of root node identifier in column 1 to calculate node depth from. |
label_attr |
Node attribute containing labels used for column names when creating metavariable aggregations. Default: "name" |
... |
additional arguments to pass to |
Details
Semantic enrichment generates meta-variables from the aggregation of data
variables (columns) via their most informative common ancestor. Meta-variables are
labelled using the syntax: MV_[label_attr]_[Aggregation function]. The
data variables are aggregated row-wise by their maximum, minimum, mean, sum,
and product. Meta-variables with zero entropy (no information) are not
appended to the data.
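
The row-wise aggregation and zero-entropy filter described above can be sketched in base R. This is an illustrative sketch only, not the package's internal implementation: the toy data, the ancestor label "disease", the aggregation suffixes, and the helper shannon_entropy() are all assumptions made for this example, and entropy is approximated here from the empirical distribution of distinct values.

```r
# Toy numeric data: three variables assumed (for illustration) to share a
# common ancestor labelled "disease" in some ontology.
toy <- data.frame(
  diabetes     = c(0, 1, 0, 1),
  hypertension = c(1, 0, 0, 1),
  obesity      = c(0, 1, 0, 0)
)

# The five row-wise aggregation functions named in Details.
# The suffixes used in the names below are illustrative.
agg_funs <- list(max = max, min = min, mean = mean, sum = sum, prod = prod)

# Build one candidate meta-variable per aggregation function.
label <- "disease"  # hypothetical value of the label attribute
meta <- lapply(agg_funs, function(f) apply(as.matrix(toy), 1, f))
names(meta) <- paste("MV", label, names(agg_funs), sep = "_")
meta <- as.data.frame(meta)

# Hypothetical helper: Shannon entropy of a vector's empirical distribution.
shannon_entropy <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log2(p))
}

# Keep only meta-variables that carry information (entropy > 0).
keep <- vapply(meta, shannon_entropy, numeric(1)) > 0
enriched <- cbind(toy, meta[, keep, drop = FALSE])
names(enriched)
```

In this toy data the minimum and product are constant across rows, so under the stated assumptions only the maximum, mean, and sum meta-variables are appended.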
See the "Semantic Enrichment" section in the vignette of 'eHDPrep' for more
information: vignette("Introduction_to_eHDPrep", package = "eHDPrep")
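
As a rough illustration of the two-column layout expected for mapping_file (see Arguments), the sketch below builds a mapping as a data frame and checks it against a toy dataset before enrichment. The column names "variable" and "onto_entity", the entity names, and the checks themselves are assumptions for this example and are not part of 'eHDPrep'.

```r
# Toy data with three variables (columns).
toy <- data.frame(
  diabetes     = c(0, 1),
  hypertension = c(1, 0),
  tumoursize   = c(12, 30)
)

# Hypothetical mapping: column 1 holds variable names found in the data,
# column 2 holds entity names assumed to exist in the chosen ontology.
toy_mapping <- data.frame(
  variable    = c("diabetes", "hypertension", "tumoursize"),
  onto_entity = c("Diabetes mellitus", "Hypertensive disorder", "Tumour size"),
  stringsAsFactors = FALSE
)

# Sanity checks one might run before calling semantic_enrichment():
# exactly two columns, and variable names that match the data's columns.
stopifnot(ncol(toy_mapping) == 2)
unmapped        <- setdiff(colnames(toy), toy_mapping[[1]])
missing_in_data <- setdiff(toy_mapping[[1]], colnames(toy))
```

An empty unmapped and missing_in_data indicate that the mapping and the data refer to the same set of variables.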
Value
Semantically enriched dataset
Note
A warning may be shown regarding the '.add' argument being deprecated. This is believed to be an issue with 'tidygraph' which may be resolved in a future release: <https://github.com/thomasp85/tidygraph/issues/131>. Another warning may be shown regarding the 'neimode' argument being deprecated. This is also believed to be an issue with 'tidygraph' which may be resolved in a future release: <https://github.com/thomasp85/tidygraph/issues/156>. These warnings are not believed to affect the functionality of 'eHDPrep'.
See Also
Other high level functionality:
apply_quality_ctrl(), assess_quality(), review_quality_ctrl()
Examples
require(magrittr)
require(dplyr)
data(example_ontology)
data(example_mapping_file)
data(example_data)
# define datatypes
tibble::tribble(~"var", ~"datatype",
"patient_id", "id",
"tumoursize", "numeric",
"t_stage", "ordinal_tstage",
"n_stage", "ordinal_nstage",
"diabetes_merged", "character",
"hypertension", "factor",
"rural_urban", "factor",
"marital_status", "factor",
"SNP_a", "genotype",
"SNP_b", "genotype",
"free_text", "freetext") -> data_types
# create post-QC data
example_data %>%
  merge_cols(diabetes_type, diabetes, "diabetes_merged", rm_in_vars = TRUE) %>%
  apply_quality_ctrl(patient_id, data_types,
                     bin_cats = c("No" = "Yes", "rural" = "urban"),
                     to_numeric_matrix = TRUE) %>%
  suppressMessages() ->
  post_qc_data
# minimal example on first four columns of example data:
semantic_enrichment(post_qc_data[1:10,1:4],
dplyr::slice(example_ontology, 1:7,24),
example_mapping_file[1:3,], root = "root") -> res
# see Note section of documentation for information on possible warnings.
# summary of result:
tibble::glimpse(res)
# full example:
res <- semantic_enrichment(post_qc_data, example_ontology,
example_mapping_file, root = "root")
# see Note section of documentation for information on possible warnings.