ganDataModel-package {ganDataModel}R Documentation

Build a Metric Subspaces Data Model for a Data Source

Description

Neural networks are applied to create a density value function which approximates density values for a data source. The trained neural network is analyzed for different levels. For each level metric subspaces with density values above a level are determined. The obtained set of metric subspaces and the trained neural network are assembled into a data model. A prerequisite is the definition of a data source, the generation of generative data and the calculation of density values. These tasks are executed using package 'ganGenerativeData' <https://cran.r-project.org/package=ganGenerativeData>.

Properties of built metric subspaces:

1. They contain data with continuously varying density values above a level.
2. They have the topological property connected. In topology a space is connected when it cannot be represented as the union of disjoint open subspaces.
3. An inclusion relation is defined on them by levels. Higher level metric subspaces are contained in lower level ones.

The inserted images show two-dimensional projections of generative data contained in metric subspaces with assigned labels for the iris dataset.





dm12.png dm34.png dm12c.png dm34c.png

Details

The API includes main functions dmTrain() and dmBuildMetricSubspaces(). dmTrain() trains a neural network that approximates density values for a data source. dmBuildMetricSubspaces() analyzes the trained neural network for a level and determines metric subspaces with density values above a level. The API is used as follows:

1. Prerequisite for building a metric subspaces data model: Create a data source, generate generative data and calculate density values using package ganGenerativeData

dsCreateWithDataFrame() Create a data source with passed data frame.

dsDeactivateColumns() Deactivate columns of a data source in order to exclude them in generation of generative data. In current version only columns with values of type double or float can be used in generation of generative data. All columns with values of other type have to be deactivated.

dsWrite() Write created data source including settings of active columns to a file in binary format.

gdGenerate() Read a data source from a file, generate generative data for the data source in iterative training steps and write generated data to a file in binary format.

gdCalculateDensityValues() Read generative data from a file, calculate density values and write generative data with assigned density values to original file.

2. Build a metric subspaces data model

dmTrain() Read a data source and generative data from files, train a neural network which approximates density values for a data source in iterative training steps, create a data model containing the trained neural network and write it to a file in binary format.

dmBuildMetricSubspaces() Read a data model and generative data from files, analyze the trained neural network in the data model for a level, determine metric subspaces with density values above a level, add obtained metric subspaces to the data model and write it to original file.

dmRemoveMetricSubspaces() Remove metric subspaces in a data model for a level.

dmRead() Read a data model and generative data from files.

dmGetLevels() Get levels for metric subspaces in a data model.

dmGetMetricSubspacesProperties() Get metric subspace properties in a data model for a level.

dmGetContainedInMetricSubspaces() Get metric subspaces in a data model in which a data record is contained.

dmPlotMetricSubspaceParameters() Specify plot parameters for metric subspaces for a level.

dmPlotEvaluateDataSourceParameters() Specify plot parameters for evaluated data source.

dmPlotMetricSubspaces() Create an image file containing two-dimensional projections of generative data contained in metric subspaces and evaluated data source.

dmReset() Reset API.

Author(s)

Werner Mueller

Maintainer: Werner Mueller <werner.mueller5@chello.at>

References

Package 'ganGenerativeData' <https://cran.r-project.org/package=ganGenerativeData>

Examples

# Environment used for execution of examples:

# Operating system: Ubuntu 22.04.1
# Compiler: g++ 11.3.0 (supports C++17 standard)
# R applications: R 4.1.2, RStudio 2022.02.2
# Installed packages: 'Rcpp' 1.0.11, 'tensorflow' 2.11.0,
# 'ganGenerativeData' 1.5.6, 'ganDataModel' 1.1.6

# Package 'tensorflow' provides an interface to machine learning framework
# TensorFlow. To complete the installation function install_tensorflow() has to
# be called.
## Not run: 
library(tensorflow)
install_tensorflow()
## End(Not run)

# 1. Prerequisite for building a metric subspaces data model for the iris
# dataset: Create a data source, generate generative data and calculate density
# values for the iris dataset.

# Load library
## Not run: 
library(ganGenerativeData)
## End(Not run)

# Create a data source with passed iris data frame.
## Not run: 
dsCreateWithDataFrame(iris)
## End(Not run)

# Deactivate the column with index 5 and name Species in order to exclude it in
# generation of generative data.
## Not run: 
dsDeactivateColumns(c(5))
## End(Not run)

# Write the data source including settings of active columns to file "ds.bin" in
# binary format.
## Not run: 
dsWrite("ds.bin")
## End(Not run)

# Read data source from file "ds.bin", train a generative model in iterative
# training steps (used number of iterations in tests is in the range of 10000 to
# 50000), write trained generative model and generated data in training steps to
# files "gm.bin" and "gd.bin".
## Not run: 
gdTrain("gm.bin", "gd.bin", "ds.bin", c(1, 2), gdTrainParameters(2000))
## End(Not run)
# Read generative data from file "gd.bin", calculate density values and
# write generative data with density values to original file.
## Not run: 
gdCalculateDensityValues("gd.bin")
## End(Not run)

# 2. Build a metric subspaces data model for the iris data set

# Load  library
## Not run: 
library(ganDataModel)
## End(Not run)

# Read a data source and generative data from files "ds.bin" and "gd.bin",
# train a neural network which approximates density values for a data source
# in iterative training steps (used number of iterations in tests is in the
# range of 250000 to 300000), create a data model containing the trained neural
# network and write it to a file "dm.bin" in binary format.
## Not run: 
dmTrain("dm.bin", "ds.bin", "gd.bin", 10000)
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd.bin",
# build metric subspaces for level 0.7,
# add obtained metric subspaces to the data model
# and write it to original file.
## Not run: 
dmBuildMetricSubspaces("dm.bin", 0.67, "gd.bin")
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd,bin".
# Read in data is accessed in function dmPlotMetricSubspaces.
## Not run: 
dmRead("dm.bin", "gd.bin")
## End(Not run)

# Create an image showing a two-dimensional projection of generative data
# contained in metric subspaces fpr level 0.67 for column indices 3, 4 and write
# it to file "ms.png".
## Not run: 
dmPlotMetricSubspaces(
  list(dmPlotMetricSubspaceParameters(level = 0.67,
                                       labels = c("*"),
                                       percent = 75,
                                       boundary = TRUE,
                                       color = "red",
                                       backgroundPercent = 0,
                                       backgroundColor = "red",
                                       backgroundReset = TRUE,
                                       plotLabels = TRUE)),
  "msl.png",
  "Metric Subspaces for the Iris Dataset",
  c(3, 4),
  "ds.bin",
  dmPlotEvaluateDataSourceParameters(0.67))
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd.bin",
# build metric subspaces for level 0.71,
# add obtained metric subspaces to the data model
# and write it to original file.
## Not run: 
dmBuildMetricSubspaces("dm.bin", 0.71, "gd.bin")
## End(Not run)

# Read a data model and generative data from files "dm.bin" and "gd,bin".
# Read in data is accessed in function dmPlotMetricSubspaces.
## Not run: 
dmRead("dm.bin", "gd.bin")
## End(Not run)

# Create an image showing a two-dimensional projection of generative data
# contained in metric subspaces for levels 0.67, 0.71 for column indices 3, 4
# and write it to file "msls.png".
## Not run: 
dmPlotMetricSubspaces(
  list(dmPlotMetricSubspaceParameters(level = 0.67,
                                       labels = c("*"),
                                       percent = 75,
                                       boundary = TRUE,
                                       color = "red",
                                       backgroundPercent = 0,
                                       backgroundColor = "red",
                                       backgroundReset = TRUE,
                                       plotLabels = TRUE),
       dmPlotMetricSubspaceParameters(level = 0.71,
                                       labels = c("*"),
                                       percent = 75,
                                       boundary = TRUE,
                                       color = "green",
                                       backgroundPercent = 5,
                                       backgroundColor = "red",
                                       backgroundReset = TRUE,
                                       plotLabels = TRUE)),                                       
  "msls.png",
  "Metric Subspaces for the Iris Dataset",
  c(3, 4),
  "ds.bin",
  dmPlotEvaluateDataSourceParameters(0.67))
## End(Not run)

[Package ganDataModel version 1.1.6 Index]