R: Runs all of the machine learning microbenchmarks

RunMachineLearningBenchmark {RHPCBenchmark}

R Documentation

Runs all of the machine learning microbenchmarks

Description

RunMachineLearningBenchmark runs all of the microbenchmarks for performance testing of machine learning functionality.

Usage

RunMachineLearningBenchmark(runIdentifier, resultsDirectory,
  clusteringMicrobenchmarks = GetClusteringDefaultMicrobenchmarks())

Arguments

`runIdentifier`	a character string specifying the suffix to be appended to the base of the file name of the output CSV format files
`resultsDirectory`	a character string specifying the directory where all of the CSV performance results files will be saved
`clusteringMicrobenchmarks`	a list of `ClusteringMicrobenchmark` objects defining the clustering microbenchmarks to execute as part of the machine learning benchmark. Default values are provided by the function `GetClusteringDefaultMicrobenchmarks`.

Details

This function runs the machine learning microbenchmarks defined in the clusteringMicrobenchmarks input list. For each microbenchmark, it attempts to create a separate output file in CSV format containing the performance results for the data set and function tested by the microbenchmark. The names of the output files follow the format benchmarkName_runIdentifier.csv, where benchmarkName is specified in the ClusteringMicrobenchmark object of each microbenchmark and runIdentifier is an input parameter to this function. If the file already exists, the results are appended to the existing file.

The input list contains instances of the ClusteringMicrobenchmark class, one for each microbenchmark. Each microbenchmark object with the active field set to TRUE is executed. The list of default microbenchmarks is generated by the function GetClusteringDefaultMicrobenchmarks. Each ClusteringMicrobenchmark specifies an R data file which contains the data object needed by the microbenchmark. The needed R data files should be provided either in an attached R package or in the data subdirectory of the current working directory, and they should have the extension .RData.

If the linear algebra kernels are multithreaded, for example by linking to multithreaded BLAS or LAPACK libraries, then the number of threads must be retrievable from an environment variable that is set before the R programming environment is started. The name of that environment variable must be provided in the R HPC benchmark environment variable R_BENCH_NUM_THREADS_VARIABLE. This function retrieves the number of threads through R_BENCH_NUM_THREADS_VARIABLE so that it can be written to the results files and recorded in data frames for reporting purposes. The number of threads is used only for reporting; the benchmark does not use it to set the actual number of threads used by the kernels, which is assumed to be controlled by the numerical library. An error is raised unless both R_BENCH_NUM_THREADS_VARIABLE and the environment variable it names are set.
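
The file naming format and the two-level thread count lookup described above can be sketched in a few lines of base R. This is only an illustration and not the package's implementation: the helper names BuildResultsFilePath and LookupNumberOfThreads are hypothetical, and the sketch assumes the thread count is stored as an integer string.

```r
# Hypothetical helper (not part of RHPCBenchmark): build the path of a
# results file following the benchmarkName_runIdentifier.csv format.
BuildResultsFilePath <- function(resultsDirectory, benchmarkName,
                                 runIdentifier) {
   file.path(resultsDirectory,
      paste0(benchmarkName, "_", runIdentifier, ".csv"))
}

# Hypothetical helper (not part of RHPCBenchmark): read the name of the
# thread count variable from R_BENCH_NUM_THREADS_VARIABLE, then read the
# thread count from the variable with that name.
LookupNumberOfThreads <- function() {
   variableName <- Sys.getenv("R_BENCH_NUM_THREADS_VARIABLE", unset = NA)
   if (is.na(variableName) || !nzchar(variableName)) {
      stop("R_BENCH_NUM_THREADS_VARIABLE is not set")
   }
   numberOfThreads <- Sys.getenv(variableName, unset = NA)
   if (is.na(numberOfThreads) || !nzchar(numberOfThreads)) {
      stop("Environment variable ", variableName, " is not set")
   }
   # Assumption: the value is an integer string such as "1" or "8"
   as.integer(numberOfThreads)
}

# Set the variables here only to make the sketch self-contained; they are
# normally set before R is started.
Sys.setenv(R_BENCH_NUM_THREADS_VARIABLE = "MKL_NUM_THREADS")
Sys.setenv(MKL_NUM_THREADS = "1")

numberOfThreads <- LookupNumberOfThreads()
resultsFilePath <- BuildResultsFilePath("./MachineLearningExampleOutput",
   "clara_cluster_16_8_1000", "clara_new")
```

Both lookups stop with an error when a variable is missing, mirroring the requirement that the two environment variables be set together.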

Value

a data frame containing the user, system, and elapsed (wall clock) times of each performance trial
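
The three timing categories are the same ones reported by base R's system.time. The following minimal sketch shows how per-trial timings of a clustering call could be collected into a data frame of that shape. It makes assumptions that this page does not confirm: the column names UserTime, SystemTime, and WallClockTime are hypothetical, the kmeans workload is a stand-in, and the package's own timing mechanism may differ.

```r
set.seed(1)
numberOfTrials <- 3L

# Stand-in data set: 1000 two-dimensional feature vectors
featureVectors <- matrix(rnorm(2000), ncol = 2)

# Hypothetical column names; the real results frame may use others
trialTimes <- data.frame(
   UserTime = numeric(numberOfTrials),
   SystemTime = numeric(numberOfTrials),
   WallClockTime = numeric(numberOfTrials)
)

for (trial in seq_len(numberOfTrials)) {
   # system.time returns user, system, and elapsed times in seconds
   timings <- system.time(
      kmeans(featureVectors, centers = 2, iter.max = 50)
   )
   trialTimes$UserTime[trial] <- timings[["user.self"]]
   trialTimes$SystemTime[trial] <- timings[["sys.self"]]
   trialTimes$WallClockTime[trial] <- timings[["elapsed"]]
}

# One row per trial, one column per timing category
print(trialTimes)
```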

Examples

## Not run: 
# Set needed environment variables for multithreading.  Only single threading
# is used in the example.
#
# Note: The environment variables are usually set by the user before starting
#       the R programming environment; they are set here only to facilitate
#       a working example.  See the section on multithreading in the vignette
#       for further details.
Sys.setenv(R_BENCH_NUM_THREADS_VARIABLE="MKL_NUM_THREADS")
Sys.setenv(MKL_NUM_THREADS="1")
#
# Generate example microbenchmarks that can be run in a few minutes; see
# the vignette for more involved examples. Clustering microbenchmarks
# are defined in the examples.
#
# Note: These microbenchmarks differ from the microbenchmarks
#       generated by GetClusteringDefaultMicrobenchmarks.
#       They are chosen for their short run times and suitability for
#       example code. 
exampleMicrobenchmarks <- GetClusteringExampleMicrobenchmarks()
# Set the output directory of the CSV summary results files
resultsDirectory <- "./MachineLearningExampleOutput"
# Create the output directory
dir.create(resultsDirectory)
# Set an appropriate run identifier
runIdentifier <- "example"
resultsFrame <- RunMachineLearningBenchmark(runIdentifier, resultsDirectory,
   clusteringMicrobenchmarks=exampleMicrobenchmarks)

# Create a new clustering microbenchmark that tests the clara method from
# the cluster package using a data set with 16 features, 8 clusters, and
# 1000 normally distributed feature vectors per cluster. 
claraMicrobenchmark <- list()
claraMicrobenchmark[["clara_cluster_16_8_1000"]] <- methods::new(
   "ClusteringMicrobenchmark",
   active = TRUE,
   benchmarkName = "clara_cluster_16_8_1000",
   benchmarkDescription = "Example of new clara microbenchmark",
   dataObjectName = NA_character_,
   numberOfFeatures = as.integer(16),
   numberOfClusters = as.integer(8),
   numberOfFeatureVectorsPerCluster = as.integer(1000),
   numberOfTrials = as.integer(3),
   numberOfWarmupTrials = as.integer(1),
   allocatorFunction = ClusteringAllocator,
   benchmarkFunction = ClaraClusteringMicrobenchmark
)

# Set an appropriate run identifier
runIdentifier <- "clara_new"
# Run the clara microbenchmark
claraResults <- RunMachineLearningBenchmark(runIdentifier, resultsDirectory,
   clusteringMicrobenchmarks=claraMicrobenchmark)

## End(Not run)

[Package RHPCBenchmark version 0.1.0 Index]