RunMachineLearningBenchmark {RHPCBenchmark} | R Documentation |
Runs all of the machine learning microbenchmarks
Description
RunMachineLearningBenchmark
runs all of the microbenchmarks for
performance testing machine learning functionality
Usage
RunMachineLearningBenchmark(runIdentifier, resultsDirectory,
   clusteringMicrobenchmarks = GetClusteringDefaultMicrobenchmarks())
Arguments
runIdentifier |
a character string specifying the suffix to be appended to the base of the file name of the output CSV format files |
resultsDirectory |
a character string specifying the directory where all of the CSV performance results files will be saved |
clusteringMicrobenchmarks |
a list of ClusteringMicrobenchmark objects defining the clustering microbenchmarks to be executed |
Details
This function runs the machine learning microbenchmarks defined in the
clusteringMicrobenchmarks
input list. For each microbenchmark, it
attempts to create a separate output file in CSV format containing the
performance results for the data set and function tested by the microbenchmark.
The names of the output files follow the format
benchmarkName
_runIdentifier
.csv, where
benchmarkName
is specified in the
ClusteringMicrobenchmark
object of each microbenchmark and
runIdentifier
is an input parameter to this function. If the file
already exists, the results will be appended to the existing file. The
input list contains instances of the
ClusteringMicrobenchmark
class defining each
microbenchmark. Each microbenchmark object with the
active
field set to TRUE will be executed. The list of default
microbenchmarks is generated by the function
GetClusteringDefaultMicrobenchmarks
. Each
ClusteringMicrobenchmark
specifies an R data file which contains
the data object needed by the microbenchmark. The needed R data
files should either be given in an attached R package or given in the
data
subdirectory of the current working directory, and they should
have the extension .RData
. If the linear algebra kernels are
multithreaded, by linking to multithreaded BLAS or LAPACK libraries for
example, then the number of threads must be retrievable from an environment
variable which is set before execution of the R programming environment.
The name of the environment variable specifying the number of threads must
be provided in the R HPC benchmark environment variable
R_BENCH_NUM_THREADS_VARIABLE. This function will retrieve the number of
threads through R_BENCH_NUM_THREADS_VARIABLE so that the number of threads
can be printed to the results files and recorded in data frames for reporting
purposes. The number of threads is used only for reporting purposes; the
benchmark does not use it to control the actual number of threads utilized
by the kernels, as that is assumed to be controlled by the numerical
library. An error will be thrown unless both the environment variable
R_BENCH_NUM_THREADS_VARIABLE and the environment variable it names are
set.
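The two-step lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the helper name get_reported_thread_count is hypothetical and is not part of the package, and the package's internal implementation and error messages may differ.

```r
# Hypothetical helper (not part of RHPCBenchmark): resolves the thread count
# through the indirection described in the Details section.
get_reported_thread_count <- function() {
  # Step 1: read the NAME of the variable that holds the thread count
  variable_name <- Sys.getenv("R_BENCH_NUM_THREADS_VARIABLE", unset = NA)
  if (is.na(variable_name) || !nzchar(variable_name)) {
    stop("R_BENCH_NUM_THREADS_VARIABLE is not set")
  }
  # Step 2: read the thread count from the variable named in step 1
  thread_count <- Sys.getenv(variable_name, unset = NA)
  if (is.na(thread_count) || !nzchar(thread_count)) {
    stop("Environment variable ", variable_name, " is not set")
  }
  as.integer(thread_count)
}

# Normally these are set in the shell before R is started
Sys.setenv(R_BENCH_NUM_THREADS_VARIABLE = "MKL_NUM_THREADS")
Sys.setenv(MKL_NUM_THREADS = "1")
numberOfThreads <- get_reported_thread_count()
```

The value obtained this way is informational only; changing it inside R does not by itself change how many threads the numerical library uses.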
Value
a data frame containing the user, system, and elapsed (wall clock) times of each performance trial
See Also
GetClusteringDefaultMicrobenchmarks
GetClusteringExampleMicrobenchmarks
Examples
## Not run:
# Set needed environment variables for multithreading. Only single threading
# is used in the example.
#
# Note: The environment variables are usually set by the user before starting
# the R programming environment; they are set here only to facilitate
# a working example. See the section on multithreading in the vignette
# for further details.
Sys.setenv(R_BENCH_NUM_THREADS_VARIABLE="MKL_NUM_THREADS")
Sys.setenv(MKL_NUM_THREADS="1")
#
# Generate example microbenchmarks that can be run in a few minutes; see
# the vignette for more involved examples. Clustering microbenchmarks
# are defined in the examples.
#
# Note: These microbenchmarks are different from the microbenchmarks
# generated by GetClusteringDefaultMicrobenchmarks.
# They are chosen for their short run times and suitability for
# example code.
exampleMicrobenchmarks <- GetClusteringExampleMicrobenchmarks()
# Set the output directory of the CSV summary results files
resultsDirectory <- "./MachineLearningExampleOutput"
# Create the output directory
dir.create(resultsDirectory)
# Set an appropriate run identifier
runIdentifier <- "example"
resultsFrame <- RunMachineLearningBenchmark(runIdentifier, resultsDirectory,
   clusteringMicrobenchmarks=exampleMicrobenchmarks)
# Create a new clustering microbenchmark that tests the clara method from
# the cluster package using a data set with 16 features, 8 clusters, and
# 1000 normally distributed feature vectors per cluster.
claraMicrobenchmark <- list()
claraMicrobenchmark[["clara_cluster_16_8_1000"]] <- methods::new(
   "ClusteringMicrobenchmark",
   active = TRUE,
   benchmarkName = "clara_cluster_16_8_1000",
   benchmarkDescription = "Example of new clara microbenchmark",
   dataObjectName = NA_character_,
   numberOfFeatures = as.integer(16),
   numberOfClusters = as.integer(8),
   numberOfFeatureVectorsPerCluster = as.integer(1000),
   numberOfTrials = as.integer(3),
   numberOfWarmupTrials = as.integer(1),
   allocatorFunction = ClusteringAllocator,
   benchmarkFunction = ClaraClusteringMicrobenchmark
)
# Set an appropriate run identifier
runIdentifier <- "clara_new"
# Run the clara microbenchmark
claraResults <- RunMachineLearningBenchmark(runIdentifier, resultsDirectory,
   clusteringMicrobenchmarks=claraMicrobenchmark)
## End(Not run)
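The output file naming format given in the Details section (benchmarkName, an underscore, runIdentifier, then .csv) can be used to locate a results file after a run. The sketch below assumes the clara example above has been run; the helper name expected_results_file is hypothetical and not part of the package, and since the layout of the CSV contents is not described on this page, the file is only printed as raw lines rather than parsed.

```r
# Hypothetical helper (not part of RHPCBenchmark): builds the path of the
# results file following the documented benchmarkName_runIdentifier.csv format.
expected_results_file <- function(resultsDirectory, benchmarkName, runIdentifier) {
  file.path(resultsDirectory, paste0(benchmarkName, "_", runIdentifier, ".csv"))
}

claraFile <- expected_results_file(
  "./MachineLearningExampleOutput",
  "clara_cluster_16_8_1000",
  "clara_new"
)
print(claraFile)

# Show the first few raw lines only if the benchmark actually produced the file
if (file.exists(claraFile)) {
  writeLines(head(readLines(claraFile)))
}
```

Because results are appended when a file already exists, choosing a fresh runIdentifier for each run keeps the trials of separate runs in separate files.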