classifyFun {PredPsych} | R Documentation |
Generic Classification Analyses
Description
Function for performing generic classification analysis.
Usage
classifyFun(Data, classCol, selectedCols, cvType, ntrainTestFolds,
nTrainFolds, modelTrainFolds, nTuneFolds, tuneFolds, foldSep, cvFraction,
ranges = NULL, tune = FALSE, cost = 1, gamma = 0.5,
classifierName = "svm", genclassifier, silent = FALSE,
extendedResults = FALSE, SetSeed = TRUE, NewData = NULL, ...)
Arguments
Data |
(dataframe) dataframe of the data |
classCol |
(numeric or string) column number or name that contains the variable to be predicted |
selectedCols |
(optional) (numeric or string) all the columns of Data to be used in the analysis, either as the predicted class or as features |
cvType |
(optional) (string) which type of cross-validation scheme to follow, e.g. "holdout" or "folds" as used in the Examples; the Value section also refers to LOSO and LOTO |
ntrainTestFolds |
(optional) (parameter for k-fold cross-validation only) number of folds into which the data are split for training and testing |
nTrainFolds |
(optional) (parameter for k-fold cross-validation only) number of folds into which the training dataset is further divided |
modelTrainFolds |
(optional) (parameter for k-fold cross-validation only) specific folds from the first train/test split (ntrainTestFolds) to use for training |
nTuneFolds |
(optional) (parameter for k-fold cross-validation only) number of folds for tuning |
tuneFolds |
(optional) (parameter for k-fold cross-validation only) specific folds from the above nTuneFolds to use for tuning |
foldSep |
(numeric) (parameter for leave-one-subject-out cross-validation only) mandatory column number for leave-one-subject-out cross-validation |
cvFraction |
(optional) (numeric) fraction of the data to keep for training |
ranges |
(optional) (list) ranges for tuning support vector machine |
tune |
(optional) (logical) whether tuning of svm parameters should be performed or not |
cost |
(optional) (numeric) regularization parameter of svm |
gamma |
(optional) (numeric) rbf kernel parameter |
classifierName |
(optional) (string) name of the classifier to be used |
genclassifier |
(optional) (function or string) a classifier function or a name (e.g. Classifier.svm) |
silent |
(optional) (logical) whether to print messages or not |
extendedResults |
(optional) (logical) Return extended results with model and other metrics |
SetSeed |
(optional) (logical) whether to set the seed or not. Use SetSeed to seed the random number generator and get consistent results; set to FALSE only for permutation tests |
NewData |
(optional) (dataframe) new data frame of features for which class membership is requested |
... |
(optional) additional arguments for the function |
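The fold-related arguments above can be pictured with a small base R sketch. It is not the PredPsych implementation: the toy data frame, the variable names and the way rows are assigned to folds are all assumptions made for illustration. It only shows how rows might be partitioned for k-fold cross-validation and for leave-one-subject-out cross-validation, where a column in the role of foldSep defines the folds.

```r
# Fold construction sketch in base R; all names here are illustrative only.
set.seed(1)
n <- 20
toy <- data.frame(subject = rep(1:5, each = 4),        # hypothetical subject ID column
                  class   = rep(c("a", "b"), times = 10),
                  feature = rnorm(n))

# k-fold: shuffle fold labels so each row lands in exactly one of k folds
k <- 4
foldId <- sample(rep(seq_len(k), length.out = n))
kFoldTestRows <- split(seq_len(n), foldId)

# Leave-one-subject-out: one fold per distinct value of the subject column
foldSepCol <- 1                                         # plays the role of foldSep
losoTestRows <- split(seq_len(n), toy[, foldSepCol])

# For each fold, the held-out rows are tested and all remaining rows are used for training
for (f in names(losoTestRows)) {
  testRows  <- losoTestRows[[f]]
  trainRows <- setdiff(seq_len(n), testRows)
  cat("subject", f, ": train", length(trainRows), "rows, test", length(testRows), "rows\n")
}
```

In both schemes every row is held out exactly once; the difference is only whether fold membership is random or dictated by a column of the data.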
Details
This function implements Classification Analysis. Classification Analysis is a supervised machine learning approach that attempts to identify holistic patterns in the data and assign classes to them (classification). Given a set of features, a classification analysis automatically learns intrinsic patterns in the data in order to predict the respective classes. If the data features are informative about the classes, a high classification score will be achieved.
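As a rough illustration of the idea, the following base R sketch runs a holdout analysis by hand. It is not how classifyFun is implemented: a nearest-centroid rule is used as a stand-in classifier so that the code runs without extra packages (classifyFun defaults to an SVM), the built-in iris data replace KinData, and all variable names are chosen for illustration.

```r
# Holdout validation sketch in base R (not the PredPsych implementation).
# A nearest-centroid rule stands in for the SVM that classifyFun uses by default.
set.seed(1)                      # fixed seed for reproducible splits
data <- iris[, c(5, 1:4)]        # column 1 = class, columns 2-5 = features
cvFraction <- 0.8                # fraction of rows kept for training

n <- nrow(data)
trainIdx <- sample(n, size = floor(cvFraction * n))
train <- data[trainIdx, ]
test  <- data[-trainIdx, ]

# One centroid (mean feature vector) per class, estimated on training rows only
centroids <- aggregate(train[, -1], by = list(class = train[, 1]), FUN = mean)

# Assign a feature vector to the class with the nearest centroid
predictClass <- function(x) {
  d <- apply(centroids[, -1], 1, function(m) sum((x - m)^2))
  as.character(centroids$class[which.min(d)])
}
pred <- apply(as.matrix(test[, -1]), 1, predictClass)

# Test accuracy: share of held-out rows whose predicted class matches the true class
accTest <- mean(pred == as.character(test[, 1]))
print(accTest)
```

Because the features in this toy case are informative about the classes, the accuracy on the held-out rows should be well above the chance level of one third.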
Value
Depending upon extendedResults:
extendedResults = FALSE outputs the test accuracy accTest of discrimination.
extendedResults = TRUE outputs:
accTest: test accuracy of discrimination
accTestRun: discrimination for each run when cvType is LOSO, LOTO or Folds
ConfMatrix: confusion matrices
classificationResults: list of the cross-validation results, including the model
ConfusionMatrixResults: overall cross-validated confusion matrix results
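To make the relation between a confusion matrix and the test accuracy concrete, here is a minimal base R sketch. The labels are made up and the object names are illustrative; this is not output from classifyFun and does not reproduce the structure of its return value.

```r
# Confusion matrix sketch in base R with made-up labels (not package output)
truth <- factor(c("a", "a", "b", "b", "b", "a"))
pred  <- factor(c("a", "b", "b", "b", "a", "a"), levels = levels(truth))

# Rows are predicted classes, columns are actual classes
confMatrix <- table(Predicted = pred, Actual = truth)
print(confMatrix)

# Accuracy is the share of cases on the diagonal (correct predictions)
accuracy <- sum(diag(confMatrix)) / sum(confMatrix)
print(accuracy)
```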
Author(s)
Atesh Koul, C'MON unit, Istituto Italiano di Tecnologia
References
Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern Classification. Wiley-Interscience (Vol. 24).
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag New York.
Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). A practical guide to support vector classification, 1(1), 1-16.
Examples
# classification analysis with SVM
Results <- classifyFun(Data = KinData, classCol = 1,
                       selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                       cvType = "holdout")
# Output:
# Performing Classification Analysis
#
# Performing holdout Cross-validation
# genclassifier was not specified,
# Using default value of Classifier.svm (genclassifier = Classifier.svm)"
#
# cvFraction was not specified,
# Using default value of 0.8 (cvFraction = 0.8)
#
# Proportion of Test/Train Data was : 0.2470588
# [1] "Test holdout Accuracy is 0.65"
# holdout classification Analysis:
# cvFraction : 0.8
# Test Accuracy 0.65
# *Legend:
# cvFraction = Fraction of data to keep for training data
# Test Accuracy = Accuracy from the Testing dataset
# Alternate uses:
# perform a k-folds cross-validated classification analysis:
Results <- classifyFun(Data = KinData, classCol = 1,
                       selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                       cvType = "folds")
# use extendedResults as well as tuning
Results <- classifyFun(Data = KinData, classCol = 1,
                       selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
                       cvType = "folds", extendedResults = TRUE, tune = TRUE)