classifyFun {PredPsych}    R Documentation

Generic Classification Analyses

Description

Function for performing a generic classification analysis.

Usage

classifyFun(Data, classCol, selectedCols, cvType, ntrainTestFolds,
  nTrainFolds, modelTrainFolds, nTuneFolds, tuneFolds, foldSep, cvFraction,
  ranges = NULL, tune = FALSE, cost = 1, gamma = 0.5,
  classifierName = "svm", genclassifier, silent = FALSE,
  extendedResults = FALSE, SetSeed = TRUE, NewData = NULL, ...)

Arguments

Data

(dataframe) data frame containing the data

classCol

(numeric or string) number or name of the column that contains the variable to be predicted

selectedCols

(optional) (numeric or string) all columns of Data to be used, either as the variable to be predicted or as features

cvType

(optional) (string) cross-validation scheme to use; one of the following values:

  • folds = (default) k-fold cross-validation

  • LOSO = Leave-one-subject-out cross-validation

  • holdout = holdout cross-validation. Only a portion of the data (cvFraction) is used for training.

  • LOTO = Leave-one-trial-out cross-validation.
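The schemes listed above differ only in how rows are assigned to test sets. The following base R sketch illustrates that idea on a made-up data frame; the object names (toy, fold_id, and so on) are hypothetical, and this is not necessarily how classifyFun builds its splits internally.

```r
# Hypothetical toy data: 12 trials from 3 subjects (not the package's data).
toy <- data.frame(
  subject = rep(1:3, each = 4),
  class   = rep(c("a", "b"), times = 6),
  feature = seq_len(12)
)

set.seed(1)  # make the random fold assignment repeatable

# k-fold: shuffle the fold labels 1..k so each row lands in exactly one fold.
k <- 4
fold_id <- sample(rep(seq_len(k), length.out = nrow(toy)))
kfold_test_rows <- split(seq_len(nrow(toy)), fold_id)

# Leave-one-subject-out: each test set holds all rows of one subject.
loso_test_rows <- split(seq_len(nrow(toy)), toy$subject)

# Leave-one-trial-out: each test set is a single row.
loto_test_rows <- as.list(seq_len(nrow(toy)))

# In every scheme, the training rows are the complement of the test rows.
train_rows <- setdiff(seq_len(nrow(toy)), kfold_test_rows[[1]])
```

Each scheme yields a list of test-row index vectors, so the same train/predict loop could be run over any of them.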

ntrainTestFolds

(optional) (k-fold cross-validation only) number of folds into which the data are split for training and testing

nTrainFolds

(optional) (k-fold cross-validation only) number of folds into which the training dataset is further divided

modelTrainFolds

(optional) (k-fold cross-validation only) specific folds from the first train/test split (ntrainTestFolds) to use for training

nTuneFolds

(optional) (k-fold cross-validation only) number of folds for tuning

tuneFolds

(optional) (k-fold cross-validation only) specific folds from the above nTuneFolds to use for tuning

foldSep

(numeric) (leave-one-subject-out cross-validation only) number of the column that identifies subjects; mandatory when cvType is LOSO.

cvFraction

(optional) (numeric) fraction of the data to keep for training (holdout cross-validation)
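As a rough illustration of what a training fraction means, the sketch below splits row indices in base R. The row count n and the variable names are hypothetical, and the split is a plain random one; classifyFun may partition the data differently (for example, stratified by class).

```r
set.seed(1)                 # repeatable split for this illustration

n <- 100                    # hypothetical number of rows in the data
cvFraction <- 0.8           # fraction of rows kept for training

# Draw the training rows at random; the remaining rows form the test set.
train_rows <- sample(n, size = round(cvFraction * n))
test_rows  <- setdiff(seq_len(n), train_rows)

# Ratio of test rows to training rows for this split.
test_train_ratio <- length(test_rows) / length(train_rows)
```

With these numbers, 80 rows go to training and 20 to testing, giving a test/train ratio of 0.25.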

ranges

(optional) (list) ranges for tuning support vector machine

tune

(optional) (logical) whether tuning of svm parameters should be performed or not

cost

(optional) (numeric) regularization parameter of svm

gamma

(optional) (numeric) rbf kernel parameter
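A tuning grid for cost and gamma is commonly written as a named list with one vector of candidate values per parameter. The sketch below assumes that convention (the one used by grid-search helpers such as tune in the e1071 package); whether classifyFun expects exactly this shape for ranges is an assumption, and the candidate values are arbitrary.

```r
# Assumed convention: one named vector of candidate values per parameter.
ranges <- list(
  cost  = 2^(-2:2),   # candidate regularization strengths
  gamma = 2^(-3:1)    # candidate RBF kernel widths
)

# A grid search evaluates every combination of the candidate values.
grid <- expand.grid(ranges)
n_candidates <- nrow(grid)   # 5 cost values x 5 gamma values
```

Powers of two are a common choice because useful values of cost and gamma often span several orders of magnitude.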

classifierName

(optional) (string) name of the classifier to be used

genclassifier

(optional) (function or string) a classifier function or a name (e.g. Classifier.svm)

silent

(optional) (logical) whether to suppress printed messages; the default, FALSE, prints them

extendedResults

(optional) (logical) whether to return extended results, including the model and other metrics

SetSeed

(optional) (logical) whether to seed the random number generator. Keep SetSeed = TRUE to get consistent results across runs; set it to FALSE only for permutation tests.
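The effect of seeding can be seen with base R alone. This sketch uses an arbitrary seed value and hypothetical variable names; it illustrates the general mechanism, not the seed that classifyFun itself uses.

```r
# Seeding before each draw reproduces the same random permutation.
set.seed(111)
first_draw <- sample(10)

set.seed(111)
second_draw <- sample(10)

same_result <- identical(first_draw, second_draw)   # TRUE

# Without reseeding, the generator simply continues its stream,
# which is what a permutation test needs from one iteration to the next.
third_draw <- sample(10)
```

If the generator were reseeded with the same value inside every iteration of a permutation test, each iteration would repeat the same random choices, which is presumably why seeding is meant to be switched off in that case.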

NewData

(optional) (dataframe) new data frame of features for which class membership is requested

...

(optional) additional arguments for the function

Details

This function implements classification analysis, a supervised machine learning approach that attempts to identify holistic patterns in the data and assign classes to them (classification). Given a set of features, a classification analysis automatically learns intrinsic patterns in the data in order to predict the respective classes. If the features are informative about the classes, a high classification score is achieved.

Value

The value depends on extendedResults:

  • extendedResults = FALSE returns accTest, the test accuracy of discrimination.

  • extendedResults = TRUE returns accTest, the test accuracy of discrimination; accTestRun, the discrimination for each run when cvType is LOSO, LOTO, or folds; ConfMatrix, the confusion matrices; classificationResults, a list of the cross-validation results, including the model; and ConfusionMatrixResults, the overall cross-validated confusion matrix results.
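To make the relationship between a confusion matrix and test accuracy concrete, here is a small base R sketch on made-up labels. The vectors truth and predicted are hypothetical, and the objects built here are not the structures returned by classifyFun.

```r
# Hypothetical true and predicted class labels for six test trials.
truth     <- factor(c("a", "a", "b", "b", "b", "a"), levels = c("a", "b"))
predicted <- factor(c("a", "b", "b", "b", "a", "a"), levels = c("a", "b"))

# Confusion matrix: rows are predicted classes, columns are actual classes.
conf_matrix <- table(Predicted = predicted, Actual = truth)

# Accuracy is the share of trials on the diagonal (correct predictions).
accuracy <- sum(diag(conf_matrix)) / sum(conf_matrix)
```

Four of the six trials are classified correctly here, so the accuracy is 4/6.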

Author(s)

Atesh Koul, C'MON unit, Istituto Italiano di Tecnologia

atesh.koul@gmail.com

References

Duda, R. O., Hart, P. E., & Stork, D. G. (2000). Pattern Classification. Wiley-Interscience (Vol. 24).

Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer-Verlag New York.

Hsu, C.-W., Chang, C.-C., & Lin, C.-J. (2003). A practical guide to support vector classification, 1(1), 1-16.

Examples

# classification analysis with SVM
library(PredPsych)  # load the package that provides classifyFun
Results <- classifyFun(Data = KinData, classCol = 1,
  selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
  cvType = "holdout")

# Output:

# Performing Classification Analysis
#
# Performing holdout Cross-validation
# genclassifier was not specified, 
#   Using default value of Classifier.svm (genclassifier = Classifier.svm)"
# 
# cvFraction was not specified, 
#  Using default value of 0.8 (cvFraction = 0.8)
# 
# Proportion of Test/Train Data was :  0.2470588 
# [1] "Test holdout Accuracy is  0.65"
# holdout classification Analysis: 
# cvFraction : 0.8 
# Test Accuracy 0.65
# *Legend:
# cvFraction = Fraction of data to keep for training data 
# Test Accuracy = Accuracy from the Testing dataset

# Alternate uses:
# perform a k-fold cross-validated classification analysis:
Results <- classifyFun(Data = KinData, classCol = 1,
  selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
  cvType = "folds")

# use extendedResults as well as tuning
Results <- classifyFun(Data = KinData, classCol = 1,
  selectedCols = c(1, 2, 12, 22, 32, 42, 52, 62, 72, 82, 92, 102, 112),
  cvType = "folds", extendedResults = TRUE, tune = TRUE)




[Package PredPsych version 0.4 Index]