caclassfit,caclasspred,vote,re_code {partools} | R Documentation |
Software Alchemy for Machine Learning
Description
Parallelization of machine learning algorithms.
Usage
caclassfit(cls,fitcmd)
caclasspred(fitobjs,newdata,yidx=NULL,...)
vote(preds)
re_code(x)
Arguments
cls |
A cluster run under the parallel package. |
fitcmd |
A string containing a model-fitting command to be run on each cluster node. This will typically include specification of the distributed data set. |
fitobjs |
An R list of objects returned by the |
newdata |
Data to be predicted from the fit computed by |
yidx |
If provided, index of the true class values in |
... |
Arguments to be passed to the underlying prediction function for the given method, e.g. |
preds |
A vector of predicted classes, from which the "winner" will be selected by voting. |
x |
A vector of integers, in this context class codes. |
Details
This should work for almost any classification code that has a “fit” function and a predict method.
The method assumes i.i.d. data. If your data set had been stored in some sorted order, it must be randomized first, say using the scramble option in distribsplit or by calling readnscramble, depending on whether your data is already in memory or still in a file.
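The randomization requirement can be illustrated without partools. The following is a minimal base-R sketch, assuming the data are already in memory as a data frame; the data frame d and its columns are made up for illustration, and the code is not meant to show how distribsplit or readnscramble work internally.

```r
# Hypothetical data frame stored in sorted order (sorted by class label y)
d <- data.frame(x = c(0.1, 0.4, 0.2, 0.9, 0.7, 0.5),
                y = c(1, 1, 2, 2, 3, 3))

set.seed(1)                           # make the shuffle reproducible
d_scrambled <- d[sample(nrow(d)), ]   # permute the rows at random

# Same rows as before, in random order, so that splitting into
# contiguous chunks no longer groups the rows by class
d_scrambled
```

Without a step like this, splitting sorted data into contiguous chunks could leave each cluster node with only some of the classes, which would conflict with the i.i.d. assumption.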
It is assumed that class labels are 1,2,... If not, use re_code.
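To show what recoding to labels 1,2,... means, here is a small base-R sketch. It only illustrates the idea on a made-up vector; it does not reproduce the re_code implementation, and the order in which re_code assigns the new codes may differ from the sorted order assumed here.

```r
# Hypothetical class codes that are not 1,2,...
x <- c(102, 140, 102, 101, 140)

# Map each distinct code to its position among the sorted distinct values
# (sorted order is an assumption made for this sketch)
recoded <- match(x, sort(unique(x)))

recoded   # 2 3 2 1 3
```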
Value
The caclassfit function returns an R list of objects as in fitobjs above.
The caclasspred function returns an R list with these components:
- predmat, a matrix of predicted classes for newdata, one row per cluster node
- preds, the final predicted classes, after using vote to resolve possible differences in predictions among nodes
- consensus, the proportion of cases for which all nodes gave the same predictions (higher values indicating more stability)
- acc, if yidx is non-NULL, the proportion of cases in which preds is correct
- confusion, if yidx is non-NULL, the confusion matrix
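The relationship among these components can be sketched in base R on a small made-up prediction matrix. This is an illustration under stated assumptions, not the partools code: the matrix, the true classes, the helper majority, and the tie-breaking rule (smallest class code wins) are all hypothetical.

```r
# Hypothetical predictions: 3 cluster nodes (rows) x 5 cases (columns)
predmat <- rbind(c(1, 2, 2, 3, 1),
                 c(1, 2, 3, 3, 1),
                 c(1, 1, 2, 3, 2))
truth <- c(1, 2, 2, 3, 2)   # hypothetical true classes

# Majority vote for one case; ties go to the smallest class code
# (a choice made for this sketch only)
majority <- function(p) {
  counts <- table(p)
  as.integer(names(counts)[which.max(counts)])
}

# Final predicted class for each case
preds <- apply(predmat, 2, majority)

# Proportion of cases on which every node gave the same prediction
consensus <- mean(apply(predmat, 2, function(p) length(unique(p)) == 1))

# Proportion of cases predicted correctly, and the confusion matrix
acc <- mean(preds == truth)
confusion <- table(truth, preds)

preds       # 1 2 2 3 1
consensus   # 0.4
acc         # 0.8
```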
Author(s)
Norm Matloff
Examples
## Not run:
library(parallel)  # provides makeCluster(), clusterEvalQ(), stopCluster()
library(partools)
# set up 'parallel' cluster
cls <- makeCluster(2)
setclsinfo(cls)
# data prep
data(prgeng)
prgeng$occ <- re_code(prgeng$occ)
prgeng$bs <- as.integer(prgeng$educ == 13)
prgeng$ms <- as.integer(prgeng$educ == 14)
prgeng$phd <- as.integer(prgeng$educ == 15)
prgeng$sex <- prgeng$sex - 1
pe <- prgeng[,c(1,7,8,9,12,13,14,5)]
pe$occ <- as.factor(pe$occ) # needed for rpart!
# go
distribsplit(cls,'pe')
library(rpart)
clusterEvalQ(cls,library(rpart))
fit <- caclassfit(cls,"rpart(occ ~ .,data=pe)")
predout <- caclasspred(fit,pe,8,type='class')
predout$acc # 0.36
stopCluster(cls)
## End(Not run)