ZIClass {zooimage} | R Documentation |
Create and manipulate 'ZIClass' objects
Description
'ZIClass' objects are key items in ZooImage. They contain everything required to automatically classify plankton from .zid files. They can be used as black boxes by all users, but building them requires users trained in machine learning techniques. Hence, ZooImage is kept very simple for biologists who just want to use classifiers without worrying about the complexities of what is done inside the engine.
Usage
ZIClass(formula, data, method = getOption("ZI.mlearning", "mlRforest"),
calc.vars = getOption("ZI.calcVars", calcVars), drop.vars = NULL,
drop.vars.def = dropVars(), cv.k = 10, cv.strat = TRUE,
..., subset, na.action = na.omit)
## S3 method for class 'ZIClass'
print(x, ...)
## S3 method for class 'ZIClass'
summary(object, sort.by = "Fscore", decreasing = TRUE,
na.rm = FALSE, ...)
## S3 method for class 'ZIClass'
predict(object, newdata, calc = TRUE, class.only = TRUE,
type = "class", ...)
## S3 method for class 'ZIClass'
confusion(x, y = response(x), labels = c("Actual", "Predicted"),
useNA = "ifany", prior, use.cv = TRUE, ...)
Arguments
formula |
a formula with left member being the class variable and the
right member being a list of predicting variables separated by a '+' sign.
Since |
data |
a data frame (usually a 'ZITrain' object), containing both the measurements and the manual classification (a factor variable, usually named 'Class'). |
method |
the machine learning method to use. It should produce
results compatible with |
calc.vars |
a function to use to calculate variables from the original data frame. |
drop.vars |
a character vector with names of variables to drop for the
classification, or |
drop.vars.def |
a second list of variables to drop contained in a
character vector. That list is supposed to match the name of variables that
are obviously non informative and are dropped by default. It can be gathered
automatically using |
cv.k |
the number of folds (k) to use for cross-validation. |
cv.strat |
should stratified sampling be used for cross-validation (recommended)? |
... |
further arguments to pass to the classification algorithm (see help of that particular function). |
subset |
an expression for subsetting the original data frame. |
na.action |
the function to filter the initial data frame for missing
values. Although the default in R is |
x |
a 'ZIClass' object. |
object |
a 'ZIClass' object. |
newdata |
a 'ZIDat' object, or a 'data.frame' to use for prediction. |
sort.by |
the statistic to use to sort the table (by default, the F-score). |
decreasing |
should the table be sorted in decreasing (TRUE) or increasing (FALSE) order? |
na.rm |
do we eliminate entries with missing data first (using
|
calc |
a logical value indicating whether variables have to be recalculated before running the prediction. |
class.only |
if TRUE, return just a vector with the classification; otherwise, return the 'ZIDat' object with a 'Predicted' column appended to it. |
type |
the type of result to return, |
y |
a factor with reference classes. |
labels |
labels to use for, respectively, the reference class and the predicted class. |
useNA |
do we keep NAs as a separate category? The default |
prior |
class frequencies to use for the first classifier, which
is tabulated in the rows of the confusion matrix. This is either a single
positive number to set all class frequencies to this value (use 1 for
relative frequencies and 100 for relative frequencies in percent), or a vector of
positive numbers of the same length as the levels in the object. If the
vector is named, names must match levels. Alternatively, providing
|
use.cv |
the predicted values extracted from the 'ZIClass' object can be either the predictions on the training set, or the cross-validated predictions (the default). Most of the time, you want the cross-validated predictions, because they give an unbiased (or at least less biased) evaluation of the classifier. If in doubt, keep the default value. |
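To illustrate what stratified sampling means for cv.k and cv.strat, here is a minimal base R sketch. It is not the code used inside ZIClass(): stratified_folds() is a hypothetical helper written for this illustration, and the class labels are made up. It only shows the idea of dealing the items of each class evenly over the k folds.

```r
# Hypothetical helper (not part of zooimage): assign each item to one of k
# folds so that every class is spread as evenly as possible across folds.
stratified_folds <- function(classes, k = 10) {
  classes <- as.factor(classes)
  folds <- integer(length(classes))
  for (lev in levels(classes)) {
    idx <- which(classes == lev)
    # Shuffle the items of this class, then deal them out to folds 1..k in turn
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

set.seed(42)
# Toy, made-up labels: an unbalanced three-class training set
classes <- factor(rep(c("copepod", "diatom", "bubble"), times = c(50, 30, 20)))
folds <- stratified_folds(classes, k = 10)

# Cross-tabulate classes against folds to check the class balance per fold
fold_table <- table(classes, folds)
print(fold_table)
```

With plain random folds, a rare class could end up missing from some folds; stratifying by class avoids that, which is presumably why the option is recommended.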
Value
ZIClass() is the constructor that builds the 'ZIClass' object.
print(), summary() and predict() are the methods to
print the object, to calculate statistics on this classifier based on the
confusion matrix, and to predict groups for ZooImage samples using one
'ZIClass' object.
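As a rough idea of the statistics that can be derived from a confusion matrix, the following base R sketch computes recall, precision and F-score per class and sorts the table by decreasing F-score. It does not call summary() on a 'ZIClass' object and makes no claim about the exact set of statistics that method returns; the counts are invented for the illustration.

```r
# Toy confusion matrix with made-up counts: rows = actual, columns = predicted
groups <- c("copepod", "diatom", "bubble")
conf <- matrix(c(40,  5,  5,
                  4, 24,  2,
                  1,  0, 19),
               nrow = 3, byrow = TRUE,
               dimnames = list(Actual = groups, Predicted = groups))

tp <- diag(conf)                  # correctly classified items per class
recall <- tp / rowSums(conf)      # share of each actual class that is recovered
precision <- tp / colSums(conf)   # share of each predicted class that is right
fscore <- 2 * precision * recall / (precision + recall)

stats <- data.frame(Recall = recall, Precision = precision, Fscore = fscore,
                    row.names = groups)
# Sort by F-score, best classes first
stats <- stats[order(stats$Fscore, decreasing = TRUE), ]
print(round(stats, 3))
```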
Note
Always carefully analyze the properties, performance and limitations of a
'ZIClass' object before using it to classify objects of one series. For
instance, you can use confusion() to compare two classifiers, or an
automatic classifier with a manual classification done by a taxonomist.
Always respect the limitations in the use of a 'ZIClass' object (for
instance, a classifier specific to one given series should not be used to
classify items in a different series)! It is good practice to write a
report documenting a 'ZIClass' object, together with the comments of the
taxonomists who made the reference training set, and with details on the
analysis of the performance of the classifier.
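The kind of comparison described in this note can be sketched in base R with table(). This is an illustration only, not the confusion() method itself: the two label vectors are made up, and the row percentages merely mimic the idea of rescaling each actual class to a common total.

```r
# Made-up labels for ten items: a taxonomist's identification vs. a classifier's
lev <- c("bubble", "copepod", "diatom")
manual    <- factor(c("copepod", "copepod", "diatom", "bubble", "copepod",
                      "diatom", "bubble", "copepod", "diatom", "copepod"),
                    levels = lev)
automatic <- factor(c("copepod", "diatom", "diatom", "bubble", "copepod",
                      "diatom", "copepod", "copepod", "diatom", "copepod"),
                    levels = lev)

# Cross-tabulate; dnn mirrors the default labels shown in the Usage section
conf <- table(manual, automatic, dnn = c("Actual", "Predicted"))
print(conf)

# Overall agreement: share of items on the diagonal
accuracy <- sum(diag(conf)) / sum(conf)

# Row percentages, so each actual class sums to 100 whatever its abundance
conf_pct <- 100 * prop.table(conf, margin = 1)
print(round(conf_pct, 1))
```

Looking at the off-diagonal cells, rather than at the overall agreement alone, shows which groups are confused with each other, which is useful material for the report mentioned above.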
Author(s)
Philippe Grosjean <Philippe.Grosjean@umons.ac.be>
See Also
Examples
##TODO...