breast_cancer_clean_features {ascentTraining} | R Documentation
Wisconsin Breast Cancer Database
Description
Wisconsin Breast Cancer Database
Usage
breast_cancer_clean_features
Format
A list containing a training dataset and a test dataset. Both are derived from a data frame with 699 observations on 11 variables, from which the ID and class columns have been removed, leaving the nine features listed below. The train-to-test ratio is 0.8.
Cl.thickness
Clump Thickness
Cell.size
Uniformity of Cell Size
Cell.shape
Uniformity of Cell Shape
Marg.adhesion
Marginal Adhesion
Epith.c.size
Single Epithelial Cell Size
Bare.nuclei
Bare Nuclei
Bl.cromatin
Bland Chromatin
Normal.nucleoli
Normal Nucleoli
Mitoses
Mitoses
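The construction described above can be sketched in R as follows. This is a minimal illustration, not the package's actual preparation code. It assumes that the ratio of 0.8 means 80% of rows go to training, that rows are assigned at random, that the dropped columns are named `Id` and `Class`, and that the list elements are named `train` and `test`. None of this is confirmed by the source. The data frame `raw` is a hypothetical stand-in filled with random placeholder values.

```r
# Hypothetical stand-in for the original 699 x 11 data frame.
# Values are random placeholders, not the real measurements.
set.seed(1)
feature_names <- c("Cl.thickness", "Cell.size", "Cell.shape", "Marg.adhesion",
                   "Epith.c.size", "Bare.nuclei", "Bl.cromatin",
                   "Normal.nucleoli", "Mitoses")
n <- 699
raw <- data.frame(Id = seq_len(n))            # assumed ID column name
for (nm in feature_names) {
  raw[[nm]] <- sample(1:10, n, replace = TRUE)
}
raw$Class <- factor(sample(c("benign", "malignant"), n, replace = TRUE))

# Drop the identifier and class columns, leaving the nine features.
features <- raw[, setdiff(names(raw), c("Id", "Class"))]

# Assumption: a random split with 80% of the rows used for training.
train_idx <- sample(n, size = floor(0.8 * n))
split_features <- list(
  train = features[train_idx, ],   # assumed element name
  test  = features[-train_idx, ]   # assumed element name
)

# Inspect the top-level structure of the resulting list.
str(split_features, max.level = 1)
```

Under these assumptions the training part has `floor(0.8 * 699)` rows and the test part has the remainder. The actual element names and row counts in `breast_cancer_clean_features` may differ, so check them with `str()` on the real object.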
Source
Creator: Dr. William H. Wolberg (physician), University of Wisconsin Hospital, Madison, Wisconsin, USA
Donor: Olvi Mangasarian (mangasarian@cs.wisc.edu)
Received: David W. Aha (aha@cs.jhu.edu)
These data were taken from the UCI Repository of Machine Learning Databases (Newman et al., 1998) and were converted to R format by Evgenia Dimitriadou.
References
1. Wolberg, W.H., & Mangasarian, O.L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. In Proceedings of the National Academy of Sciences, 87, 9193-9196.
- Size of data set: only 369 instances (at that point in time)
- Collected classification results: 1 trial only
- Two pairs of parallel hyperplanes were found to be consistent with 50% of the data
- Accuracy on remaining 50% of dataset: 93.5%
- Three pairs of parallel hyperplanes were found to be consistent with 67% of data
- Accuracy on remaining 33% of dataset: 95.9%
2. Zhang, J. (1992). Selecting typical instances in instance-based learning. In Proceedings of the Ninth International Machine Learning Conference (pp. 470-479). Aberdeen, Scotland: Morgan Kaufmann.
- Size of data set: only 369 instances (at that point in time)
- Applied 4 instance-based learning algorithms
- Collected classification results averaged over 10 trials
- Best accuracy result:
  - 1-nearest neighbor: 93.7%
  - trained on 200 instances, tested on the other 169
- Also of interest:
  - Using only typical instances: 92.2% (storing only 23.1 instances)
  - trained on 200 instances, tested on the other 169
Newman, D.J. & Hettich, S. & Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.