breast_cancer {ascentTraining} | R Documentation |
Wisconsin Diagnostic Breast Cancer (WDBC)
Description
The data contain measurements on cells in suspicious lumps in a woman's breast. Features are computed from a digitised image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image. All samples are classified as either benign or malignant.
Usage
breast_cancer
Format
breast_cancer is a tibble with 22 columns. The first column is an ID column. The second indicates whether the sample is classified as benign or malignant. The remaining columns contain measurements for 20 features. Ten real-valued features are computed for each cell nucleus. The references listed below contain detailed descriptions of how these features are computed. The mean and the "worst" (largest, i.e. the mean of the three largest values) of each of these features were computed for each image, resulting in 20 features. Below are descriptions of these features, where * should be replaced by either mean or worst.
*_radius
mean of distances from center to points on the perimeter
*_texture
standard deviation of gray-scale values
*_perimeter
perimeter value
*_area
area value
*_smoothness
local variation in radius lengths
*_compactness
perimeter^2 / area - 1.0
*_concavity
severity of concave portions of the contour
*_concave_points
number of concave portions of the contour
*_symmetry
symmetry value
*_fractal_dimension
"coastline approximation" - 1
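The per-image summaries and the compactness formula can be illustrated with a small sketch. The radius values below are made up, and the names nucleus_radius, mean_radius, worst_radius and compactness are hypothetical (they are not taken from the package). It shows only how a "mean" and a "worst" value could be derived from per-nucleus measurements as described in this section.

```r
# Made-up per-nucleus radius measurements for a single image (illustrative only).
nucleus_radius <- c(12.1, 14.3, 11.8, 15.0, 13.2, 16.4)

# "mean" summary: average over all nuclei in the image.
mean_radius <- mean(nucleus_radius)

# "worst" summary: mean of the three largest values.
worst_radius <- mean(sort(nucleus_radius, decreasing = TRUE)[1:3])

# Compactness as defined in the list above: perimeter^2 / area - 1.0.
compactness <- function(perimeter, area) perimeter^2 / area - 1.0

# Sanity check on a circle of radius r: perimeter = 2*pi*r, area = pi*r^2,
# so the compactness is 4*pi - 1 regardless of r.
r <- 3
circle_compactness <- compactness(2 * pi * r, pi * r^2)

print(c(mean = mean_radius, worst = worst_radius))
print(circle_compactness)
```

Under this definition a circle has the smallest possible compactness, so larger values indicate a less regular outline.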
Note
This breast cancer database was obtained from Dr. William H. Wolberg of the University of Wisconsin Hospitals, Madison.
Source
https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository.
Irvine, CA: University of California, School of Information and Computer
Science.
References
O. L. Mangasarian and W. H. Wolberg: "Cancer diagnosis via linear programming", SIAM News, Volume 23, Number 5, September 1990, pp 1 & 18.
William H. Wolberg and O. L. Mangasarian: "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, U.S.A., Volume 87, December 1990, pp 9193-9196.
K. P. Bennett & O. L. Mangasarian: "Robust linear programming discrimination of two linearly inseparable sets", Optimization Methods and Software 1, 1992, 23-34 (Gordon & Breach Science Publishers).
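Examples
A minimal sketch of splitting the feature columns into their "mean" and "worst" groups, assuming the column names follow the mean_* / worst_* pattern described under Format and that the data set is available as ascentTraining::breast_cancer. The helper feature_columns is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of ascentTraining): return the column names
# that start with "<prefix>_".
feature_columns <- function(column_names, prefix) {
  grep(paste0("^", prefix, "_"), column_names, value = TRUE)
}

# Runs only if the package is installed; otherwise this part is skipped.
if (requireNamespace("ascentTraining", quietly = TRUE)) {
  bc <- ascentTraining::breast_cancer
  print(dim(bc))                               # Format states 22 columns
  print(feature_columns(names(bc), "mean"))    # assumed mean_* columns
  print(feature_columns(names(bc), "worst"))   # assumed worst_* columns
}
```

If the actual column names use a different pattern, only the regular expression in the helper needs to change.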