wdbc {mclust} | R Documentation
UCI Wisconsin Diagnostic Breast Cancer Data
Description
The data set records 30 features of the cell nuclei for 569 patients, obtained from a digitized image of a fine needle aspirate (FNA) of a breast mass. For each patient, the cancer was diagnosed as malignant or benign.
Usage
data(wdbc)
Format
A data frame with 569 observations on the following variables:
ID
ID number
Diagnosis
cancer diagnosis: M = malignant, B = benign
Radius_mean
a numeric vector
Texture_mean
a numeric vector
Perimeter_mean
a numeric vector
Area_mean
a numeric vector
Smoothness_mean
a numeric vector
Compactness_mean
a numeric vector
Concavity_mean
a numeric vector
Nconcave_mean
a numeric vector
Symmetry_mean
a numeric vector
Fractaldim_mean
a numeric vector
Radius_se
a numeric vector
Texture_se
a numeric vector
Perimeter_se
a numeric vector
Area_se
a numeric vector
Smoothness_se
a numeric vector
Compactness_se
a numeric vector
Concavity_se
a numeric vector
Nconcave_se
a numeric vector
Symmetry_se
a numeric vector
Fractaldim_se
a numeric vector
Radius_extreme
a numeric vector
Texture_extreme
a numeric vector
Perimeter_extreme
a numeric vector
Area_extreme
a numeric vector
Smoothness_extreme
a numeric vector
Compactness_extreme
a numeric vector
Concavity_extreme
a numeric vector
Nconcave_extreme
a numeric vector
Symmetry_extreme
a numeric vector
Fractaldim_extreme
a numeric vector
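The layout described above can be checked directly after loading the data. The following is a minimal sketch, assuming the mclust package is installed; the helper variable mean_cols is introduced here only for illustration and is not part of the package.

```r
# Assumes the mclust package is installed; the block is skipped otherwise
if (requireNamespace("mclust", quietly = TRUE)) {
  data("wdbc", package = "mclust")

  # Rows are patients; columns are ID, Diagnosis, and the numeric features
  print(dim(wdbc))

  # Counts of benign (B) and malignant (M) diagnoses
  print(table(wdbc$Diagnosis))

  # Names of the columns holding the "_mean" measurements
  # (mean_cols is an illustrative name, not part of mclust)
  mean_cols <- grep("_mean$", names(wdbc), value = TRUE)
  str(wdbc[, mean_cols])
}
```

The same pattern with "_se$" or "_extreme$" selects the other two groups of columns.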
Details
The recorded features are:
- Radius as mean of distances from center to points on the perimeter
- Texture as standard deviation of gray-scale values
- Perimeter as cell nucleus perimeter
- Area as cell nucleus area
- Smoothness as local variation in radius lengths
- Compactness as cell nucleus compactness, perimeter^2 / area - 1
- Concavity as severity of concave portions of the contour
- Nconcave as number of concave portions of the contour
- Symmetry as cell nucleus shape
- Fractaldim as fractal dimension, "coastline approximation" - 1
For each feature the recorded values are computed from each image as <feature_name>_mean, <feature_name>_se, and <feature_name>_extreme, for the mean, the standard error, and the mean of the three largest values.
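To illustrate how the three summaries per feature relate to one another, here is a small sketch computed on made-up values. The radius vector is hypothetical and does not come from the data set, and the standard error is assumed here to be the sample standard deviation divided by the square root of the number of nuclei; the original computation may differ in detail.

```r
# Hypothetical radius measurements for the nuclei in one image (made-up values)
radius <- c(12.1, 14.3, 13.8, 17.9, 15.2, 16.4, 11.7)

# The three summaries, named following the <feature_name>_<suffix> convention
summaries <- c(
  Radius_mean    = mean(radius),
  # Assumption: standard error taken as sd / sqrt(n)
  Radius_se      = sd(radius) / sqrt(length(radius)),
  # Mean of the three largest values
  Radius_extreme = mean(sort(radius, decreasing = TRUE)[1:3])
)
print(summaries)
```

In the data set itself only these per-image summaries are stored, not the underlying per-nucleus measurements.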
Source
The Breast Cancer Wisconsin (Diagnostic) Data Set (wdbc.data, wdbc.names) from the UCI Machine Learning Repository https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic). Please note the UCI conditions of use.
References
Mangasarian, O. L., Street, W. N., and Wolberg, W. H. (1995) Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), pp. 570-577.