cancer.df {multimix}    R Documentation
Prostate cancer patient data
Description
Data on 475 prostate cancer patients
Usage
data(cancer.df)
Format
A data.frame with 475 rows and 12 columns:
- age
Age in years
- wt
Weight in pounds
- pf
Patient activity
- hx
Family history of cancer
- sbp
Systolic blood pressure
- dbp
Diastolic blood pressure
- ekg
Electrocardiogram code
- hg
Serum haemoglobin
- sz
Size of primary tumour
- sg
Index of tumour stage and histologic grade
- ap
Serum prostatic acid phosphatase
- bm
Bone metastases
Details
Twelve pre-trial covariates were measured on each patient. Seven may be taken to be continuous and four discrete. The remaining variable, SG, is an index nearly all of whose values lie between 7 and 15, so it could be considered either discrete or continuous. We treat SG as a continuous variable.
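The split described above can be recorded as character vectors of column names. This is a minimal sketch: the text does not say which seven variables are the continuous ones, so the assignment below (the coded or categorical variables pf, hx, ekg and bm taken as discrete) is an assumption, as are the vector names `all_vars`, `discrete_vars` and `continuous_vars`.

```r
# Column names as listed under Format; assumed to match names(cancer.df).
all_vars <- c("age", "wt", "pf", "hx", "sbp", "dbp",
              "ekg", "hg", "sz", "sg", "ap", "bm")

# Assumed classification: coded/categorical variables are discrete.
discrete_vars <- c("pf", "hx", "ekg", "bm")

# SG is ambiguous; following the text it is treated as continuous,
# so every remaining variable (including sg) goes in the continuous set.
continuous_vars <- setdiff(all_vars, discrete_vars)

length(continuous_vars)  # 8: seven continuous covariates plus sg
```

If the column names match, `cancer.df[, continuous_vars]` would then select the columns to be modelled as continuous.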
A preliminary inspection of the data showed that the size of the primary tumour (SZ) and serum prostatic acid phosphatase (AP) were both skewed. These variables have therefore been transformed to achieve approximate normality: a square root transformation was used for SZ and a logarithmic transformation for AP. (As with correlation, skewness over the whole data set does not necessarily imply skewness within clusters. However, when clusters were formed, within-cluster skewness was observed for these variables.)
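The two transformations can be sketched as a small helper. This is a minimal illustration, assuming a data frame with columns `sz` and `ap`; the helper name `transform_skewed()` is hypothetical, the natural logarithm is assumed (the text does not state the base), and the toy values are not taken from `cancer.df`. Whether the stored `sz` and `ap` columns are already transformed is not stated here, so check the data before applying it.

```r
# Hypothetical helper: square-root transform sz, natural-log transform ap.
transform_skewed <- function(df) {
  # sqrt() needs non-negative values; log() needs strictly positive ones.
  stopifnot(all(df$sz >= 0, na.rm = TRUE), all(df$ap > 0, na.rm = TRUE))
  df$sz <- sqrt(df$sz)  # square root for size of primary tumour
  df$ap <- log(df$ap)   # natural log for acid phosphatase (base assumed)
  df
}

# Toy values for illustration only, not taken from cancer.df.
toy <- data.frame(sz = c(4, 9, 16, 100),
                  ap = c(1, exp(1), exp(2), exp(5)))
transformed <- transform_skewed(toy)
transformed$sz  # 2 3 4 10
transformed$ap  # 0 1 2 5
```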
Observations with missing values in any of the twelve pretreatment covariates were omitted from further analysis, leaving 475 of the original 506 observations.
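Dropping every row that has a missing value in any covariate is complete-case (listwise) deletion. A minimal sketch with base R's `complete.cases()`; the toy data frame `raw` and its values are illustrative only and are not the original 506 observations.

```r
# Toy data with some missing values (illustrative only).
raw <- data.frame(age = c(75, NA, 80, 69),
                  hg  = c(13.8, 14.6, NA, 12.1),
                  bm  = c(0, 0, 1, 0))

# Keep only the rows with no missing value in any column.
complete <- raw[complete.cases(raw), ]
nrow(complete)  # 2
```

`na.omit(raw)` keeps the same rows, additionally recording the dropped row indices in an `na.action` attribute.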
The categorical variable Patient activity (pf) had 4 levels: 'Normally Active', 'Bed rest below 50%', 'Bed rest 50% or more', and 'Confined to bed'. The numbers of the 475 patients in these groups were 428, 32, 12, and 3. In this data set the two least active groups are combined, giving 3 groups of size 428, 32, and 15.
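Merging the two least active groups amounts to collapsing two factor levels into one. A minimal sketch: the factor is rebuilt from the group sizes reported above (not read from `cancer.df`), and the merged level label is hypothetical. In R, assigning the same name to several levels maps them to a single level.

```r
labs <- c("Normally Active", "Bed rest below 50%",
          "Bed rest 50% or more", "Confined to bed")

# Rebuild a four-level factor from the reported counts 428, 32, 12, 3.
pf4 <- factor(rep(labs, times = c(428, 32, 12, 3)), levels = labs)

# Collapse levels 3 and 4; the merged label is hypothetical.
pf3 <- pf4
levels(pf3)[3:4] <- "Bed rest 50% or more / confined"

table(pf3)  # counts: 428, 32, 15
```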
Source
D.P. Byar and S.B. Green, 'The choice of treatment for cancer patients based on covariate information: application to prostate cancer', Bulletin du Cancer, 1980, 67: 477–490. Reproduced in D.F. Andrews and A.M. Herzberg, 'Data: a collection of problems from many fields for the student and research worker', pp. 261–274, Springer Series in Statistics, Springer-Verlag, New York.