clindata_miss {missCompare} | R Documentation |
Clinical dataset with missingness
Description
clindata_miss
is a custom made dataframe that resembles a real-life clinical dataset.
The correlations between variables, the data means, SDs and ranges are realistic, but
the dataset is constructed by simulations and manual data input. The dataset contains
missing values (approximately 10% missing overall), and values are missing in a realistic pattern.
Usage
clindata_miss
Format
A data frame with 2500 rows and 12 variables:
- age
numeric, age, in years, 2.88% missing - in general, age is not likely have lots of missing data in a realistic dataset, therefore only a few values are missing here randomly, e.g. due to mistakes in data input
- sex
factor, male=1 and female=2, 2.88% missing - similar to age, sex information is also not likely have missing data in a realistic dataset, no values are missing here
- waist
numeric, waist circumference, in cm, 4.12% missing - anthropometric data is easy to collect, therefore only a small fraction is missing here, often missing together with BMI, the other anthropometric variable
- BMI
numeric, body mass index, in kg/m2, 4.16% missing - anthropometric data is easy to collect, therefore only a small fraction is missing here, often missing together with waist, the other anthropometric variable
- SBP
numeric, systolic blood pressure, in mmHg, 8.84% missing - in a realistic fashion, SBP is almost always missing together with DBP
- DBP
numeric, diastolic blood pressure, in mmHg, 8.84% missing - in a realistic fashion, DBP is almost always missing together with SBP
- FG
numeric, blood fasting glucose concentration, in mmol/dl, 5.84% missing - often missing together with other clinical variables
- PPG
numeric, blood postprandial glucose concentration, in mmol/dl, 53.2% missing - in this simulated dataset, only less than half of the participants had postprandial glucose measurements
- TC
numeric, blood total cholesterol concentration, in mmol/dl, 7.2% missing - often missing together with other lipids, TG and HDL-C
- TG
numeric, blood triglycerides concentration, in mmol/dl, 7.48% missing - often missing together with other lipids, TC and HDL-C, due to the sensitivity of a hypothetical machine, values below 0.6 are set to -9, upon conversion from -9s to NAs, the missingness fraction is 10.6%
- HDL
numeric, blood high density lipoprotein cholesterol concentration, in mmol/dl, 10.76% missing - often missing together with other lipids, TG and TC, due to the sensitivity of a hypothetical machine, values below 0.05 are set to -9, upon conversion from -9s to NAs, the missingness fraction is 13.72%
- education
factor, primary school=1, secondary school=2, bsc degree=3, msc degree=4, phd degree=5, 7.16% missing - self reported education missing in a not random fashion, those with lower education are less likely to report their education status
Source
The dataset is simulated and undergone manual configuration.