data_VOC {cellWise} | R Documentation |
VOC dataset
Description
This dataset contains data on volatile organic components (VOCs) in the urine of children between 3 and 10 years old. It is composed of publicly available data from the National Health and Nutrition Examination Survey (NHANES) and was analyzed in Raymaekers and Rousseeuw (2020). See below for details and references.
Usage
data("data_VOC")
Format
A matrix of dimensions 512 × 19.
The first 16 variables are the VOCs; the last 3 are:
- SMD460: number of smokers that live in the same home as the subject
- SMD470: number of people that smoke inside the home of the subject
- RIDAGEYR: age of the subject
Note that the original variable names are kept.
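The column layout described above can be checked with a few lines of R. This is a minimal sketch: split_voc is a hypothetical helper, not a function of cellWise, and it relies only on the three covariate names listed above. The usage part runs only if cellWise is installed.

```r
# Hypothetical helper (not part of cellWise): split a 19-column object laid out
# as described above into its VOC block and its three covariates.
split_voc <- function(X) {
  covariates <- c("SMD460", "SMD470", "RIDAGEYR")
  stopifnot(ncol(X) == 19, all(covariates %in% colnames(X)))
  voc_names <- setdiff(colnames(X), covariates)
  list(voc = X[, voc_names, drop = FALSE],
       covariates = X[, covariates, drop = FALSE])
}

# Usage on the packaged data, only if cellWise is installed.
if (requireNamespace("cellWise", quietly = TRUE)) {
  data("data_VOC", package = "cellWise")
  parts <- split_voc(data_VOC)
  print(dim(parts$voc))                  # VOC block
  print(colnames(parts$covariates))      # smoking and age variables
}
```

Selecting the covariates by name rather than by position keeps the helper valid even if the column order differs from the description.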
Details
All of the data was collected from the NHANES website, and was part of the NHANES 2015-2016 survey. This was the most recent epoch with complete data at the time of extraction. Three datasets were matched in order to assemble this data:
UVOC_I: contains the information on the volatile organic components in urine
DEMO_I: contains the demographic information such as age
SMQFAM_I: contains the data on the smoking habits of family members
The dataset was constructed as follows:
Select the relevant VOCs from the UVOC_I data (see column names) and transform them by taking the logarithm
Match the subjects in the UVOC_I data with their age in the DEMO_I data
Select all subjects with age at most 10
Match the data on smoking habits with the selected subjects.
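The four construction steps can be sketched in base R on toy data. This is an illustration under stated assumptions, not the code used to build data_VOC: the three data frames are made-up stand-ins for the NHANES files, SEQN is assumed to be the shared subject identifier, VOC_A and VOC_B are placeholder names for the selected VOC columns, the logarithm is assumed to be the natural one, and subjects without smoking information are assumed to be kept with missing values.

```r
# Toy stand-ins for the three NHANES tables; all values are made up.
uvoc <- data.frame(SEQN  = 1:5,
                   VOC_A = c(1.2, 3.4, 0.8, 2.5, 4.1),
                   VOC_B = c(10, 25, 7, 13, 40))
demo <- data.frame(SEQN = 1:6, RIDAGEYR = c(4, 9, 12, 10, 35, 7))
smqfam <- data.frame(SEQN   = c(1, 2, 3, 4, 6),
                     SMD460 = c(0, 1, 2, 0, 1),
                     SMD470 = c(0, 0, 1, 0, 1))

voc_cols <- c("VOC_A", "VOC_B")  # placeholder VOC column names

# Step 1: select the VOC columns and log-transform them (natural log assumed).
voc <- uvoc[, c("SEQN", voc_cols)]
voc[voc_cols] <- log(voc[voc_cols])

# Step 2: match each subject with their age (inner join on SEQN).
voc_age <- merge(voc, demo[, c("SEQN", "RIDAGEYR")], by = "SEQN")

# Step 3: keep subjects aged at most 10.
children <- voc_age[voc_age$RIDAGEYR <= 10, ]

# Step 4: match the smoking data; all.x = TRUE keeps unmatched subjects as NA
# (an assumption, the source does not say how these are handled).
merged <- merge(children, smqfam, by = "SEQN", all.x = TRUE)

# Assemble the matrix: VOCs first, then SMD460, SMD470, RIDAGEYR.
result <- as.matrix(merged[, c(voc_cols, "SMD460", "SMD470", "RIDAGEYR")])
rownames(result) <- merged$SEQN
```

With the real files, the toy data frames would be replaced by the imported UVOC_I, DEMO_I and SMQFAM_I tables, and voc_cols by the 16 VOC column names.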
Source
https://wwwn.cdc.gov/nchs/nhanes/Search/DataPage.aspx?Component=Laboratory&CycleBeginYear=2015
https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Demographics&CycleBeginYear=2015
https://wwwn.cdc.gov/nchs/nhanes/Search/DataPage.aspx?Component=Questionnaire&CycleBeginYear=2015
References
J. Raymaekers and P.J. Rousseeuw (2020). Handling cellwise outliers by sparse regression and robust covariance. Journal of Data Science, Statistics, and Visualisation. doi:10.52933/jdssv.v1i3.18
Examples
data("data_VOC")
# For an analysis of this data, we refer to the vignette:
## Not run:
vignette("DI_examples")
## End(Not run)