data_VOC {cellWise}    R Documentation

VOC dataset

Description

This dataset contains data on volatile organic compounds (VOCs) in the urine of children between 3 and 10 years old. It is composed of publicly available data from the National Health and Nutrition Examination Survey (NHANES) and was analyzed in Raymaekers and Rousseeuw (2020). See below for details and references.

Usage

data("data_VOC")

Format

A matrix of dimensions 512 × 19. The first 16 variables are the VOCs, the last 3 are:

Note that the original variable names are kept.
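The column layout described above can be handled by position. The following is a minimal sketch, not part of the package: split_voc is a hypothetical helper, and the fallback matrix holds random placeholder values of the documented shape, used only so the snippet also runs when cellWise is not installed.

```r
# Hypothetical helper (not part of cellWise): split a 19-column matrix into
# the 16 leading VOC columns and the 3 trailing covariate columns.
split_voc <- function(X) {
  stopifnot(is.matrix(X), ncol(X) == 19)
  list(voc        = X[, 1:16,  drop = FALSE],
       covariates = X[, 17:19, drop = FALSE])
}

# Use the real data when cellWise is available; otherwise fall back to a
# placeholder matrix of the documented shape (random values, NOT real data).
if (requireNamespace("cellWise", quietly = TRUE)) {
  data("data_VOC", package = "cellWise")
  X <- as.matrix(data_VOC)
} else {
  X <- matrix(rnorm(512 * 19), nrow = 512, ncol = 19)
}

parts <- split_voc(X)
dim(parts$voc)         # rows x 16 VOC columns
dim(parts$covariates)  # rows x 3 covariate columns
```

Splitting by position rather than by name keeps the sketch independent of the original NHANES variable names, which are retained in the dataset.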

Details

All of the data was collected from the NHANES website, and was part of the NHANES 2015-2016 survey. This was the most recent epoch with complete data at the time of extraction. Three datasets were matched in order to assemble this data:

The dataset was constructed as follows:

  1. Select the relevant VOCs from the UVOC_I data (see column names) and transform by taking the logarithm

  2. Match the subjects in the UVOC_I data with their age in the DEMO_I data

  3. Select all subjects with age at most 10

  4. Match the data on smoking habits with the selected subjects
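The four steps above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the authors' extraction script: all values are made up, the column names voc_a, voc_b, age and n_smokers_home are hypothetical, SEQN is used as the subject identifier on which the tables are matched, and the inner join in step 4 is an assumption about how unmatched subjects are handled.

```r
# Toy stand-ins for the three NHANES tables; all values are made up.
# SEQN is the subject identifier; the other column names are hypothetical.
uvoc  <- data.frame(SEQN  = 1:6,
                    voc_a = c(10, 20, 5, 8, 40, 15),
                    voc_b = c(1.5, 2.0, 0.5, 3.0, 2.5, 1.0))
demo  <- data.frame(SEQN = 1:6, age = c(4, 12, 7, 10, 35, 3))
smoke <- data.frame(SEQN = c(1, 3, 4, 6), n_smokers_home = c(0, 1, 0, 2))

# Step 1: select the VOC columns and take logarithms.
voc_cols <- c("voc_a", "voc_b")
uvoc[voc_cols] <- log(uvoc[voc_cols])

# Step 2: attach each subject's age from the demographics table.
d <- merge(uvoc, demo, by = "SEQN")

# Step 3: keep subjects aged at most 10.
d <- d[d$age <= 10, ]

# Step 4: attach the smoking data (inner join; an assumption).
d <- merge(d, smoke, by = "SEQN")

# Drop the identifier and convert to a numeric matrix.
X_toy <- as.matrix(d[, setdiff(names(d), "SEQN")])
dim(X_toy)
```

With the real NHANES files, the same pattern would apply after reading each table into a data frame; only the column selections would change.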

Source

https://wwwn.cdc.gov/nchs/nhanes/Search/DataPage.aspx?Component=Laboratory&CycleBeginYear=2015

https://wwwn.cdc.gov/nchs/nhanes/search/datapage.aspx?Component=Demographics&CycleBeginYear=2015

https://wwwn.cdc.gov/nchs/nhanes/Search/DataPage.aspx?Component=Questionnaire&CycleBeginYear=2015

References

J. Raymaekers and P.J. Rousseeuw (2020). Handling cellwise outliers by sparse regression and robust covariance. arXiv:1912.12446.

Examples

data("data_VOC")
# For an analysis of this data, we refer to the vignette:
vignette("DI_examples")

[Package cellWise version 2.2.5 Index]