Data for cleaning {epiDisplay}    R Documentation

Dataset for practicing cleaning, labelling and recoding

Description

The data come from clients of a family planning clinic.

For all variables except ID, the values 9, 99, 99.9, 888 and 999 represent missing values.
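Converting these codes to NA is usually the first cleaning step. The base-R sketch below runs on a small made-up data frame. The names `df` and `recode_missing()` are hypothetical and are not part of epiDisplay. With the real data you would first load it with `data(Planning, package = "epiDisplay")`. The sketch assumes, as stated above, that every listed code means "missing" in every column except ID. Applying all codes to all columns can discard genuine values (a weight of 99 kg, for example), so checking each variable's range first may be safer.

```r
# Hypothetical toy data using some of the Planning column names
df <- data.frame(
  ID    = 1:4,
  AGE   = c(25, 99, 31, 40),
  RELIG = c(1, 2, 9, 1),
  BPS   = c(110, 999, 120, 130)
)

# Missing-value codes listed in the description
missing_codes <- c(9, 99, 99.9, 888, 999)

# Hypothetical helper: replace the codes with NA in every column except 'keep'
recode_missing <- function(data, codes, keep = "ID") {
  cols <- setdiff(names(data), keep)
  data[cols] <- lapply(data[cols], function(x) replace(x, x %in% codes, NA))
  data
}

clean <- recode_missing(df, missing_codes)
colSums(is.na(clean))  # number of NAs in each column after recoding
```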

Usage

data(Planning)

Format

A data frame with 251 observations on the following 11 variables.

ID

a numeric vector: ID code

AGE

a numeric vector: Age

RELIG

a numeric vector: Religion

1 = Buddhist
2 = Muslim
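Coded variables such as RELIG are easier to read once they are stored as labelled factors. The following is a minimal base-R sketch on made-up codes; `relig_raw` is hypothetical. Because `factor()` keeps only the values listed in `levels`, the missing code 9 becomes NA without a separate recoding step.

```r
# Hypothetical raw RELIG codes (9 = missing)
relig_raw <- c(1, 2, 2, 9, 1)

# Codes not listed in 'levels' (here 9) become NA automatically
relig <- factor(relig_raw, levels = c(1, 2), labels = c("Buddhist", "Muslim"))

# Frequency table that also shows the NA count
table(relig, useNA = "ifany")
```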
PED

a numeric vector: Patient's education level

1 = none
2 = primary school
3 = secondary school
4 = high school
5 = vocational school
6 = university
7 = other
INCOME

a numeric vector: Monthly income in Thai Baht

1 = nil
2 = < 1,000
3 = 1,000-4,999
4 = 5,000-9,999
5 = 10,000 or more
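INCOME codes are ordered categories, so an ordered factor preserves their ranking and allows comparisons between categories. In this sketch the vector `income_raw` is invented, 9 is assumed to be the missing code for this column, and the top label is taken to mean 10,000 baht or more.

```r
# Hypothetical raw INCOME codes (9 assumed to mean missing)
income_raw <- c(3, 1, 5, 9, 4)

# Ordered factor; values outside 1:5 become NA
income <- factor(
  income_raw,
  levels  = 1:5,
  labels  = c("nil", "< 1,000", "1,000-4,999", "5,000-9,999", "10,000 or more"),
  ordered = TRUE
)

# Ordered factors support comparisons with a level name,
# e.g. flag monthly incomes of at least 5,000 baht
at_least_5000 <- income >= "5,000-9,999"
at_least_5000
```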
AM

a numeric vector: Age at marriage
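A simple consistency check during cleaning is that age at marriage should not exceed current age. The sketch uses an invented data frame `chk` and assumes that 99 is the missing code for AM. `which()` drops comparisons that evaluate to NA, so missing values are not flagged as errors.

```r
# Hypothetical records; AM should never be greater than AGE
chk <- data.frame(
  ID  = 1:4,
  AGE = c(30, 25, 40, 22),
  AM  = c(22, 27, 18, 99)
)

# Treat 99 as missing before comparing (assumed code for AM)
chk$AM[chk$AM == 99] <- NA

# IDs of records where age at marriage exceeds current age
bad <- chk$ID[which(chk$AM > chk$AGE)]
bad
```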

REASON

a numeric vector: Reason for family planning

1 = birth spacing
2 = enough children
3 = other
BPS

a numeric vector: Systolic blood pressure

BPD

a numeric vector: Diastolic blood pressure
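Diastolic pressure should be lower than systolic pressure in the same record. A BPD at or above BPS may point to values entered in the wrong columns. The data frame `bp` below is invented, and 999 is assumed to be the missing code for BPS.

```r
# Hypothetical blood pressure records
bp <- data.frame(
  ID  = 1:4,
  BPS = c(120, 80, 999, 110),
  BPD = c(80, 120, 70, 70)
)

# Treat 999 as missing before comparing (assumed code for BPS)
bp$BPS[bp$BPS == 999] <- NA

# IDs where diastolic is not below systolic; NA comparisons are dropped
swapped <- bp$ID[which(bp$BPD >= bp$BPS)]
swapped
```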

WT

a numeric vector: Weight (kg)

HT

a numeric vector: Height (cm)

Examples

data(Planning)
des(Planning)

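# Convert variable names to lowercase
names(Planning) <- tolower(names(Planning))
.data <- Planning
des(.data)

# Check for duplicated values of 'id'
attach(.data)
any(duplicated(id))
duplicated(id)
id[duplicated(id)] # 215

# Which ID(s) are missing from the sequence?
setdiff(min(id):max(id), id) # 216
detach(.data)

# Correct the wrong one. Assign into .data directly: assigning to 'id'
# while .data is attached would only create a modified copy in the
# global environment and leave .data unchanged.
.data$id[duplicated(.data$id)] <- 216
rm(list = ls())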
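The duplicate-ID correction can also be done without `attach()`. The sketch below uses an invented vector `ids`. Like the example, it assumes that the second occurrence of the duplicated value is the wrong one, because `duplicated()` flags every occurrence after the first. It reassigns only when there is exactly one duplicate and exactly one gap, so it does not guess when the mismatch is ambiguous.

```r
# Hypothetical ID vector: 5 appears twice and 6 is missing
ids <- c(1, 2, 3, 4, 5, 5, 7)

dup <- ids[duplicated(ids)]             # IDs seen more than once
gap <- setdiff(min(ids):max(ids), ids)  # IDs absent from the sequence

# Reassign automatically only in the unambiguous one-to-one case
if (length(dup) == 1 && length(gap) == 1) {
  ids[duplicated(ids)] <- gap
}
ids
```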

[Package epiDisplay version 3.5.0.1 Index]