census {conTree}    R Documentation

Census Data Example from UC Irvine Machine Learning Repository

Description

Contains 1994 US census income data on 48,842 people, divided into a training set of 32,561 and an independent test set of 16,281. The training outcome variable y (yt for test) is binary and indicates whether a person's income is greater than $50,000 per year. There are 12 predictor variables x (xt for test) consisting of various demographic and financial properties associated with each person. Also included are estimates of Pr(y=1|x) obtained by several machine learning methods: gradient boosting on the logistic scale using maximum likelihood (GBL), random forest (RF), and gradient boosting on the probability scale using least squares (GBP).
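
As a rough illustration of how the three sets of probability estimates might be compared against the binary outcome, the sketch below computes a Brier score and a 0.5-threshold error rate. The helper names brier_score and error_rate are hypothetical and not part of conTree; the code assumes yt is coded 0/1 and that gblt, rft and gbpt hold probabilities, as the description above suggests.

```r
# Hypothetical helpers (not part of conTree).
# Brier score: mean squared difference between predicted probability and 0/1 outcome.
brier_score <- function(y, p) mean((p - y)^2)

# Fraction of cases where thresholding p at `cut` disagrees with the outcome.
error_rate <- function(y, p, cut = 0.5) mean((p > cut) != (y == 1))

# Tiny synthetic stand-in so the helpers can be tried without the package.
y_toy <- c(0, 0, 1, 1)
p_toy <- c(0.1, 0.6, 0.7, 0.9)
brier_score(y_toy, p_toy)  # 0.1175
error_rate(y_toy, p_toy)   # 0.25

# On the real data (assumes conTree is installed and yt is coded 0/1).
if (requireNamespace("conTree", quietly = TRUE)) {
  data(census, package = "conTree")
  print(sapply(census[c("gblt", "rft", "gbpt")], function(p) {
    c(brier = brier_score(census$yt, p), error = error_rate(census$yt, p))
  }))
}
```

The comparison uses the test components only, since the training-set estimates were presumably fitted to the training outcomes.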

Usage

census

Format

census

A list of 10 items.

x

training data frame of 32561 observations on 12 predictor variables

y

training binary response indicating whether income is above $50K

xt

test data frame of 16281 observations on 12 predictor variables

yt

test binary response indicating whether income is above $50K

gbl

training GBL response probabilities

gblt

test GBL response probabilities

gbp

training GBP response probabilities

gbpt

test GBP response probabilities

rf

training RF response probabilities

rft

test RF response probabilities
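
The layout listed above can be checked with a few lines of R. The following is a minimal sketch: check_census_layout is a hypothetical helper, not a conTree function, and the expected sizes are simply those stated in this section.

```r
# Component names as listed in the Format section.
expected_names <- c("x", "y", "xt", "yt", "gbl", "gblt", "gbp", "gbpt", "rf", "rft")

# Hypothetical helper: TRUE if `d` has exactly the ten named components and
# every training/test component has the expected number of observations.
check_census_layout <- function(d, n_train = 32561, n_test = 16281) {
  if (!setequal(names(d), expected_names)) return(FALSE)
  train <- c("x", "y", "gbl", "gbp", "rf")
  test <- c("xt", "yt", "gblt", "gbpt", "rft")
  # NROW() gives the row count for data frames and the length for vectors.
  all(sapply(d[train], NROW) == n_train) && all(sapply(d[test], NROW) == n_test)
}

# Synthetic stand-in with 3 training and 2 test rows, for trying the helper.
toy <- list(
  x = data.frame(a = 1:3), y = c(0, 1, 0),
  xt = data.frame(a = 1:2), yt = c(1, 0),
  gbl = c(0.2, 0.8, 0.1), gblt = c(0.7, 0.3),
  gbp = c(0.3, 0.7, 0.2), gbpt = c(0.6, 0.4),
  rf = c(0.1, 0.9, 0.2), rft = c(0.8, 0.2)
)
check_census_layout(toy, n_train = 3, n_test = 2)  # TRUE

# On the real data (assumes conTree is installed).
if (requireNamespace("conTree", quietly = TRUE)) {
  data(census, package = "conTree")
  str(census, max.level = 1)
  print(check_census_layout(census))
}
```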

Source

https://archive.ics.uci.edu/ml/datasets/census+income


[Package conTree version 0.3-1 Index]