GermanCredit {evtree} | R Documentation
Statlog German Credit
Description
The dataset contains data on past credit applicants, each rated as good or bad. Models fitted to these data can be used to assess whether new applicants present a good or bad credit risk.
Usage
data("GermanCredit")
Format
A data frame containing 1,000 observations on 21 variables.
- status
factor variable indicating the status of the existing checking account, with levels "... < 0 DM", "0 <= ... < 200 DM", "... >= 200 DM/salary for at least 1 year" and "no checking account".
- duration
duration in months.
- credit_history
factor variable indicating credit history, with levels "no credits taken/all credits paid back duly", "all credits at this bank paid back duly", "existing credits paid back duly till now", "delay in paying off in the past" and "critical account/other credits existing".
- purpose
factor variable indicating the credit's purpose, with levels "car (new)", "car (used)", "furniture/equipment", "radio/television", "domestic appliances", "repairs", "education", "retraining", "business" and "others".
- amount
credit amount.
- savings
factor variable indicating savings account/bonds, with levels "... < 100 DM", "100 <= ... < 500 DM", "500 <= ... < 1000 DM", "... >= 1000 DM" and "unknown/no savings account".
- employment_duration
ordered factor indicating the duration of the current employment, with levels "unemployed", "... < 1 year", "1 <= ... < 4 years", "4 <= ... < 7 years" and "... >= 7 years".
- installment_rate
installment rate in percentage of disposable income.
- personal_status_sex
factor variable indicating personal status and sex, with levels "male:divorced/separated", "female:divorced/separated/married", "male:single", "male:married/widowed" and "female:single".
- other_debtors
factor variable indicating other debtors, with levels "none", "co-applicant" and "guarantor".
- present_residence
length of time at the present residence.
- property
factor variable indicating the client's highest valued property, with levels "real estate", "building society savings agreement/life insurance", "car or other" and "unknown/no property".
- age
client's age in years.
- other_installment_plans
factor variable indicating other installment plans, with levels "bank", "stores" and "none".
- housing
factor variable indicating housing, with levels "rent", "own" and "for free".
- number_credits
number of existing credits at this bank.
- job
factor indicating employment status, with levels "unemployed/unskilled - non-resident", "unskilled - resident", "skilled employee/official" and "management/self-employed/highly qualified employee/officer".
- people_liable
number of people for whom the applicant is liable to provide maintenance.
- telephone
binary variable indicating if the customer has a registered telephone number.
- foreign_worker
binary variable indicating if the customer is a foreign worker.
- credit_risk
binary variable indicating credit risk, with levels "good" and "bad".
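To check the documented column types and factor levels against the data actually loaded, something like the following sketch may help. The helper name factor_levels and the toy data frame are illustrative assumptions, not part of evtree; the toy rows are made up and only mimic three of the columns described above.

```r
# Hypothetical helper (not part of evtree): return the levels of every
# factor column in a data frame as a named list.
factor_levels <- function(df) {
  lapply(Filter(is.factor, df), levels)
}

# Toy stand-in with made-up rows, mimicking three documented columns.
toy <- data.frame(
  duration = c(6, 48, 12),
  housing = factor(c("own", "rent", "own"),
                   levels = c("rent", "own", "for free")),
  credit_risk = factor(c("good", "bad", "good"),
                       levels = c("good", "bad"))
)
toy_levels <- factor_levels(toy)
print(toy_levels)

# If evtree is installed, the same helper can be applied to the real data.
if (requireNamespace("evtree", quietly = TRUE)) {
  data("GermanCredit", package = "evtree")
  str(GermanCredit)
  print(factor_levels(GermanCredit)$credit_risk)
}
```

Numeric columns such as duration are skipped by the helper, so the result lists only the factor variables.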
Details
The use of a cost matrix is suggested for this dataset. Classifying a customer as good when they are bad (cost = 5) is worse than classifying a customer as bad when they are good (cost = 1).
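As a rough illustration of how these costs could enter an evaluation, the sketch below builds the 2 x 2 cost matrix and applies it to a confusion table. The confusion counts are invented purely for illustration and are not results from any fitted model; the object names cost and confusion are likewise assumptions.

```r
lev <- c("good", "bad")

# Cost matrix: rows = predicted class, columns = true class.
# matrix() fills column by column, so the first two values belong to
# true = "good" and the last two to true = "bad".
cost <- matrix(c(0, 1,   # true good: predicted good = 0, predicted bad = 1
                 5, 0),  # true bad:  predicted good = 5, predicted bad = 0
               nrow = 2, dimnames = list(predicted = lev, true = lev))

# Made-up confusion counts in the same layout (NOT real model output).
confusion <- matrix(c(60, 10,
                      5, 25),
                    nrow = 2, dimnames = list(predicted = lev, true = lev))

# Total and average misclassification cost under the cost matrix.
total_cost <- sum(cost * confusion)
mean_cost <- total_cost / sum(confusion)
print(total_cost)
print(mean_cost)
```

The layout matches table(predict(gct), GermanCredit$credit_risk) from the Examples section, where rows are predictions and columns are observed classes. The example there takes a related route by giving bad observations a weight of 5 when fitting the tree.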
Source
The original data was provided by:
Professor Dr. Hans Hofmann, Institut fuer Statistik und Oekonometrie, Universitaet Hamburg, FB Wirtschaftswissenschaften, Von-Melle-Park 5, 2000 Hamburg 13
The dataset has been taken from the UCI Repository Of Machine Learning Databases at
http://archive.ics.uci.edu/ml/.
Examples
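library("evtree")
data("GermanCredit")
summary(GermanCredit)
## Not run:
## observation weights: "bad" cases count 5 times as much as "good" cases
gcw <- array(1, nrow(GermanCredit))
gcw[GermanCredit$credit_risk == "bad"] <- 5
## use the R 3.5.0 RNG defaults so the seed reproduces results from older R versions
suppressWarnings(RNGversion("3.5.0"))
set.seed(1090)
## fit the tree with the observation weights
gct <- evtree(credit_risk ~ ., data = GermanCredit, weights = gcw)
gct
## confusion table: rows are predictions, columns are observed classes
table(predict(gct), GermanCredit$credit_risk)
plot(gct)
## End(Not run)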