R: Letter Image Recognition Data

LetterRecognition {mlbench}

R Documentation

Letter Image Recognition Data

Description

The objective is to identify each of a large number of black-and-white rectangular pixel displays as one of the 26 capital letters in the English alphabet. The character images were based on 20 different fonts and each letter within these 20 fonts was randomly distorted to produce a file of 20,000 unique stimuli. Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15. We typically train on the first 16000 items and then use the resulting model to predict the letter category for the remaining 4000. See the article cited below for more details.

Usage

data("LetterRecognition", package = "mlbench")

Format

A data frame with 20,000 observations on 17 variables, the first is a factor with levels A-Z, the remaining 16 are numeric.

[,1]	lettr	capital letter
[,2]	x.box	horizontal position of box
[,3]	y.box	vertical position of box
[,4]	width	width of box
[,5]	high	height of box
[,6]	onpix	total number of on pixels
[,7]	x.bar	mean x of on pixels in box
[,8]	y.bar	mean y of on pixels in box
[,9]	x2bar	mean x variance
[,10]	y2bar	mean y variance
[,11]	xybar	mean x y correlation
[,12]	x2ybr	mean of `x^2 y`
[,13]	xy2br	mean of `x y^2`
[,14]	x.ege	mean edge count left to right
[,15]	xegvy	correlation of x.ege with y
[,16]	y.ege	mean edge count bottom to top
[,17]	yegvx	correlation of y.ege with x

Source

Creator: David J. Slate
Odesta Corporation; 1890 Maple Ave; Suite 115; Evanston, IL 60201
Donor: David J. Slate (dave@math.nwu.edu) (708) 491-3867

These data have been taken from the UCI Repository Of Machine Learning Databases (Blake & Merz 1998) and were converted to R format by Friedrich Leisch in the late 1990s.

The current version of the UC Irvine Machine Learning Repository Letter Recognition data set is available from doi:10.24432/C5ZP40.

References

P. W. Frey and D. J. Slate (Machine Learning Vol 6/2 March 91): "Letter Recognition Using Holland-style Adaptive Classifiers".

The research for this article investigated the ability of several variations of Holland-style adaptive classifier systems to learn to correctly guess the letter categories associated with vectors of 16 integer attributes extracted from raster scan images of the letters. The best accuracy obtained was a little over 80%. It would be interesting to see how well other methods do with the same data.

Blake, C.L. & Merz, C.J. (1998). UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Irvine, Department of Information and Computer Science. Formerly available from ‘⁠http://www.ics.uci.edu/~mlearn/MLRepository.html⁠’.

Examples

data("LetterRecognition", package = "mlbench")
summary(LetterRecognition)

[Package mlbench version 2.1-5 Index]