read_mnist {dslabs}    R Documentation

Download and read the MNIST dataset

Description

This function downloads and reads the MNIST training and test data available at http://yann.lecun.com/exdb/mnist/.

Usage

read_mnist(
  path = NULL,
  download = FALSE,
  destdir = tempdir(),
  url = "https://www2.harvardx.harvard.edu/courses/IDS_08_v2_03/",
  keep.files = TRUE
)

Arguments

path

A character string giving the full path of the directory in which to look for the files. The filenames are assumed to be the same as the originals. If path is NULL, a download or direct read of the files is attempted.

download

If TRUE, the files will be downloaded and saved in destdir.

destdir

A character giving the full path of the directory in which to save the downloaded files. The default is to use a temporary directory.

url

A character string giving the URL from which to download the files. A copy of the data is currently available at https://www2.harvardx.harvard.edu/courses/IDS_08_v2_03/, which is the default.

keep.files

A logical. If TRUE the downloaded files will be saved in destdir. If FALSE the entire directory is erased. This argument is ignored if download is FALSE.

Value

A list with two components: train and test. Each of these is a list with two components: images and labels. The images component is a matrix with one row per image and one column for each of the 28*28 = 784 pixels. The values are integers between 0 and 255 representing grayscale intensity. The labels component is a vector giving the digit shown in each image.
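
The shape of the returned object can be explored without downloading anything. The sketch below builds a small stand-in with the same nesting as the list described above; the object name mock, the helper make_split, and the image counts are hypothetical, and the pixel values are random rather than real digits.

```r
# Stand-in for the documented return value: random data, not real MNIST.
# `mock` and `make_split` are hypothetical names used only for illustration.
set.seed(1)

make_split <- function(n) {
  list(
    # one row per image, one column per pixel (28 * 28 = 784)
    images = matrix(sample(0:255, n * 784, replace = TRUE),
                    nrow = n, ncol = 784),
    # one digit label per image
    labels = sample(0:9, n, replace = TRUE)
  )
}

mock <- list(train = make_split(6), test = make_split(3))

dim(mock$train$images)     # rows are images, columns are pixels
length(mock$train$labels)  # one label per row of images

# Reshape a single row into a 28 x 28 grid, as the plotting example does.
img <- matrix(mock$test$images[2, ], nrow = 28)
dim(img)
```

With the real data, the same indexing applies: mnist$train$images[i, ] is the pixel vector for image i and mnist$train$labels[i] is its digit.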

Note that the data is over 10 MB, so the download may take several seconds depending on internet speed. If you plan to load the data more than once, we recommend downloading it once and reading it from disk afterwards. See Examples.
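
The download-once pattern can be wrapped in a small helper. This is a minimal sketch, not part of dslabs: the function name load_mnist_cached is hypothetical, and the check for existing files assumes the saved filenames contain "ubyte", as the original MNIST filenames do.

```r
# Hypothetical helper, not part of dslabs: download once, then read from disk.
# `reader` defaults to dslabs::read_mnist and can be swapped out for testing.
load_mnist_cached <- function(dir, reader = dslabs::read_mnist) {
  # Assumption: the four cached files keep their original names,
  # all of which contain the string "ubyte".
  cached <- dir.exists(dir) &&
    length(list.files(dir, pattern = "ubyte")) >= 4

  if (cached) {
    # Files are already on disk: read them via the path argument.
    reader(dir)
  } else {
    # First call: create the directory, then download and keep the files.
    dir.create(dir, recursive = TRUE, showWarnings = FALSE)
    reader(download = TRUE, destdir = dir)
  }
}

# Usage (not run here because the first call downloads the data):
# mnist <- load_mnist_cached("~/Downloads/mnist")
```

Passing the reader as an argument keeps the helper easy to test with a stub. In normal use the default is enough.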

Author(s)

Samuela Pollack

Rafael A. Irizarry, rafael_irizarry@dfci.harvard.edu

References

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998.

Examples

# this can take several seconds, depending on internet speed.

## Not run: 
mnist <- read_mnist()
i <- 5
image(1:28, 1:28, matrix(mnist$test$images[i,], nrow=28)[ , 28:1], 
    col = gray(seq(0, 1, 0.05)), xlab = "", ylab="")
## the label for this image is:
mnist$test$labels[i]

## End(Not run)

# You can download and save the data to a directory like this:
## Not run: 
mnist <- read_mnist(download = TRUE, destdir = "~/Downloads")

# and then, going forward, read from disk 
mnist <- read_mnist("~/Downloads")

## End(Not run)

[Package dslabs version 0.8.0 Index]