sampleDBMS {DMwR2}    R Documentation

Drawing a random sample of records from a table stored in a DBMS

Description

Function for obtaining a random sample of records from a very large table stored in a database management system, without having to load the full table into memory. It targets situations where the full data does not fit in the computer's memory, so the standard sample function cannot be used.

Usage

sampleDBMS(dbConn, tbl, percORn, mxPerc=0.5)

Arguments

dbConn

A database connection object from the DBI package, holding the result of establishing the connection to your target database in the respective database management system.

tbl

A string containing the name of the (large) table in the database from which you want to draw a random sample of records.

percORn

Either the percentage of the table's rows or the actual number of rows that the sample should have.

mxPerc

The maximum percentage of the table's rows that the sample is allowed to contain (defaults to 0.5).
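
The arguments above do not spell out how the two readings of percORn are told apart. As a rough illustration only, the sketch below assumes that values below 1 are read as a fraction of the table's rows (and checked against mxPerc), while larger values are read as a row count. The helper resolve_sample_size and its n_rows argument are hypothetical and are not part of DMwR2; the package's actual rule may differ.

```r
# Hypothetical helper (not part of DMwR2): turn a percORn-style value into
# a number of rows, under the ASSUMPTION that values below 1 are fractions.
resolve_sample_size <- function(percORn, n_rows, mxPerc = 0.5) {
  if (percORn < 1) {
    # Fraction of the table: refuse samples above the allowed threshold.
    if (percORn > mxPerc) {
      stop("requested fraction exceeds mxPerc")
    }
    return(as.integer(round(percORn * n_rows)))
  }
  # Otherwise the value is taken as the row count itself.
  as.integer(percORn)
}

resolve_sample_size(0.1, n_rows = 2000)    # fraction of a 2000-row table
resolve_sample_size(10000, n_rows = 1e6)   # explicit row count
```

Under this assumed convention, a request such as 0.8 with the default mxPerc of 0.5 would be rejected with an error rather than silently truncated.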

Details

This function can be used to draw a random sample of records from a very large table of a database management system. This is particularly useful when you cannot afford to load the full table into memory in order to use R functions like sample to obtain the sample.

The function obtains the sample of rows without actually loading the full data into memory - only the final sample is loaded into main memory.

The function assumes you have already established and opened a connection to the database, and it receives the DBI connection object as an argument.
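
One common way to sample without loading the full table is to let the database engine choose the rows and return only those. The sketch below illustrates that idea and is not necessarily how sampleDBMS is implemented. It assumes the RSQLite package is installed and uses a throwaway in-memory SQLite table as a stand-in for a large remote table; the helper sample_rows_in_db is hypothetical.

```r
library(DBI)

# Hypothetical helper (not part of DMwR2): the database picks n random rows
# and only those rows are transferred to R.
sample_rows_in_db <- function(con, tbl, n) {
  # RANDOM() is the SQLite/PostgreSQL spelling; MySQL uses RAND() instead.
  sql <- sprintf("SELECT * FROM %s ORDER BY RANDOM() LIMIT %d",
                 as.character(dbQuoteIdentifier(con, tbl)),  # safe table name
                 as.integer(n))
  dbGetQuery(con, sql)
}

# Assumption: an in-memory SQLite database stands in for the real DBMS.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "largeTable", data.frame(id = 1:1000, x = rnorm(1000)))

d <- sample_rows_in_db(con, "largeTable", 50)
nrow(d)

dbDisconnect(con)
```

Ordering by a random value still makes the database scan the whole table, so this approach saves memory on the R side rather than work on the server side.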

Value

A data frame

Author(s)

Luis Torgo ltorgo@dcc.fc.up.pt

References

Torgo, L. (2016) Data Mining using R: learning with case studies, second edition, Chapman & Hall/CRC (ISBN-13: 978-1482234893).

http://ltorgo.github.io/DMwR2

See Also

sampleCSV, sample

Examples

## A simple example over a table on a MySQL database
## Not run: 
library(DBI)
library(RMySQL)
drv <- dbDriver("MySQL")  # Load the MySQL driver
con <- dbConnect(drv, dbname = "myDB",
                 username = "myUSER", password = "myPASS",
                 host = "localhost")
d <- sampleDBMS(con, "largeTable", 10000)  # Draw a sample of 10000 rows
dbDisconnect(con)  # Close the connection when done

## End(Not run)

[Package DMwR2 version 0.0.2 Index]