peek {kutils}    R Documentation

Show variables, one at a time, QUICKLY and EASILY.

Description

This makes it easy to scan quickly through all of the columns in a data frame to spot unexpected patterns or data entry errors. Numeric variables are depicted as histograms, while factor and character variables are summarized by the R table function and then presented as barplots. This is most useful with a large graphics device: try dev.create(height = 7, width = 7), a function provided with this package, or any other method you prefer for creating a large device.

Usage

peek(
  dat,
  sort = TRUE,
  file = NULL,
  textout = FALSE,
  ask,
  ...,
  xlabstub = "kutils peek: ",
  freq = FALSE,
  histargs = list(probability = !freq),
  barargs = list(horiz = TRUE, las = 1)
)

Arguments

dat

An R data frame, or something that can be coerced to a data frame by as.data.frame.

sort

Default TRUE. Should the columns be displayed in alphabetical order?

file

Should output go to a file rather than to the screen? Default is NULL, meaning show on screen. If you supply a file name, PDF output is written to it.

textout

If TRUE, counts from histogram bins and tables will appear in the console.

ask

As in the old-style R par(ask = TRUE): should keyboard interaction be required to advance to the next plot? Defaults to FALSE if the file argument is non-NULL. If file is NULL, setting ask = FALSE will cause graphs to whir by without pausing.

...

Additional arguments for the pdf, histogram, table, or barplot functions. Please see Details below.

xlabstub

A text stub that will appear in the x axis label. Currently it includes advertising for this package.

freq

As in the freq argument of hist: should graphs show counts (freq = TRUE) or proportions, also known as densities (freq = FALSE)?

histargs

A list of arguments to be passed to the hist function.

barargs

A list of arguments to be passed to the barplot function.
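
For histograms, the difference between counts and densities that the freq argument selects can be checked directly with base R's hist, independent of peek. The following is a small sketch with simulated data; the variable names are illustrative only.

```r
## Counts versus densities in a base R histogram (no plot is drawn)
set.seed(1)
x <- rnorm(200)
h <- hist(x, plot = FALSE)

## freq = TRUE displays h$counts: the number of observations per bin
total_count <- sum(h$counts)

## freq = FALSE displays h$density: bar heights scaled so that the
## total area (height times bin width) equals 1
total_area <- sum(h$density * diff(h$breaks))

print(total_count)
print(total_area)
```

Because densities are scaled by bin width, they equal bin proportions only when every bin has width 1.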

Value

A vector of the column names that were plotted.

Try the Defaults

Every effort has been made to make this simple and easy to use. Please run the examples as they are before becoming too concerned about customization. This function is intended for getting a quick look at each variable, one by one; it is not intended to create publication-quality histograms. For the sake of fastidious users, many settings can be adjusted. Users can control the parameters for presentation of histograms (parameters for hist) and barplots (parameters for barplot). The function can also create frequency tables, which users can control by providing additional named arguments.

Style

The histograms are standard, upright histograms. The barplots are horizontal. I chose to make the bars horizontal because long value labels are more easily accommodated on the left axis. The code measures the length (in inches) of the label strings and increases the margin accordingly. The examples include a demonstration of that effect.
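
The margin adjustment described above can be approximated with base graphics. The following is a minimal sketch, not the code used inside peek: it assumes the label width is measured with strwidth(units = "inches") and that the left margin is set through par(mai = ...). The helper name widen_left_margin and the 0.3 inch padding are hypothetical.

```r
## Hypothetical helper, not part of kutils: widen the left margin so
## that the longest label fits beside a horizontal barplot.
widen_left_margin <- function(labels, pad = 0.3) {
    ## width of the longest label, in inches, on the current device
    label_in <- max(strwidth(labels, units = "inches"))
    mai <- par("mai")    # margins in inches: bottom, left, top, right
    mai[2] <- max(mai[2], label_in + pad)
    mai
}

pdf(file = NULL)         # null PDF device, nothing is written to disk
counts <- table(c("greg", "arthur philpot smythe", "greg", "cindy"))
default_mai <- par("mai")
new_mai <- widen_left_margin(names(counts))
old_par <- par(mai = new_mai)
barplot(counts, horiz = TRUE, las = 1)
par(old_par)             # restore the previous margins
dev.off()
```

Only the left margin (the second element of mai) is changed, and it is never made smaller than the device default.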

Dealing with Dots

Additional named arguments, ..., are inspected and sorted into groups intended to control the use of the R functions hist, barplot, table, and pdf.

The parameters c("exclude", "dnn", "useNA", "deparse.level") will go to the table function, which is used to make barplots for factor and character variables. These named arguments are extracted and sent to the pdf function: c("width", "height", "onefile", "family", "title", "fonts", "version", "paper", "encoding", "bg", "fg", "pointsize", "pagecentre", "colormodel", "useDingbats", "useKerning", "fillOddEven", "compress"). Any other arguments that are unique to hist or barplot are sorted out and sent only to those functions.
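
To illustrate the idea of sorting named arguments by destination, here is a rough sketch in base R. It is not the implementation inside peek: the function name split_dots is hypothetical, and the sketch only separates the table and pdf groups listed above, leaving everything else in a single "other" group.

```r
## Hypothetical sketch of splitting "..." by argument name; the actual
## peek() internals may differ.
split_dots <- function(...) {
    dots <- list(...)
    table_names <- c("exclude", "dnn", "useNA", "deparse.level")
    pdf_names <- c("width", "height", "onefile", "family", "title",
                   "fonts", "version", "paper", "encoding", "bg", "fg",
                   "pointsize", "pagecentre", "colormodel", "useDingbats",
                   "useKerning", "fillOddEven", "compress")
    is_table <- names(dots) %in% table_names
    is_pdf <- names(dots) %in% pdf_names
    list(table = dots[is_table],
         pdf = dots[is_pdf],
         other = dots[!(is_table | is_pdf)])
}

groups <- split_dots(breaks = 30, useNA = "always", width = 7, col = "gray")
print(names(groups$table))
print(names(groups$pdf))
print(names(groups$other))
```

In this call, useNA lands in the table group, width in the pdf group, and breaks and col remain in the "other" group.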

Any other arguments, including graphical parameters, will be sent to both the histogram and barplot functions, so this is a convenient way to obtain a uniform appearance. Additional arguments that are common to barplot and hist will work, and so will any graphics parameters (named arguments of par, for example). However, if one wants to target some arguments to hist but not barplot, then the histargs list argument should be used. Similarly, barargs should be used to send arguments to the barplot function. Warning: the defaults for histargs and barargs include some settings that are needed for the existing design. If new lists for histargs or barargs are supplied, the previously specified defaults are lost. Hence, users should include the existing members of those lists, possibly with revised values.
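
One way to keep the existing members while revising or adding values is modifyList from the utils package. This is only a suggestion, not something peek requires; the default lists below are copied from the Usage section (with freq = FALSE), and the variable names are illustrative.

```r
## Defaults as shown in the Usage section (probability = !freq, freq = FALSE)
hist_defaults <- list(probability = TRUE)
bar_defaults <- list(horiz = TRUE, las = 1)

## modifyList() keeps existing members and adds or overrides named ones
my_histargs <- modifyList(hist_defaults, list(col = "gray80"))
my_barargs <- modifyList(bar_defaults, list(las = 2, xlab = "Count"))
str(my_histargs)
str(my_barargs)

## Assumed usage, with mydf as built in the Examples section:
## peek(mydf, histargs = my_histargs, barargs = my_barargs)
```

Here my_barargs still contains horiz = TRUE, so the horizontal layout is preserved while las is revised and xlab is added.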

All of this argument sorting is done in order to reduce the large number of warnings that were observed in previous editions of this function.

Author(s)

Paul Johnson <pauljohn@ku.edu>

Examples


set.seed(234234)
N <- 200
mydf <- data.frame(x5 = rnorm(N), x4 = rnorm(N), x3 = rnorm(N),
                   x2 = letters[sample(1:24, 200, replace = TRUE)],
                   x1 = factor(sample(c("cindy", "bobby", "marsha",
                                        "greg", "chris"), 200, replace = TRUE)),
                   stringsAsFactors = FALSE)
## Insert 16 missings
mydf$x1[sample(1:150, 16)] <- NA
## Four-digit years need %Y; %y would read only the first two digits
mydf$adate <- as.Date(c("1jan1960", "2jan1960", "31mar1960", "30jul1960"),
                      format = "%d%b%Y")
peek(mydf)
peek(mydf, sort = FALSE)
## Demonstrate the dot-dot-dot usage to pass in hist params
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE)
## Not run: file output
## peek(mydf, sort = FALSE, file = "three_histograms.pdf")
## Use some objects from the datasets package
library(datasets)
peek(cars, xlabstub = "R cars data: ")
peek(EuStockMarkets, xlabstub = "Euro Market Data: ")
peek(EuStockMarkets, xlabstub = "Euro Market Data: ", breaks = 50,
     freq = TRUE)
## Not run: file output
## peek(EuStockMarkets, breaks = 50, file = "myeuro.pdf",
##      height = 4, width=3, family = "Times")
## peek(EuStockMarkets, breaks = 50, file = "myeuro-%03d.pdf",
##      onefile = FALSE, family = "Times", textout = TRUE)
## ylab goes into "..." and affects both histograms and barplots
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities",
    freq = TRUE)
## xlab is added in the barargs list.
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities",
    freq = TRUE, barargs = list(horiz = TRUE, las = 1, xlab = "I'm in barargs"))
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE,
     barargs = list(horiz = TRUE, las = 1, xlim = c(0, 100),
     xlab = "I'm in barargs, not in histargs"))
levels(mydf$x1) <- c(levels(mydf$x1), "arthur philpot smythe")
mydf$x1[4] <- "arthur philpot smythe"
mydf$x2[1] <- "I forgot what letter"
peek(mydf, breaks = 30,
     barargs = list(horiz = TRUE, las = 1))


[Package kutils version 1.73 Index]