peek {kutils} | R Documentation
Show variables, one at a time, QUICKLY and EASILY.
Description
This makes it easy to quickly scan through all of the columns in a
data frame to spot unexpected patterns or data entry errors. Numeric variables are depicted as
histograms, while factor and character variables are summarized by
the R table function and then presented as barplots. This is most
useful with a large screen graphic device (try running the function
provided with this package, dev.create(height=7, width=7),
or any other method you prefer to create a large device).
Usage
peek(
dat,
sort = TRUE,
file = NULL,
textout = FALSE,
ask,
...,
xlabstub = "kutils peek: ",
freq = FALSE,
histargs = list(probability = !freq),
barargs = list(horiz = TRUE, las = 1)
)
Arguments
dat |
An R data frame, or something that can be coerced to a
data frame. |
sort |
Default TRUE. Should the columns be displayed in alphabetical order? |
file |
Should output go to a file rather than to the screen? Default is NULL, meaning show on screen. If you supply a file name, PDF output is written to it. |
textout |
If TRUE, counts from histogram bins and tables will appear in the console. |
ask |
As in the old style R |
... |
Additional arguments for the pdf, histogram, table, or barplot functions. Please see Details below. |
xlabstub |
A text stub that will appear in the x axis label. Currently it includes advertising for this package. |
freq |
As in the histogram frequency argument. Should graphs show counts (freq = TRUE) or proportions (AKA densities) (freq = FALSE)? |
histargs |
A list of arguments to be passed to the hist function. |
barargs |
A list of arguments to be passed to the barplot function. |
Value
A vector of column names that were plotted
Try the Defaults
Every effort has been made to make this
simple and easy to use. Please run the examples as they are
before becoming too concerned about customization. This
function is intended for getting a quick look at each
variable, one by one; it is not intended to create
publication-quality histograms. For the sake of fastidious users, many
settings can be adjusted. Users can control the parameters
for presentation of histograms (parameters for hist)
and barplots (parameters for barplot). The function can also
create frequency tables (which users can control by providing
additional named arguments).
Style
The histograms are standard, upright histograms. The barplots are horizontal. I chose to make the bars horizontal because long value labels are more easily accommodated on the left axis. The code measures the width (in inches) of the label strings and increases the left margin accordingly. The examples include a demonstration of that effect.
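The margin adjustment described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code that peek actually uses: the labels, the counts, and the 0.4 inch of padding are made up, and pdf(NULL) is used so that text can be measured without drawing to the screen or to a file.

```r
## Hypothetical labels and counts, for illustration only
labs <- c("bobby", "cindy", "arthur philpot smythe")
counts <- c(40, 35, 1)

## A null PDF device allows text to be measured without visible output
grDevices::pdf(NULL)
plot.new()

## Width, in inches, of the widest label
lab_in <- max(strwidth(labs, units = "inches"))

## Widen the left margin to fit the labels. The 0.4 inch of padding is
## an arbitrary choice, not a value taken from kutils.
oldpar <- par(mai = c(1.02, lab_in + 0.4, 0.82, 0.42))
barplot(counts, names.arg = labs, horiz = TRUE, las = 1)
par(oldpar)
invisible(dev.off())
```

On a real screen device the same steps apply; only the call that opens the device changes.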
Dealing with Dots
Additional named arguments, ..., are inspected and sorted into groups intended to
control use of the R functions hist, barplot, table and pdf.
The parameters c("exclude", "dnn", "useNA", "deparse.level") will go to
the table function, which is used to make barplots for
factor and character variables. These named arguments are
extracted and sent to the pdf function: c("width", "height",
"onefile", "family", "title", "fonts", "version", "paper",
"encoding", "bg", "fg", "pointsize", "pagecentre",
"colormodel", "useDingbats", "useKerning", "fillOddEven",
"compress"). Any other arguments that are unique to
hist or barplot are sorted out and sent only to
those functions.
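One way to picture this sorting is to match the names of the supplied arguments against the name lists quoted above. The sketch below is a hypothetical illustration, not the package's code: the helper sort_dots is invented here, and it handles only the table and pdf groups, leaving everything else (including arguments unique to hist or barplot) in a single "other" group.

```r
## Name lists copied from the text above
table_names <- c("exclude", "dnn", "useNA", "deparse.level")
pdf_names <- c("width", "height", "onefile", "family", "title", "fonts",
               "version", "paper", "encoding", "bg", "fg", "pointsize",
               "pagecentre", "colormodel", "useDingbats", "useKerning",
               "fillOddEven", "compress")

## Hypothetical helper: split named arguments by their destination
sort_dots <- function(...) {
    dots <- list(...)
    nms <- names(dots)
    list(pdf = dots[nms %in% pdf_names],
         table = dots[nms %in% table_names],
         other = dots[!nms %in% c(pdf_names, table_names)])
}

res <- sort_dots(width = 4, height = 3, useNA = "always",
                 breaks = 30, col = "gray")
names(res$pdf)    ## "width" "height"
names(res$table)  ## "useNA"
names(res$other)  ## "breaks" "col"
```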
Any other arguments, including
graphical parameters, will be sent to both the histogram and
barplot functions, so it is a convenient way to obtain uniform
appearance. Additional arguments that are common to
barplot and hist will work, and so will any
graphics parameters (named arguments of par, for
example). However, if one wants to target some arguments to
hist, but not barplot, then the histargs
list argument should be used. Similarly, barargs should
be used to send arguments to the barplot function.
Warning: the defaults for histargs and
barargs include some settings that are needed for the
existing design. If new lists for histargs or
barargs are supplied, the previously specified defaults
are lost. Hence, users should include the existing members of
those lists, possibly with revised values.
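To keep the defaults while adding or revising a setting, a new list can be built on top of the default one. The sketch below uses modifyList from the utils package, which is one convenient way to do this and is not something the package requires. The default list is copied from the Usage section, and the call to peek is left as a comment because it assumes that kutils is loaded and that mydf from the Examples exists.

```r
## Defaults for barargs, copied from the Usage section
bar_defaults <- list(horiz = TRUE, las = 1)

## Add xlab and revise las without dropping horiz
mybarargs <- utils::modifyList(bar_defaults, list(xlab = "Count", las = 2))
str(mybarargs)

## peek(mydf, barargs = mybarargs)
```

Writing the full list by hand, as in barargs = list(horiz = TRUE, las = 2, xlab = "Count"), has the same effect.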
All of this argument sorting is done in order to reduce the
large number of warnings that were observed in previous
editions of this function.
Author(s)
Paul Johnson <pauljohn@ku.edu>
Examples
set.seed(234234)
N <- 200
mydf <- data.frame(x5 = rnorm(N), x4 = rnorm(N), x3 = rnorm(N),
                   x2 = letters[sample(1:24, N, replace = TRUE)],
                   x1 = factor(sample(c("cindy", "bobby", "marsha",
                                        "greg", "chris"), N, replace = TRUE)),
                   stringsAsFactors = FALSE)
## Insert 16 missings
mydf$x1[sample(1:150, 16)] <- NA
## Four dates, recycled to fill all N rows; %Y reads the 4-digit year
mydf$adate <- as.Date(c("1jan1960", "2jan1960", "31mar1960", "30jul1960"), format = "%d%b%Y")
peek(mydf)
peek(mydf, sort = FALSE)
## Demonstrate the dot-dot-dot usage to pass in hist params
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE)
## Not Run: file output
## peek(mydf, sort = FALSE, file = "three_histograms.pdf")
## Use some objects from the datasets package
library(datasets)
peek(cars, xlabstub = "R cars data: ")
peek(EuStockMarkets, xlabstub = "Euro Market Data: ")
peek(EuStockMarkets, xlabstub = "Euro Market Data: ", breaks = 50,
     freq = TRUE)
## Not run: file output
## peek(EuStockMarkets, breaks = 50, file = "myeuro.pdf",
##      height = 4, width = 3, family = "Times")
## peek(EuStockMarkets, breaks = 50, file = "myeuro-%03d.pdf",
##      onefile = FALSE, family = "Times", textout = TRUE)
## ylab goes into "..." and affects both histograms and barplots
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities",
     freq = TRUE)
## xlab is added in the barargs list, so it affects only the barplots.
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities",
     freq = TRUE, barargs = list(horiz = TRUE, las = 1, xlab = "I'm in barargs"))
peek(mydf, breaks = 30, ylab = "These are Counts, not Densities", freq = TRUE,
     barargs = list(horiz = TRUE, las = 1, xlim = c(0, 100),
                    xlab = "I'm in barargs, not in histargs"))
## Long value labels: the left margin of the barplots is widened to fit them
levels(mydf$x1) <- c(levels(mydf$x1), "arthur philpot smythe")
mydf$x1[4] <- "arthur philpot smythe"
mydf$x2[1] <- "I forgot what letter"
peek(mydf, breaks = 30,
     barargs = list(horiz = TRUE, las = 1))