skim {skimr} | R Documentation |
Skim a data frame, getting useful summary statistics
Description
skim() is an alternative to summary(), quickly providing a broad
overview of a data frame. It handles data of all types, dispatching a
different set of summary functions based on the types of columns in the data
frame.
Usage
skim(data, ..., .data_name = NULL)
skim_tee(data, ..., skim_fun = skim)
skim_without_charts(data, ..., .data_name = NULL)
Arguments
data |
A tibble, or an object that can be coerced into a tibble. |
... |
Columns to select for skimming. When none are provided, the default is to skim all columns. |
.data_name |
The name to use for the data. Defaults to the same as data. |
skim_fun |
The skimming function to use in skim_tee(). Defaults to skim. |
Details
Each call produces a skim_df, which is fundamentally a tibble with a
special print method. One unusual feature of this data frame is its
pseudo-namespace for columns. skim() computes statistics by data type, and it
stores them in the data frame as <type>.<statistic>. These type prefixes are
stripped when printing the results. The "base" skimmers (n_missing and
complete_rate) are the only columns that don't follow this behavior.
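As a rough illustration of the pseudo-namespace, the sketch below skims iris and inspects the raw column names rather than the printed output. It assumes skimr is installed and uses its default skimmers; the exact set of statistic columns may differ between versions.

```r
library(skimr)

skimmed <- skim(iris)

# The raw column names keep the <type>.<statistic> prefix, even though
# the print method strips it.
names(skimmed)

# The base skimmers are stored without a type prefix.
base_cols <- c("n_missing", "complete_rate")
base_cols %in% names(skimmed)

# Columns that belong to the numeric namespace.
numeric_cols <- grep("^numeric\\.", names(skimmed), value = TRUE)
numeric_cols
```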
See skim_with() for more details on customizing skim() and
get_default_skimmers() for a list of default functions.
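A minimal sketch of how these two functions might be used together is shown here. The names my_skim and iqr are illustrative choices, not part of skimr, and the example assumes the sfl() helper that skimr provides for declaring skimming functions.

```r
library(skimr)

# List the default summary functions applied to numeric columns.
get_default_skimmers("numeric")

# Build a customized skimming function (hypothetical name: my_skim) that
# adds the interquartile range for numeric columns. append = TRUE keeps
# the default numeric skimmers alongside the new one.
my_skim <- skim_with(numeric = sfl(iqr = IQR), append = TRUE)

custom <- my_skim(iris)

# The new statistic is stored under the numeric namespace.
"numeric.iqr" %in% names(custom)
```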
If you just want to see the printed output, call skim_tee() instead.
This function returns the original data. skim_tee() uses the default
skim(), but you can replace it with the skim_fun argument.
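The following sketch shows one way to swap the skimming function used by skim_tee(), here with skim_without_charts(). It assumes the skim_fun argument shown in the Usage section; the variable name result is illustrative.

```r
library(skimr)

# skim_tee() prints the summary as a side effect and returns its input.
# Here the default skim() is replaced by skim_without_charts().
result <- skim_tee(chickwts, skim_fun = skim_without_charts)

# The returned object is the original data, not a skim_df.
identical(result, chickwts)
```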
The data frame produced by skim() is wide and sparse. To avoid type coercion,
skimr uses a type namespace for all summary statistics. Columns for numeric
summary statistics all begin with numeric; those for factor summary statistics
begin with factor; and so on.
See partition() and yank() for methods for transforming this wide data
frame. The first function splits it into a list, with each entry
corresponding to a data type. The latter pulls a single subtable for a
particular type from the skim_df.
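The difference between the two functions can be sketched as follows, assuming the default skimmers for iris (four numeric columns and one factor). The variable names parts and numeric_stats are illustrative.

```r
library(skimr)

skimmed <- skim(iris)

# partition() returns a list with one entry per data type found in the
# skimmed data.
parts <- partition(skimmed)
names(parts)

# yank() extracts the subtable for a single type.
numeric_stats <- yank(skimmed, "numeric")
names(numeric_stats)
```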
skim() is designed to operate in pipes and to generally play nicely with
other tidyverse functions. This means that you can use tidyselect helpers
within skim() to select or drop specific columns for summary. You can also
further work with a skim_df using dplyr functions in a pipeline.
Value
A skim_df object, which also inherits the class(es) of the input
data. In many ways, the object behaves like a tibble::tibble().
Customizing skim
skim() is an intentionally simple function, with minimal arguments like
summary(). Nonetheless, this package provides two broad approaches to
how you can customize skim()'s behavior. You can customize the functions
that are called to produce summary statistics with skim_with().
Unicode rendering
If the rendered examples show unencoded values such as <U+2587>, you will
need to change your locale to allow proper rendering. Please review the
Using Skimr vignette for more information
(vignette("Using_skimr", package = "skimr")).
Otherwise, we export skim_without_charts() to produce summaries without the
spark graphs. These are the source of the Unicode dependency.
Examples
skim(iris)

# Use tidyselect
skim(iris, Species)
skim(iris, starts_with("Sepal"))
skim(iris, where(is.numeric))

# Skim also works groupwise
iris %>%
  dplyr::group_by(Species) %>%
  skim()

# Which five numeric columns have the greatest mean value?
# Look in the `numeric.mean` column.
iris %>%
  skim() %>%
  dplyr::select(numeric.mean) %>%
  dplyr::top_n(5)

# Which of my columns have missing values? Use the base skimmer n_missing.
iris %>%
  skim() %>%
  dplyr::filter(n_missing > 0)

# Use skim_tee to view the skim results and
# continue using the original data.
chickwts %>%
  skim_tee() %>%
  dplyr::filter(feed == "sunflower")

# Produce a summary without spark graphs
iris %>%
  skim_without_charts()