pivot {lessR} | R Documentation |
Create a Pivot (Summary) Table
Description
Compute one or more designated descriptive statistics (compute
over one or more numerical variables (variable
) either for all the data or aggregated over one or more categorical variables (by
). Because the output is a two-dimensional table, select any two of the three possibilities: Multiple compute functions for the descriptive statistics, multiple continuous variables over which to compute, and multiple categorical variables by which to define groups for aggregation. Displays the sample size for each group. Uses the base R function aggregate
for which to perform the aggregation.
Usage
pivot(data, compute, variable, by=NULL, by_cols=NULL, filter=NULL,
show_n=TRUE, na_by_show=TRUE, na_remove=TRUE, na_group_show=TRUE,
out_names=NULL, sort=NULL, sort_var=NULL,
table_prop=c("none", "all", "row", "col"), table_long=FALSE,
factors=TRUE, q_num=4, digits_d=NULL, quiet=getOption("quiet"))
Arguments
data |
Data frame that contains the variables. |
compute |
One or more statistics, defined as one or more functions, to aggregate over the combinations of the values of the categorical variables. |
variable |
One or more numeric response variables for which to
|
by |
Categorical variables that define the groups (cells) listed in the rows of the output long-form data frame, available to input into other data analysis routines. Ignore to compute over the variables for all the data, e.g., the grand mean. |
by_cols |
Up to two categorical variables that define the groups displayed as columns in a two dimensional table. |
filter |
Subset, i.e., filter, rows of the input data frame for analysis. |
show_n |
By default, display the sample size and number missing for each
computed summary statistic. If |
na_by_show |
If |
na_remove |
Sets base R parameter |
na_group_show |
If |
out_names |
Custom names for the aggregated variables. If more than one,
list in the same order as specified in |
sort |
Set to |
sort_var |
Either the name of the variable in the output data frame to sort, or its column number. Default is the last column. |
table_prop |
Applies to a created |
table_long |
Applies to the value of |
factors |
For |
q_num |
For the computation of quantiles, number of intervals. Default value of 4 provides quartiles. |
digits_d |
Number of significant digits for each displayed summary statistic. Trailing zeros are deleted, so, for example, integers display as integers. If not specified, defaults to 3 unless there are more than 3 decimal digits and only a single digit to the left of the decimal point. Then enough digits are displayed to capture some non-zero decimal digits to avoid rounding to 0.000. To see all digits without trailing decimal 0's, set at a large number such as 20. |
quiet |
If set to |
Details
pivot
uses base R aggregate
to generate a pivot table (Excel terminology). Express multiple categorical variables over which to pivot as a vector with the c
function.
pivot
provides two additional features than aggregate
provides. First is a complete missing data analysis. If there is no missing data for the numerical variables that are aggregated, then the cell sizes are included with the aggregated data. If there is such missing data, then the amount of available data is displayed for all values to be aggregated for each cell.
The second is that the data parameter is listed first in the parameter list, which facilitates the use of the pipe operator from the magrittr
package. Also, there is a different interface as the by
variables are specified as a vector.
Variable ranges in the specification of by
are not needed in general. Only a small number of grouping variables generally define the cells for the aggregation.
The following table lists available single summary statistics. The list is not necessarily exhaustive as the references are to functions provided by base R, including any not listed below.
Statistic | Meaning |
----------- | -------------------------------- |
sum | sum |
mean | arithmetic mean |
median | median |
min | minimum |
max | maximum |
sd | standard deviation |
var | variance |
skew | skew |
kurtosis | kurtosis |
IQR | inter-quartile range |
mad | mean absolute deviation |
----------- | -------------------------------- |
The functions skew()
and kurtosis()
are provided by this package as they have no counterparts in base R. All other functions are from base R.
The quantile
and table
statistical function returns multiple values.
Statistic | Meaning |
----------- | -------------------------------- |
quantile | min, quartiles, max |
table | frequencies or proportions |
----------- | -------------------------------- |
The table
computation applies to an aggregated variable that consists of discrete categories, such as the numbers 1 through 5 for responses to a 5-pt Likert scale. The result is a table of frequencies or proportions, a contingency table, referred to for two or more variables as a cross-tabulation table or a joint frequency distribution. Other statistical functions can be simultaneously computed with table
, though only meaningful if the aggregated variable consists of a relatively small set of discrete, numeric values.
The default quantiles for quantile
are quartiles. Specify a custom number of quantiles with the q_num
parameter, which has the default value of 4 for quartiles.
Value
Returns a data frame of the aggregated values, unless for two by
variables and table_2d
is TRUE
, when a table is returned.
The count of the number of elements in each group is provided as the variable n
. If a combination of by
variable levels that defines a group is empty, the n
is set to 0 with the values of the variable
set to NA
.
The number of missing elements of the value
variable is provided as the variable miss
.
Author(s)
David W. Gerbing (Portland State University; gerbing@pdx.edu)
See Also
Examples
library(knitr) # for kable() called from pivot()
d <- Read("Employee", quiet=TRUE)
# parameter values named
pivot(data=d, compute=mean, variable=Salary, by=c(Dept, Gender))
# visualize the aggregation
# when reading a table of coordinates, a, BarChart cannot deal with
# with missing data so do not show groups that are missing as
# another level
a <- pivot(d, mean, Salary, c(Dept, Gender), na_group_show=FALSE)
BarChart(Dept, Salary_mean, by=Gender, data=a)
# calculate mean of Years and Salary for each combination of Dept and Gender
# parameter values by position
pivot(d, mean, c(Years, Salary), c(Dept, Gender))
# output as a 2-d cross-tabulation table
pivot(d, mean, Salary, Dept, Gender)
# cross-tabulation table
pivot(d, table, Dept, Gender)
# long form
pivot(d, table, Dept, Gender, table_long=TRUE)
# multiple functions for which to aggregate
pivot(d, c(mean,sd,median,IQR), Years, c(Gender,Dept), digits_d=2)
# A variety of statistics computed for several variables over the
# entire data set without aggregation
pivot(d, c(mean,sd,skew,kurtosis), c(Years,Salary,Pre,Post), digits_d=2)