data-transformations {collapse} | R Documentation
Data Transformations
Description
collapse provides an ensemble of functions to perform common data transformations efficiently and in a user-friendly way:
- dapply applies functions to rows or columns of matrices and data frames, preserving the data format.
- BY is an S3 generic for efficient Split-Apply-Combine computing, similar to dapply.
- A set of arithmetic operators facilitates row-wise (%rr%, %r+%, %r-%, %r*%, %r/%) and column-wise (%cr%, %c+%, %c-%, %c*%, %c/%) replacing and sweeping operations involving a vector and a matrix or data frame / list. Since v1.7, the operators %+=%, %-=%, %*=% and %/=% do column- and element-wise math by reference, and the function setop can also perform sweeping out rows by reference.
- (set)TRA is a more advanced S3 generic to efficiently perform (groupwise) replacing and sweeping out of statistics, either by creating a copy of the data or by reference. Supported operations are:

  Integer-id | String-id | Description
  0 | "na" or "replace_na" | replace only missing values
  1 | "fill" or "replace_fill" | replace everything
  2 | "replace" | replace data but preserve missing values
  3 | "-" | subtract
  4 | "-+" | subtract group-statistics but add group-frequency weighted average of group statistics
  5 | "/" | divide
  6 | "%" | compute percentages
  7 | "+" | add
  8 | "*" | multiply
  9 | "%%" | modulus
  10 | "-%%" | subtract modulus

  All of collapse's Fast Statistical Functions have a built-in TRA argument for faster access (i.e. you can compute (groupwise) statistics and use them to transform your data with a single function call).
- fscale/STD is an S3 generic to perform (groupwise and / or weighted) scaling / standardizing of data and is orders of magnitude faster than scale.
- fwithin/W is an S3 generic to efficiently perform (groupwise and / or weighted) within-transformations / demeaning / centering of data. Similarly, fbetween/B computes (groupwise and / or weighted) between-transformations / averages (also a lot faster than ave).
- fhdwithin/HDW, shorthand for 'higher-dimensional within transform', is an S3 generic to efficiently center data on multiple groups and partial-out linear models (possibly involving many levels of fixed effects and interactions). In other words, fhdwithin/HDW efficiently computes residuals from linear models. Similarly, fhdbetween/HDB, shorthand for 'higher-dimensional between transformation', computes the corresponding means or fitted values.
- flag/L/F, fdiff/D/Dlog and fgrowth/G are S3 generics to compute sequences of lags / leads and suitably lagged and iterated (quasi-, log-) differences and growth rates on time series and panel data. fcumsum flexibly computes (grouped, ordered) cumulative sums. More in Time Series and Panel Series.
- STD, W, B, HDW, HDB, L, D, Dlog and G are parsimonious wrappers around the f- functions above representing the corresponding transformation 'operators'. They have additional capabilities when applied to data frames (i.e. variable selection, formula input, auto-renaming and id-variable preservation), and are easier to employ in regression formulas, but are otherwise identical in functionality.
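The relationship between TRA, the TRA argument of the Fast Statistical Functions, and the dedicated transformation functions can be illustrated with a minimal sketch. It assumes the collapse package is installed and attached; the toy vectors x and g are made up for illustration and are not part of the package.

```r
library(collapse)

# Hypothetical toy data: a numeric vector and a grouping vector
x <- c(1, 2, 3, 4, 5, 6)
g <- c("a", "a", "a", "b", "b", "b")

# Groupwise centering, three routes that should agree
w1 <- fwithin(x, g)                  # dedicated within-transformation
w2 <- fmean(x, g, TRA = "-")         # statistic and sweep in a single call
w3 <- TRA(x, fmean(x, g), "-", g)    # sweep out precomputed group means

# Groupwise averaging: fbetween versus the "replace_fill" operation
b1 <- fbetween(x, g)
b2 <- fmean(x, g, TRA = "replace_fill")

# Groupwise standardizing (mean 0, standard deviation 1 within each group)
s1 <- fscale(x, g)
```

In this sketch the single-call form (w2) avoids materializing the group means as a separate object, which is the convenience the TRA argument is described as providing.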
Table of Functions
Function / S3 Generic | Methods | Description
dapply | No methods, works with matrices and data frames | Apply functions to rows or columns
BY | default, matrix, data.frame, grouped_df | Split-Apply-Combine computing
%(r/c)(r/+/-/*//)% | No methods, works with matrices and data frames / lists | Row- and column-arithmetic
(set)TRA | default, matrix, data.frame, grouped_df | Replace and sweep out statistics (by reference)
fscale/STD | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Scale / standardize data
fwithin/W | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Demean / center data
fbetween/B | default, matrix, data.frame, pseries, pdata.frame, grouped_df | Compute means / average data
fhdwithin/HDW | default, matrix, data.frame, pseries, pdata.frame | High-dimensional centering and lm residuals
fhdbetween/HDB | default, matrix, data.frame, pseries, pdata.frame | High-dimensional averages and lm fitted values
flag/L/F, fdiff/D/Dlog, fgrowth/G, fcumsum | default, matrix, data.frame, pseries, pdata.frame, grouped_df | (Sequences of) lags / leads, differences, growth rates and cumulative sums
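A few of the functions in the table can be sketched on small inputs as follows. This is an illustrative example only, assuming collapse (v1.8 or later, for fcumsum) is installed and attached; the objects m, x and g are hypothetical toy data.

```r
library(collapse)

# Hypothetical toy data
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(NULL, c("a", "b")))   # columns: a = 1,2,3; b = 4,5,6
x <- c(1, 2, 3, 4, 5, 6)
g <- c("a", "a", "a", "b", "b", "b")

# dapply: apply a function to each column, keeping the matrix format
centered <- dapply(m, function(col) col - mean(col))

# Row-wise sweep: subtract the vector c(1, 4) from every row of m
swept <- m %r-% c(1, 4)

# BY: split x by g, apply sum to each piece, and combine the results
sums <- BY(x, g, sum)

# Lags, first differences and cumulative sums computed within groups
lagged <- flag(x, 1, g)
diffed <- fdiff(x, 1, 1, g)
cumul  <- fcumsum(x, g)
```

Because g restarts the computation in each group, the first element of every group in lagged and diffed is NA.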
See Also
Collapse Overview, Fast Statistical Functions, Time Series and Panel Series