plyr {plyr} | R Documentation
plyr: the split-apply-combine paradigm for R.
Description
The plyr package is a set of clean and consistent tools that implement the split-apply-combine pattern in R. This is an extremely common pattern in data analysis: you solve a complex problem by breaking it down into small pieces, doing something to each piece and then combining the results back together again.
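The three steps can be spelled out by hand in base R, which may help show what plyr automates. This is only a minimal sketch: the `scores` data frame and its column names are invented for illustration and are not part of plyr.

```r
# Toy data, invented for this illustration.
scores <- data.frame(
  group = c("a", "a", "b"),
  value = c(1, 3, 10)
)

# Split: break the data frame into one piece per group.
pieces <- split(scores, scores$group)

# Apply: compute a one-row summary for each piece.
results <- lapply(pieces, function(piece) {
  data.frame(group = piece$group[1], mean_value = mean(piece$value))
})

# Combine: stack the per-piece results back into one data frame.
combined <- do.call(rbind, results)
```

Each plyr function bundles these three steps into a single call.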
Details
The plyr functions are named according to what sort of data structure they split up and what sort of data structure they return:
- a: array
- l: list
- d: data.frame
- m: multiple inputs
- r: repeat multiple times
- _: nothing
The first letter gives the input type and the second gives the output type. So ddply takes a data frame as input and returns a data frame as output, and l_ply takes a list as input and returns nothing as output.
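As a rough illustration of the naming scheme, the sketch below calls one function of each kind. The `scores` data frame is made up for the example, and it assumes the plyr package is installed.

```r
library(plyr)

# Toy data, invented for this illustration.
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 2, 3, 4, 5)
)

# d -> d: split a data frame by group, summarise each piece,
# and combine the results into a new data frame.
group_means <- ddply(scores, "group", summarise, mean_value = mean(value))

# l -> l: apply a function to each list element and return a list.
squares <- llply(list(1, 2, 3), function(x) x^2)

# l -> _: called only for its side effects; the results are discarded.
l_ply(list(1, 2, 3), function(x) message("value: ", x))
```

Functions ending in `_` are typically useful for side effects such as plotting or writing files, where there is nothing meaningful to combine.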
Row names
By design, no plyr function will preserve row names: in general it is too hard to know what should be done with them for many of the operations supported by plyr. If you want to preserve row names, use name_rows to convert them into an explicit column in your data frame, perform the plyr operations, and then use name_rows again to convert the column back into row names.
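The round trip described above might look like the following sketch. The data frame `df` and its row names are invented for the example, and arrange is used only as a stand-in for whatever plyr operation you need.

```r
library(plyr)

# Toy data with meaningful row names, invented for this illustration.
df <- data.frame(a = c(3, 1, 2), row.names = c("x", "y", "z"))

# Without name_rows, the original row names are lost after the operation.
plain <- arrange(df, a)

# With name_rows: move row names into a column, operate, then move them back.
with_names <- name_rows(df)          # row names become an explicit column
sorted     <- arrange(with_names, a) # the plyr operation of interest
restored   <- name_rows(sorted)      # the column becomes row names again
```

After the second call to name_rows, the row names of `restored` follow their rows through the sort instead of being reset.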
Helpers
Plyr also provides a set of helper functions for common data analysis problems:
- arrange: re-order the rows of a data frame by specifying the columns to order by.
- mutate: add new columns or modify existing columns, like transform, but new columns can refer to other columns that you just created.
- summarise: like mutate, but creates a new data frame, not preserving any columns in the old data frame.
- join: an adaptation of merge which is more similar to SQL, and has a much faster implementation if you only want to find the first match.
- match_df: a version of join that, instead of returning the two tables combined together, only returns the rows in the first table that match the second.
- colwise: make any function work column-wise on a data frame.
- rename: easily rename columns in a data frame.
- round_any: round a number to any degree of precision.
- count: quickly count unique combinations and return the result as a data frame.
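The helpers above can be tried on a small example. The sketch below is illustrative only: the `people` and `depts` data frames and all their column names are invented, and it assumes plyr is loaded without dplyr masking functions of the same name.

```r
library(plyr)

# Toy data, invented for this illustration.
people <- data.frame(
  name = c("Ann", "Bob", "Cy", "Di"),
  dept = c("x", "y", "x", "z"),
  pay  = c(10.4, 20.6, 30.1, 40.9)
)
depts <- data.frame(dept = c("x", "y"), floor = c(1, 2))

# arrange: re-order rows; desc() sorts in descending order.
by_pay <- arrange(people, desc(pay))

# mutate: `total` refers to `bonus`, which is created in the same call.
with_bonus <- mutate(people, bonus = pay * 0.1, total = pay + bonus)

# summarise: build a new data frame; the original columns are dropped.
totals <- summarise(people, n = length(pay), max_pay = max(pay))

# join: SQL-style join; type = "left" keeps every row of the first table.
joined <- join(people, depts, by = "dept", type = "left")

# match_df: keep only the rows of the first table that match the second.
matched <- match_df(people, depts, on = "dept")

# colwise: turn a function into one that operates on each column.
col_max <- colwise(max)(people["pay"])

# rename: old name on the left, new name on the right.
renamed <- rename(people, c("pay" = "salary"))

# round_any: round to the nearest multiple of the given accuracy.
rounded <- round_any(people$pay, 5)

# count: tally unique combinations of the given variables.
dept_counts <- count(people, "dept")
```

Note that join preserves the row order of the first table, so `joined` lines up with `people` row for row, with a missing `floor` value for the unmatched department.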