long_panel {panelr} | R Documentation |
Convert wide panels to long format
Description
This function takes wide format panels as input and converts them to long format.
Usage
long_panel(
data,
prefix = NULL,
suffix = NULL,
begin = NULL,
end = NULL,
id = "id",
wave = "wave",
periods = NULL,
label_location = c("end", "beginning"),
as_panel_data = TRUE,
match = ".*",
use.regex = FALSE,
check.varying = TRUE
)
Arguments
data |
The wide data frame. |
prefix |
What character(s) go before the period indicator? If none, set this argument to NULL. |
suffix |
What character(s) go after the period indicator? If none, set this argument to NULL. |
begin |
What is the label for the first period? Could be |
end |
What is the label for the final period? Could be |
id |
The name of the ID variable as a string. If there is no ID variable, then this will be the name of the newly-created ID variable. |
wave |
This will be the name of the newly-created wave variable. |
periods |
If your period indicators do not lie in a sequence or are
not understood by the function, then you can supply them as a vector
instead. For instance, you could give |
label_location |
Where does the period label go on the variable?
If the variables are labeled like |
as_panel_data |
Should the return object be a |
match |
The regex that will match the part of the variable names other
than the wave indicator. By default it will match any character any
number of times. Sometimes you might know that the variable names should
start with a digit, for instance, and you might use |
use.regex |
Should the |
check.varying |
Should the function check to make sure that every variable in the wide data with a wave indicator is actually time-varying? Default is TRUE, meaning that a constant like "race_W1" that is only measured in wave 1 will be defined in each wave in the long data. With very large datasets, however, setting this to FALSE can sometimes save memory. |
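To make the naming arguments concrete, the following base R sketch shows how prefix, suffix, label_location, and the period labels jointly describe a column name. The helper split_wave_name() is hypothetical and written only for illustration; it is not part of panelr and does not reflect how long_panel() is implemented. The "W2_wage" name is likewise an assumed example of a label at the beginning.

```r
# Hypothetical helper (not part of panelr): split a wide-format column name
# into its variable stub and its period label.
# Assumption: prefix and suffix contain no special regex characters.
split_wave_name <- function(name, periods, prefix = "", suffix = "",
                            label_location = c("end", "beginning")) {
  label_location <- match.arg(label_location)
  # Period label surrounded by the prefix and suffix, e.g. "_W(1|2|3)"
  label <- paste0(prefix, "(", paste(periods, collapse = "|"), ")", suffix)
  pattern <- if (label_location == "end") {
    paste0("^(.*)", label, "$")   # stub first, label last
  } else {
    paste0("^", label, "(.*)$")   # label first, stub last
  }
  if (!grepl(pattern, name)) return(NULL)  # name carries no period label
  stub_group <- if (label_location == "end") "\\1" else "\\2"
  wave_group <- if (label_location == "end") "\\2" else "\\1"
  list(stub = sub(pattern, stub_group, name),
       wave = sub(pattern, wave_group, name))
}

# Label at the end: prefix "_W" precedes the period indicator
split_wave_name("race_W1", periods = 1:3, prefix = "_W")

# Label at the beginning: prefix "W" before, suffix "_" after the indicator
split_wave_name("W2_wage", periods = 1:3, prefix = "W", suffix = "_",
                label_location = "beginning")

# A column without a period label, such as the ID variable
split_wave_name("id", periods = 1:3, prefix = "_W")
```

In this sketch, prefix and suffix always refer to the text immediately around the period indicator, regardless of whether the label sits at the start or the end of the variable name.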
Details
There is no easy way to convert panel data from wide to long format because both formats are basically non-standard for other applications. This function can handle the common case in which the wide data frame has a regular labeling system for each period. The key thing is providing enough information for the function to understand the pattern.
In the end, this function calls stats::reshape() but should be easier
to use and able to handle more situations, such as when the label occurs
at the beginning of the variable name. Just as important, this
function has built-in utilities to handle unbalanced data, in which
variables occur more than once but not in every single period, which breaks
stats::reshape().
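As a rough illustration of the unbalanced case described above, the sketch below uses only base R. It pads the missing variable-by-wave columns with NA and then calls stats::reshape() directly. The toy data and the padding approach are assumptions made for this example; they are not taken from the package and may differ from what long_panel() does internally.

```r
# Toy wide panel: wage is measured in waves 1-3, hours only in waves 1 and 3.
wide <- data.frame(
  id = 1:2,
  wage_1 = c(10, 20), wage_2 = c(11, 21), wage_3 = c(12, 22),
  hours_1 = c(40, 35), hours_3 = c(42, 37)
)

# Pad the missing variable-by-wave combinations with NA so that every
# variable has one column per wave, which is what reshape() expects.
stubs <- c("wage", "hours")
waves <- 1:3
for (nm in as.vector(outer(stubs, waves, paste, sep = "_"))) {
  if (!nm %in% names(wide)) wide[[nm]] <- NA_real_
}

# List the varying columns stub by stub, in wave order.
varying <- lapply(stubs, function(s) paste(s, waves, sep = "_"))

long <- stats::reshape(
  wide,
  direction = "long",
  idvar = "id",
  varying = varying,
  v.names = stubs,
  timevar = "wave",
  times = waves
)

# Sort by id and wave for readability
long <- long[order(long$id, long$wave), ]
rownames(long) <- NULL
long
```

The long result has one row per id and wave, with hours set to NA in wave 2, where it was never measured.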
Value
Either a data.frame or a panel_data frame.
See Also
Examples
## We need a wide data frame, so we will make one from the long-format
## data included in the package.
# Convert WageData to panel_data object
wages <- panel_data(WageData, id = id, wave = t)
# Convert wages to wide format
wide_wages <- widen_panel(wages)
# Note: wide_wages has variables in the following format:
# var1_1, var1_2, var1_3, var2_1, var2_2, var2_3, etc.
## Not run:
long_wages <- long_panel(wide_wages, prefix = "_", begin = 1, end = 7,
                         id = "id", label_location = "end")
## End(Not run)
# Note that in this case, the id and label_location arguments are
# the defaults but are included just for clarity.