prepare_data {injurytools} | R Documentation |
Prepare data in a standardized format
Description
These are the data preprocessing functions provided by the injurytools
package, which involve:
setting exposure and injury data in a standardized format and
integrating both sources of data into an adequate data structure.
prepare_inj()
and prepare_exp()
set standardized names and
proper classes to the (key) columns in injury and exposure data,
respectively. prepare_all()
integrates both standardized data sets, injury and
exposure, and converts them into an injd
S3 object
that has an adequate structure for further statistical analyses.
See the Prepare Sports Injury Data
vignette for details.
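The standardization step described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper `standardize_inj()`, the toy data frame `raw_inj`, and the choice of classes (factor for the player, `Date` for the dates) are assumptions made for illustration only.

```r
# Hypothetical raw injury data; the column names are made up for illustration.
raw_inj <- data.frame(
  player_name = c("A", "B"),
  from        = c("2021-01-10", "2021-02-01"),
  until       = c("2021-01-20", "2021-02-15"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT part of injurytools): rename the key columns
# to standard names and coerce them to suitable classes.
standardize_inj <- function(df, player, date_injured, date_recovered) {
  key <- c(player, date_injured, date_recovered)
  stopifnot(all(key %in% names(df)))
  names(df)[match(key, names(df))] <-
    c("player", "date_injured", "date_recovered")
  df$player         <- factor(df$player)          # assumed class
  df$date_injured   <- as.Date(df$date_injured)   # assumed class
  df$date_recovered <- as.Date(df$date_recovered) # assumed class
  df
}

inj_std <- standardize_inj(raw_inj, "player_name", "from", "until")
str(inj_std)
```

The point of the sketch is only that, after this step, downstream code can rely on fixed column names and classes regardless of how the raw data were labelled.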
Usage
prepare_inj(
  df_injuries0,
  player = "player",
  date_injured = "date_injured",
  date_recovered = "date_recovered"
)

prepare_exp(
  df_exposures0,
  player = "player",
  date = "date",
  time_expo = "time_expo"
)

prepare_all(
  data_exposures,
  data_injuries,
  exp_unit = c("minutes", "hours", "days", "matches_num", "matches_minutes",
    "activity_days", "seasons")
)
Arguments
df_injuries0 |
A data frame containing injury information, with columns referring to the player name/id, date of injury and date of recovery (as minimal data). |
player |
Character referring to the column name where player information is stored. |
date_injured |
Character referring to the column name where the information about the date of injury is stored. |
date_recovered |
Character referring to the column name where the information about the date of recovery is stored. |
df_exposures0 |
A data frame containing exposure information, with columns referring to the player name/id, date of exposure and the total time of exposure of the corresponding data entry (as minimal data). |
date |
Character referring to the column name where the exposure date
information is stored. Besides, the column must be of class
Date or
integer/numeric. If it is
|
time_expo |
Character referring to the column name where the information about the time of exposure in that corresponding date is stored. |
data_exposures |
Exposure data frame with standardized column names, in
the same fashion that |
data_injuries |
Injury data frame with standardized column names, in the
same fashion that |
exp_unit |
Character defining the unit of exposure time ("minutes" the default). |
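A character-vector default such as the one shown for exp_unit is commonly resolved in R with `match.arg()`, which picks the first element when the argument is not supplied. The sketch below uses a hypothetical function, `show_unit()`, that copies only the signature style. Whether `prepare_all()` itself uses `match.arg()` internally is an assumption here.

```r
# Hypothetical function (NOT part of injurytools) that mirrors only the
# way the exp_unit default is written in the Usage section.
show_unit <- function(exp_unit = c("minutes", "hours", "days", "matches_num",
                                   "matches_minutes", "activity_days",
                                   "seasons")) {
  # match.arg() returns the first choice when the argument is missing
  # and raises an error for values outside the listed choices.
  match.arg(exp_unit)
}

show_unit()           # "minutes"
show_unit("seasons")  # "seasons"
```

Under this idiom, an unlisted value such as "weeks" would stop with an error rather than being silently accepted.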
Value
prepare_inj()
returns a data frame in which the key
columns in injury data are standardized and have a proper format.
prepare_exp()
returns a data frame in which the key
columns in exposure data are standardized and have a proper format.
prepare_all()
returns the injd
S3 object that
contains all the necessary information and a proper data structure to
perform further statistical analyses (e.g. calculate injury summary
statistics, visualize injury data).
If exp_unit is "minutes" (the default), the columns tstart_min and tstop_min
are created, which specify the time-to-event (injury) values: the starting and
stopping time of the interval, respectively. That is, the training time in
minutes that the player has been at risk until an injury (or censoring) has
occurred. For other choices, tstart_x and tstop_x are also created according
to the exp_unit indicated (x, one of: min, h, match, minPlay, d, acd or s).
These columns will be useful for survival analysis routines. See the Note
section.
It also creates a days_lost column based on the difference between
date_recovered and date_injured in days. If a days_lost column already exists
in the raw data, it is overwritten.
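The days_lost computation described above can be sketched in base R as follows. The toy data frame `inj` is hypothetical, and the sketch assumes a plain date difference (recovery date minus injury date). The exact counting rule used by the package is not stated here.

```r
# Hypothetical standardized injury data with a pre-existing days_lost column.
inj <- data.frame(
  player         = c("A", "B"),
  date_injured   = as.Date(c("2021-01-10", "2021-02-01")),
  date_recovered = as.Date(c("2021-01-20", "2021-02-15")),
  days_lost      = c(99, 99)  # raw values, overwritten below
)

# Difference in days between the recovery and injury dates;
# this replaces whatever days_lost held before.
inj$days_lost <- as.numeric(
  difftime(inj$date_recovered, inj$date_injured, units = "days")
)

inj$days_lost  # 10 14
```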
Note
Depending on the unit of exposure, the tstart_x and tstop_x
columns might have the same values (e.g. if exp_unit = "matches_num" and the
player has not played any match during the corresponding period of time).
Please be aware of this before performing any survival analysis related
task.
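One way to act on this note is to check for zero-length intervals before fitting any model. The sketch below uses a toy data frame standing in for part of an injd object. The column names tstart_match and tstop_match assume that exp_unit = "matches_num" corresponds to the suffix match, and the decision to drop such rows is only one possible choice.

```r
# Toy data (hypothetical) standing in for part of an injd object;
# tstart_match / tstop_match assume exp_unit = "matches_num".
toy <- data.frame(
  player       = c("A", "A", "B"),
  tstart_match = c(0, 5, 0),
  tstop_match  = c(5, 5, 8)
)

# Flag intervals in which no exposure was accumulated (start equals stop).
zero_len <- toy$tstart_match == toy$tstop_match

toy[zero_len, ]             # rows to inspect before any survival analysis
toy_ok <- toy[!zero_len, ]  # one option: keep only intervals with stop > start
```

Interval-based survival routines typically expect the stop time to be strictly greater than the start time, so inspecting the flagged rows first is safer than passing them through unchanged.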
Examples
df_injuries <- prepare_inj(df_injuries0   = raw_df_injuries,
                           player         = "player_name",
                           date_injured   = "from",
                           date_recovered = "until")

df_exposures <- prepare_exp(df_exposures0 = raw_df_exposures,
                            player        = "player_name",
                            date          = "year",
                            time_expo     = "minutes_played")

injd <- prepare_all(data_exposures = df_exposures,
                    data_injuries  = df_injuries,
                    exp_unit       = "matches_minutes")

head(injd)
class(injd)
str(injd, 1)