createRemDataset {rem} | R Documentation |
Create REM data set with dynamic risk sets
Description
The function creates counting process data sets with dynamic risk sets for relational event models. For each event in the event sequence, null-events are generated and represent possible events that could have happened at that time but did not. A data set with true and null-events is returned with an event dummy for whether the event occurred or was simply possible (variable eventdummy). The returned data set also includes a variable eventTime which represents the true time of the reported event.
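To make the idea of true and null-events concrete, the following is a minimal base-R sketch of a counting-process layout. It is hand-built for illustration and is not the output or the implementation of createRemDataset: it assumes a static risk set (every observed sender-target pair is at risk at every event time), and the object names events, dyads, riskset and key are hypothetical. Only the column names eventdummy and eventTime are taken from the description above.

```r
# Illustrative only: a simplified, static risk set built by hand.
events <- data.frame(
  sender = c("a", "c", "a"),
  target = c("b", "d", "b"),
  eventTime = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Risk set (assumption): every distinct sender-target pair seen in the data
dyads <- unique(events[, c("sender", "target")])

# Cross join: one row per dyad at every distinct event time
riskset <- merge(dyads, data.frame(eventTime = unique(events$eventTime)))

# eventdummy = 1 if the dyad actually occurred at that time, 0 for a null-event
key <- function(d) paste(d$sender, d$target, d$eventTime, sep = "|")
riskset$eventdummy <- as.integer(key(riskset) %in% key(events))

riskset <- riskset[order(riskset$eventTime, riskset$sender), ]
riskset
```

In this sketch each event time contributes one true event and one null-event; the function's dynamic risk sets would additionally change which pairs are at risk over time.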
Usage
createRemDataset(data, sender, target, eventSequence,
eventAttribute = NULL, time = NULL,
start = NULL, startDate = NULL,
end = NULL, endDate = NULL,
timeformat = NULL,
atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
includeAllPossibleEvents = FALSE, possibleEvents = NULL,
returnInputData = FALSE)
Arguments
data |
A data frame containing all the events. |
sender |
A string (or factor or numeric) variable that represents the sender of the event. |
target |
A string (or factor or numeric) variable that represents the target of the event. |
eventSequence |
Numeric variable that represents the event sequence. The variable has to be sorted in ascending order. |
eventAttribute |
An optional variable that represents an attribute to an event. Repeated events affect the construction of the counting process data set. Use the |
time |
An optional date variable that represents the date an event took place. The variable is used if |
start |
An optional numeric variable that indicates at which point in the event sequence a specific event was at risk. The variable has to be numerical and correspond to the variable |
startDate |
An optional date variable that represents the date an event started being at risk. |
end |
An optional numeric variable that indicates at which point in the event sequence a specific event stopped being at risk. The variable has to be numerical and correspond to the variable |
endDate |
An optional date variable that represents the date an event stopped being at risk. |
timeformat |
A character string indicating the format of the |
atEventTimesOnly |
|
untilEventOccurrs |
|
includeAllPossibleEvents |
|
possibleEvents |
An optional data set with the form: column 1 = sender, column 2 = target, 3 = start, 4 = end, 5 = event attribute, 6... . The data set provides all possible events for the entire event sequence and gives each possible event a start and end value to determine when each event could have been possible. This is useful if the risk set follows a complex pattern that cannot be resolved with the above options. E.g., providing a |
returnInputData |
|
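As a rough illustration of the possibleEvents layout described above (column 1 = sender, column 2 = target, 3 = start, 4 = end), the sketch below builds such a data set in base R. The actors, the start and end values, and the choice to drop self-directed pairs are assumptions made for the example; whether start and end are treated as inclusive is not stated here and should be checked before use.

```r
# Hypothetical actors; the column order follows the possibleEvents description
actors <- c("a", "b", "c")

# All ordered sender-target combinations
pairs <- expand.grid(sender = actors, target = actors, stringsAsFactors = FALSE)

# Optional (assumption): drop pairs where sender and target are the same actor
pairs <- pairs[pairs$sender != pairs$target, ]

possible.events <- data.frame(
  sender = pairs$sender,
  target = pairs$target,
  start = 0,
  end = 6,
  stringsAsFactors = FALSE
)

# Example of a more complex pattern: actor "c" only becomes available at 3
involves.c <- possible.events$sender == "c" | possible.events$target == "c"
possible.events$start[involves.c] <- 3

possible.events
```

A data set of this shape could then be passed as possibleEvents together with includeAllPossibleEvents = TRUE, as in Example 5.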
Details
To follow.
Author(s)
Laurence Brandenberger laurence.brandenberger@eawag.ch
See Also
Examples
## Example 1: standard conditional logistic set-up
dt <- data.frame(
sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
eventSequence = c(1, 2, 2, 3, 3, 4, 6)
)
count.data <- createRemDataset(
data = dt, sender = dt$sender,
target = dt$target, eventSequence = dt$eventSequence,
eventAttribute = NULL, time = NULL,
start = NULL, startDate = NULL,
end = NULL, endDate = NULL,
timeformat = NULL,
atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
includeAllPossibleEvents = FALSE, possibleEvents = NULL,
returnInputData = FALSE)
## Example 2: add 2 attributes to the event-classification
dt <- data.frame(
sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
pro.con = c('pro', 'pro', 'con', 'pro', 'con', 'pro', 'pro'),
attack = c('yes', 'no', 'no', 'yes', 'yes', 'no', 'yes'),
eventSequence = c(1, 2, 2, 3, 3, 4, 6)
)
count.data <- createRemDataset(
data = dt, sender = dt$sender,
target = dt$target, eventSequence = dt$eventSequence,
eventAttribute = paste0(dt$pro.con, dt$attack), time = NULL,
start = NULL, startDate = NULL,
end = NULL, endDate = NULL,
timeformat = NULL,
atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
includeAllPossibleEvents = FALSE, possibleEvents = NULL,
returnInputData = FALSE)
## Example 3: adding start and end variables
# Note: the start and end variables will be overwritten
# if there are duplicate events. If you want to
# keep the strict start and stop values that you set, use
# includeAllPossibleEvents = TRUE and specify a
# possibleEvents-data set.
# Note 2: if untilEventOccurrs = TRUE and an end
# variable is provided, this end variable is
# overwritten. Set untilEventOccurrs = FALSE and
# provide the end variable if you want the event
# possibilities to stop at these exact event times.
dt <- data.frame(
sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
eventSequence = c(1, 2, 2, 3, 3, 4, 6),
start = c(0, 0, 1, 1, 1, 3, 3),
end = rep(6, 7)
)
count.data <- createRemDataset(
data = dt, sender = dt$sender,
target = dt$target, eventSequence = dt$eventSequence,
eventAttribute = NULL, time = NULL,
start = dt$start, startDate = NULL,
end = dt$end, endDate = NULL,
timeformat = NULL,
atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
includeAllPossibleEvents = FALSE, possibleEvents = NULL,
returnInputData = FALSE)
## Example 4: using start (and stop) dates
dt <- data.frame(
sender = c('a', 'c', 'd', 'a', 'a', 'f', 'c'),
target = c('b', 'd', 'd', 'b', 'b', 'a', 'd'),
eventSequence = c(1, 2, 2, 3, 3, 4, 6),
date = c('01.02.1971', rep('02.02.1971', 2),
rep('03.02.1971', 2), '04.02.1971', '06.02.1971'),
dateAtRisk = c(rep('21.01.1971', 2), rep('01.02.1971', 5)),
dateRiskEnds = rep('01.03.1971', 7)
)
count.data <- createRemDataset(
data = dt, sender = dt$sender, target = dt$target,
eventSequence = dt$eventSequence,
eventAttribute = NULL, time = dt$date,
start = NULL, startDate = dt$dateAtRisk,
end = NULL, endDate = NULL,
timeformat = '%d.%m.%Y',
atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
includeAllPossibleEvents = FALSE, possibleEvents = NULL,
returnInputData = FALSE)
# if you want to include null-events at times when no event happened,
# either see Example 5 or create a start variable yourself
# using the eventSequence()-command with the option
# 'returnDateSequenceData = TRUE' in this package. With the
# generated sequence, dates from startDate can be matched
# to the event sequence values (using the match()-command).
## Example 5: using start and stop dates and including
# possible events whenever no event occurred.
possible.events <- data.frame(
sender = c('a', 'c', 'd', 'f'),
target = c('b', 'd', 'd', 'a'),
start = c(0, 0, 1, 1),
end = c(rep(8, 4)))
count.data <- createRemDataset(
data = dt, sender = dt$sender, target = dt$target,
eventSequence = dt$eventSequence,
eventAttribute = NULL, time = NULL,
start = NULL, startDate = NULL,
end = NULL, endDate = NULL,
timeformat = NULL,
atEventTimesOnly = TRUE, untilEventOccurrs = TRUE,
includeAllPossibleEvents = TRUE, possibleEvents = possible.events,
returnInputData = FALSE)
# now you can set 'atEventTimesOnly = FALSE' to include
# null-events where none occurred until the events happened
count.data <- createRemDataset(
data = dt, sender = dt$sender, target = dt$target,
eventSequence = dt$eventSequence,
eventAttribute = NULL, time = NULL,
start = NULL, startDate = NULL,
end = NULL, endDate = NULL,
timeformat = NULL,
atEventTimesOnly = FALSE, untilEventOccurrs = TRUE,
includeAllPossibleEvents = TRUE, possibleEvents = possible.events,
returnInputData = FALSE)
# plus you can set 'untilEventOccurrs = FALSE' to get the full
# range of the events (bounded by max(possible.events$end))
count.data <- createRemDataset(
data = dt, sender = dt$sender, target = dt$target,
eventSequence = dt$eventSequence,
eventAttribute = NULL, time = NULL,
start = NULL, startDate = NULL,
end = NULL, endDate = NULL,
timeformat = NULL,
atEventTimesOnly = FALSE, untilEventOccurrs = FALSE,
includeAllPossibleEvents = TRUE, possibleEvents = possible.events,
returnInputData = FALSE)
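The note after Example 4 mentions matching dates from startDate to event sequence values with match(). A minimal base-R sketch of that matching step follows. The lookup table date.seq and its column names are hypothetical stand-ins for a date-to-sequence table, and its numbering is arbitrary; in practice the sequence values would have to be on the same scale as the eventSequence variable passed to createRemDataset.

```r
# Hypothetical lookup table: one row per calendar day with a running number
date.seq <- data.frame(
  date = seq(as.Date("1971-01-21"), as.Date("1971-03-01"), by = "day")
)
date.seq$event.sequence <- seq_len(nrow(date.seq))

# Dates at which each event started being at risk (values from Example 4)
dateAtRisk <- c(rep("21.01.1971", 2), rep("01.02.1971", 5))

# Translate each date into its sequence value via match()
start <- date.seq$event.sequence[
  match(as.Date(dateAtRisk, format = "%d.%m.%Y"), date.seq$date)
]
start
```

The resulting numeric vector could serve as a start variable, provided the lookup table uses the same sequence values as the event data.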