generate_data {SPLICE} | R Documentation |
Generate Data of Varying Complexity
Description
Generates datasets under 5 scenarios of different levels of complexity (here
"complexity" means the level of difficulty of analysis).
Usage
generate_data(
  n_claims_per_period,
  n_periods = 40,
  complexity = c(1:5),
  data_type = c("claims", "payments", "incurred"),
  random_seed = NULL,
  verbose = TRUE,
  covariates_obj = NULL
)
Arguments
n_claims_per_period |
expected number of claims per period (equals the total expected number of claims divided by n_periods). |
n_periods |
number of accident periods considered (equals number of claims development periods considered); default 40. |
complexity |
integer from 1 (simplest) to 5 (most complex); see Details. |
data_type |
a character vector specifying output data types. By default the function will output all 3 datasets (claims, payments, incurred), but the user may choose to output only a subset. |
random_seed |
optional seed for random number generation for reproducibility. |
verbose |
logical; if TRUE (the default), message output is printed; set to FALSE to mute it. |
covariates_obj |
a SynthETIC |
Details
generate_data() produces datasets of varying levels of complexity,
where 1 represents the simplest and 5 the most complex:
1 – simple, homogeneous claims experience, with zero inflation.
2 – slightly more complex than 1, with dependence of notification delay and settlement delay on claim size, and 2% p.a. base inflation.
3 – steady increase in claim processing speed over occurrence periods (i.e. steady decline in settlement delays).
4 – inflation shock at time 30 (from 0% to 10% p.a.).
5 – default distributional models, with complex dependence structures (e.g. dependence of settlement delay on claim occurrence period).
We remark that this by no means defines the limits of the complexity that can
be generated with SPLICE. This function is provided for the convenience of
users who wish to generate (a collection of) datasets under some
representative scenarios. If more complex features are required, the user is
free to modify the distributional assumptions (which, of course, requires
more thought and coding) to achieve their purposes.
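As a rough sketch of the "collection of datasets" use case described above, the loop below calls generate_data() once per complexity level and keeps the claim datasets in a named list. It uses only the arguments listed under Usage; the list names (complexity_1, ...), the seed value, and the claim count of 50 are illustrative choices, and the code assumes the SPLICE package is installed.

```r
# Sketch: one claims dataset per complexity level.
# Assumes SPLICE is installed; the block is skipped otherwise.
if (requireNamespace("SPLICE", quietly = TRUE)) {
  library(SPLICE)

  complexity_levels <- 1:5
  scenarios <- lapply(complexity_levels, function(k) {
    generate_data(
      n_claims_per_period = 50,
      n_periods = 40,
      complexity = k,
      data_type = "claims",
      random_seed = 42,  # illustrative seed, reused for every scenario
      verbose = FALSE
    )$claim_dataset
  })
  # Hypothetical labels, chosen here only for readability
  names(scenarios) <- paste0("complexity_", complexity_levels)

  # Number of simulated claims under each scenario
  print(vapply(scenarios, nrow, integer(1)))
}
```

Reusing one seed across levels is a convenience here; a different seed per scenario would work equally well if independent runs are preferred.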
Value
A named list of dataframes:
claim_dataset | A dataset of claim records that takes the same structure
as test_claim_dataset, with each row representing a
unique claim. |
payment_dataset | A dataset of partial payment records that takes the
same structure as test_transaction_dataset, with
each row representing a unique payment. |
incurred_dataset | A dataset of transaction records that tracks how the
case estimates change over time. Takes the same structure as
test_incurred_dataset, with each row representing a transaction
(any of claim notification, settlement, a payment, or a case estimate
revision). |
covariates_data | Only if covariates_obj is not NULL, in which case
it will return a SynthETIC covariates_data object. |
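Because the contents of the returned list depend on data_type and covariates_obj, it can help to check what came back before using it. The following is a minimal sketch under the assumption that SPLICE is installed; it relies only on the element names listed above and tests for absence with is.null() rather than assuming how unrequested elements are represented.

```r
# Sketch: inspect the list returned by generate_data().
# Assumes SPLICE is installed; the block is skipped otherwise.
if (requireNamespace("SPLICE", quietly = TRUE)) {
  library(SPLICE)

  result <- generate_data(
    n_claims_per_period = 50,
    data_type = c("claims", "payments"),
    complexity = 1,
    random_seed = 42,  # illustrative seed
    verbose = FALSE
  )

  # Which elements are present in this call's output?
  print(names(result))

  # Peek at the first few claim records
  print(head(result$claim_dataset))

  # covariates_data is documented as returned only when covariates_obj
  # is not NULL, so guard any use of it
  has_covariates <- !is.null(result$covariates_data)
  print(has_covariates)
}
```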
See Also
generate_claim_dataset, generate_transaction_dataset, generate_incurred_dataset
Examples
# Generate datasets of full complexity
result <- generate_data(
  n_claims_per_period = 50, data_type = c('claims', 'payments'),
  complexity = 5, random_seed = 42)
# Save individual datasets
claims <- result$claim_dataset
payments <- result$payment_dataset
# Generate chain-ladder compatible dataset
CL_simple <- generate_data(
  n_claims_per_period = 50, data_type = 'claims', complexity = 1, random_seed = 42)
# To mute message output
CL_simple_2 <- generate_data(
  n_claims_per_period = 50, data_type = 'claims', verbose = FALSE, random_seed = 42)
# Output is reproducible with the same random_seed value
all.equal(CL_simple$claim_dataset, CL_simple_2$claim_dataset)