trx_stats {actxps} | R Documentation |
Summarize transactions and utilization rates
Description
Create a summary data frame of transaction counts, amounts, and utilization rates.
Usage
trx_stats(
.data,
trx_types,
percent_of = NULL,
combine_trx = FALSE,
col_exposure = "exposure",
full_exposures_only = TRUE,
conf_int = FALSE,
conf_level = 0.95
)
## S3 method for class 'trx_df'
summary(object, ...)
Arguments
.data |
A data frame with exposure-level records of type
|
trx_types |
A character vector of transaction types to include in the
output. If none is provided, all available transaction types in |
percent_of |
An optional character vector containing column names in
|
combine_trx |
If |
col_exposure |
Name of the column in |
full_exposures_only |
If |
conf_int |
If |
conf_level |
Confidence level for confidence intervals |
object |
A |
... |
Groups to retain after |
Details
Unlike exp_stats(), this function requires .data to be an exposed_df object.
If .data is grouped, the resulting data frame will contain one row per
transaction type per group.
Any number of transaction types can be passed to the trx_types argument.
However, each transaction type must appear in the trx_types attribute of
.data. In addition, trx_stats() expects to see columns named trx_n_{*}
(for transaction counts) and trx_amt_{*} (for transaction amounts) for each
transaction type. To ensure .data is in the appropriate format, use
as_exposed_df() to convert an existing data frame with transactions, or
add_transactions() to attach transactions to an existing exposed_df object.
Value
A tibble with class trx_df, tbl_df, tbl, and data.frame. The results include
columns for any grouping variables and transaction types, plus the following:
- trx_n: the number of unique transactions
- trx_amt: total transaction amount
- trx_flag: the number of observation periods with non-zero transaction amounts
- exposure: total exposures
- avg_trx: mean transaction amount (trx_amt / trx_flag)
- avg_all: mean transaction amount over all records (trx_amt / exposure)
- trx_freq: transaction frequency when a transaction occurs (trx_n / trx_flag)
- trx_util: transaction utilization per observation period (trx_flag / exposure)
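The derived columns are simple ratios of the four totals. As a minimal sketch in base R, using made-up totals for a single transaction type (the `totals` object and its values are hypothetical, not package output):

```r
# Hypothetical totals for one transaction type (illustrative values only)
totals <- data.frame(
  trx_n    = 120,    # number of transactions
  trx_amt  = 60000,  # total transaction amount
  trx_flag = 80,     # observation periods with non-zero transaction amounts
  exposure = 400     # total exposures
)

# Derived columns, following the definitions listed above
derived <- within(totals, {
  avg_trx  <- trx_amt / trx_flag   # mean amount per period with a transaction
  avg_all  <- trx_amt / exposure   # mean amount over all exposures
  trx_freq <- trx_n / trx_flag     # transactions per period with a transaction
  trx_util <- trx_flag / exposure  # utilization rate per observation period
})

print(derived)
```

Note that avg_all equals avg_trx multiplied by the utilization rate, since both use trx_amt as the numerator.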
If percent_of is provided, the results will also include:
- The sum of any columns passed to percent_of with non-zero transactions. These columns include the suffix _w_trx.
- The sum of any columns passed to percent_of
- pct_of_{*}_w_trx: total transactions as a percentage of column {*}_w_trx. In other words, total transactions divided by the sum of a column including only records utilizing transactions.
- pct_of_{*}_all: total transactions as a percentage of column {*}. In other words, total transactions divided by the sum of a column regardless of whether or not transactions were utilized.
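The difference between the two percentages comes down to which records contribute to the denominator. A minimal base R sketch, assuming a hypothetical `records` data frame with one transaction amount column and a `premium` column standing in for a percent_of column (values are made up for illustration):

```r
# Hypothetical exposure-level records (illustrative values only)
records <- data.frame(
  trx_amt = c(0, 500, 0, 1500),            # transaction amount per record
  premium = c(10000, 20000, 5000, 15000)   # stand-in for a percent_of column
)

has_trx <- records$trx_amt > 0  # records that utilized transactions

trx_total     <- sum(records$trx_amt)
premium_w_trx <- sum(records$premium[has_trx])  # only records with transactions
premium_all   <- sum(records$premium)           # all records

pct_of_premium_w_trx <- trx_total / premium_w_trx
pct_of_premium_all   <- trx_total / premium_all

c(w_trx = pct_of_premium_w_trx, all = pct_of_premium_all)
```

Because the "all" denominator is never smaller than the "with transactions" denominator when the column values are non-negative, pct_of_{*}_all cannot exceed pct_of_{*}_w_trx in that case.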
If conf_int is set to TRUE, additional columns are added for lower and
upper confidence interval limits around the observed utilization rate and any
percent_of output columns. Confidence interval columns include the name
of the original output column suffixed by either _lower or _upper.
If values are passed to percent_of, an additional column is created
containing the sum of squared transaction amounts (trx_amt_sq).
"Percentage of" calculations
The percent_of argument is optional. If provided, this argument must
be a character vector with values corresponding to columns in .data
containing values to use as denominators in the calculation of utilization
rates or actual-to-expected ratios. Example usage:
- In a study of partial withdrawal transactions, if percent_of refers to account values, observed withdrawal rates can be determined.
- In a study of recurring claims, if percent_of refers to a column containing a maximum benefit amount, utilization rates can be determined.
Confidence intervals
If conf_int is set to TRUE, the output will contain lower and upper
confidence interval limits for the observed utilization rate and any
percent_of output columns. The confidence level is dictated by conf_level.
- Intervals for the utilization rate (trx_util) assume a binomial distribution.
- Intervals for transactions as a percentage of another column with non-zero transactions (pct_of_{*}_w_trx) are constructed using a normal distribution.
- Intervals for transactions as a percentage of another column regardless of transaction utilization (pct_of_{*}_all) are calculated assuming that the aggregate distribution is normal with a mean equal to observed transactions and a variance equal to:
Var(S) = E(N) * Var(X) + E(X)^2 * Var(N),
where S is the aggregate transactions random variable, X is an individual transaction amount assumed to follow a normal distribution, and N is a binomial random variable for transaction utilization.
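To make the binomial interval and the aggregate variance formula concrete, here is a minimal base R sketch with made-up inputs. Rescaling binomial quantiles of the count to a rate is one common way to build such an interval and is an assumption here, not necessarily the package's exact computation; `mean_x` and `var_x` are hypothetical moments of the transaction amount.

```r
conf_level <- 0.95
# Lower and upper tail probabilities for a two-sided interval
p <- c((1 - conf_level) / 2, 1 - (1 - conf_level) / 2)

# Hypothetical summary values (illustrative only)
exposure <- 400                   # full exposures, treated as binomial trials
trx_flag <- 80                    # exposures with a transaction
trx_util <- trx_flag / exposure   # observed utilization rate

# Binomial interval for the utilization rate: quantiles of the count, rescaled
util_ci <- qbinom(p, size = exposure, prob = trx_util) / exposure

# Aggregate variance: Var(S) = E(N) * Var(X) + E(X)^2 * Var(N)
mean_x <- 750                                   # assumed E(X)
var_x  <- 200^2                                 # assumed Var(X)
e_n    <- exposure * trx_util                   # E(N) for a binomial count
var_n  <- exposure * trx_util * (1 - trx_util)  # Var(N) for a binomial count
var_s  <- e_n * var_x + mean_x^2 * var_n

# Normal interval for aggregate transactions S around its mean E(N) * E(X)
agg_ci <- e_n * mean_x + qnorm(p) * sqrt(var_s)

util_ci
agg_ci
```

Under these assumptions, dividing `agg_ci` by the total of a percent_of column would express the interval on the same scale as pct_of_{*}_all.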
Default removal of partial exposures
As a default, partial exposures are removed from .data before summarizing
results. This is done to avoid complexity associated with a lopsided skew
in the timing of transactions. For example, if transactions can occur on a
monthly basis or annually at the beginning of each policy year, partial
exposures may not be appropriate. If a policy had an exposure of 0.5 years
and was taking withdrawals annually at the beginning of the year, an
argument could be made that the exposure should instead be 1 complete year.
If the same policy was expected to take withdrawals 9 months into the year,
it's not clear if the exposure should be 0.5 years or 0.5 / 0.75 years.
To override this treatment, set full_exposures_only to FALSE.
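The effect of dropping partial exposures can be illustrated on a toy data frame. This is a sketch only: the `records` data, the `tol` tolerance, and the tolerance-based test for a "full" exposure are assumptions for illustration, not the package's implementation.

```r
# Hypothetical exposure records (illustrative values only)
records <- data.frame(
  pol_num  = c(1, 1, 2, 3),
  exposure = c(1, 0.5, 1, 0.25),   # 0.5 and 0.25 are partial exposures
  trx_amt  = c(100, 0, 250, 50)
)

# Keep only rows whose exposure is numerically a full period
tol <- 1e-8
full_only <- records[abs(records$exposure - 1) < tol, ]

# Totals with partial exposures kept versus removed
all_totals  <- c(exposure = sum(records$exposure),   trx_amt = sum(records$trx_amt))
full_totals <- c(exposure = sum(full_only$exposure), trx_amt = sum(full_only$trx_amt))

all_totals
full_totals
```

In this toy case, the transaction on the 0.25-year record is excluded along with its exposure, which is the trade-off made by the default treatment.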
summary() Method
Applying summary() to a trx_df object will re-summarize the
data while retaining any grouping variables passed to the "dots" (...).
Examples
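library(actxps)
library(dplyr)  # provides group_by()
# Build policy-year exposures and attach withdrawal transactions
expo <- expose_py(census_dat, "2019-12-31", target_status = "Surrender") |>
  add_transactions(withdrawals)
# Transaction statistics by inc_guar, with premium as a denominator
res <- expo |> group_by(inc_guar) |> trx_stats(percent_of = "premium")
res
# Re-summarize with no grouping variables retained
summary(res)
# Combine all transaction types and add confidence intervals
expo |> group_by(inc_guar) |>
  trx_stats(percent_of = "premium", combine_trx = TRUE, conf_int = TRUE)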