extract.prob {InSilicoVA} | R Documentation |
Obtain conditional probabilities from training data
Description
This function is used internally by insilico.train.
Usage
extract.prob(
train,
gs,
gstable,
thre = 0.95,
type = c("quantile", "fixed", "empirical")[1],
isNumeric = FALSE,
impute = TRUE
)
Arguments
train |
Training data. It should be in the same format as the testing data
and contain one additional column (see |
gs |
the name of the column in |
gstable |
The list of causes of death used in training data. |
thre |
A numerical value between 0 and 1. It specifies the maximum missing rate allowed for a symptom to be included in the model. The default is 0.95, meaning that a symptom with more than 95% missing values in the training data is removed. |
type |
Three ways of learning conditional probabilities are provided: “quantile”, “fixed”, and “empirical”. Since InSilicoVA works with ranked conditional probabilities P(S|C), “quantile” means the rankings of P(S|C) are obtained by matching the quantile distribution of the default InterVA P(S|C), and “fixed” means the P(S|C) are matched to the closest values in the default InterVA P(S|C) table. Empirically, both types of ranking produce similar results. The third option, “empirical”, means no rankings are calculated and only the raw P(S|C) values are returned. |
isNumeric |
Indicator of whether the input is already in numeric form. If the input is coded numerically, with 1 for “present”, 0 for “absent”, and -1 for “missing”, this indicator can be set to TRUE to avoid conversion to standard InterVA format. |
impute |
Indicator of whether to impute (1) P(S|C) with P(S), if symptom S does not exist in more than the threshold fraction of deaths due to C; and (2) values of exactly 0 or 1. |
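As a rough illustration of how the thre filter and the raw P(S|C) estimate fit together, the following base R sketch works on a toy data set that is already coded numerically (1 present, 0 absent, -1 missing). The data, the names missing_rate, keep, and cond_prob, and the choice to estimate P(S|C) as the share of "present" among non-missing answers are assumptions made for illustration; this is not the package's internal code.

```r
# Toy training data (hypothetical): 6 deaths, 3 symptoms, numeric coding
# 1 = present, 0 = absent, -1 = missing.
train <- data.frame(
  ID    = paste0("d", 1:6),
  cause = c("A", "A", "A", "B", "B", "B"),
  s1    = c(1, 1, 0, 0, 0, 1),
  s2    = c(-1, -1, -1, -1, -1, -1),  # missing for every death
  s3    = c(1, -1, 1, 0, 0, -1),
  stringsAsFactors = FALSE
)
thre  <- 0.95
symps <- c("s1", "s2", "s3")

# Fraction of deaths with a missing answer, per symptom.
missing_rate <- sapply(train[symps], function(x) mean(x == -1))

# Keep symptoms whose missing rate does not exceed the threshold.
keep <- names(missing_rate)[missing_rate <= thre]

# Raw P(S|C): share of "present" among non-missing answers, by cause.
# (Assumed estimator, for illustration only.)
cond_prob <- sapply(split(train[keep], train$cause), function(d) {
  sapply(d, function(x) mean(x[x != -1] == 1))
})
print(round(cond_prob, 3))
```

Here s2 is dropped because it is missing for every death, and the resulting matrix has symptoms in rows and causes in columns.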
Value
cond.prob |
raw P(S|C) matrix |
cond.prob.alpha |
ranked P(S|C) matrix |
table.alpha |
list of ranks used |
table.num |
list of median numerical values for each rank |
symps.train |
training data after removing symptoms with too high a missing rate. |
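The relationship between cond.prob and cond.prob.alpha can be pictured with a small base R sketch of "fixed"-style matching, in which each raw probability is assigned the rank whose reference value is closest. The rank labels and reference values in ref are hypothetical placeholders, not the actual InterVA P(S|C) table, and the nearest-value rule is one plausible reading of the type argument rather than the package's implementation.

```r
# Hypothetical reference table: rank label -> numerical value.
# These are placeholders, not the InterVA levels.
ref <- c(high = 0.8, medium = 0.5, low = 0.1, rare = 0.01)

# Toy raw P(S|C) matrix (symptoms in rows, causes in columns).
cond_prob <- matrix(
  c(0.67, 0.33,
    0.92, 0.02),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("s1", "s3"), c("A", "B"))
)

# For each raw value, find the index of the nearest reference value.
nearest <- vapply(
  cond_prob,
  function(p) which.min(abs(ref - p)),
  integer(1)
)

# Ranked matrix: same shape as the raw matrix, holding rank labels.
cond_prob_alpha <- matrix(
  names(ref)[nearest],
  nrow = nrow(cond_prob),
  dimnames = dimnames(cond_prob)
)
print(cond_prob_alpha)
```

In this sketch, ref plays a role loosely analogous to table.alpha (its names) and table.num (its values).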