insilico.train {InSilicoVA} | R Documentation |
Modified InSilicoVA methods with training data
Description
This function implements the InSilicoVA model with non-InterVA4 input data.
Usage
insilico.train(
data,
train,
cause,
causes.table = NULL,
thre = 0.95,
type = c("quantile", "fixed", "empirical")[1],
isNumeric = FALSE,
updateCondProb = TRUE,
keepProbbase.level = TRUE,
CondProb = NULL,
CondProbNum = NULL,
datacheck = TRUE,
datacheck.missing = TRUE,
warning.write = FALSE,
external.sep = TRUE,
Nsim = 4000,
thin = 10,
burnin = 2000,
auto.length = TRUE,
conv.csmf = 0.02,
jump.scale = 0.1,
levels.prior = NULL,
levels.strength = NULL,
trunc.min = 1e-04,
trunc.max = 0.9999,
subpop = NULL,
java_option = "-Xmx1g",
seed = 1,
phy.code = NULL,
phy.cat = NULL,
phy.unknown = NULL,
phy.external = NULL,
phy.debias = NULL,
exclude.impossible.cause = TRUE,
impossible.combination = NULL,
indiv.CI = NULL,
CondProbTable = NULL,
...
)
Arguments
data |
The original data to be used. It is suggested to use input similar to
InterVA4, with the first column being death IDs followed by 245 symptoms.
The only difference in input is that InSilicoVA takes three levels: “present”,
“absent”, and “missing (no data)”. As in the InterVA software,
“present” symptoms take the value “Y”; “absent” symptoms take the value
“NA” or “”. Missing symptoms, e.g., questions not asked or answered
in the original interview, corrupted data, etc., should be coded
as “.” to distinguish them from the “absent” category. The order of the columns does
not matter as long as the column names are correct. The data can also include more
unused columns than the standard InterVA4 input, but the first column must be
the death ID. For example input data format, see |
train |
Training data. It should be in the same format as the testing data
and contain one additional column (see |
cause |
the name of the column in |
causes.table |
The list of causes of death used in training data. |
thre |
A numerical value between 0 and 1. It specifies the maximum fraction of missing values allowed for a symptom to be considered in the model. The default is 0.95, meaning that a symptom with more than 95% missing values in the training data is removed. |
type |
Three ways of learning conditional probabilities are provided: “empirical”, “quantile”,
or “fixed”. Since InSilicoVA works with ranked conditional probabilities P(S|C), “quantile”
means the rankings of P(S|C) are obtained by matching the same quantiles of the distribution
of the default InterVA P(S|C), and “fixed” means P(S|C) are matched to the closest values
in the default InterVA P(S|C) table. Empirically, both types of rankings produce similar results. “empirical”, on the other hand, means no ranking is calculated and the empirical conditional probabilities are used directly. If “empirical”, |
isNumeric |
Indicator of whether the input is already in numeric form. If the input is coded numerically, with 1 for “present”, 0 for “absent”, and -1 for “missing”, this indicator can be set to TRUE to avoid conversion to the standard InterVA format. |
updateCondProb |
Logical indicator. If FALSE, the InSilicoVA model is fitted without re-estimating the conditional probabilities. |
keepProbbase.level |
see |
CondProb |
see |
CondProbNum |
see |
datacheck |
Not Implemented. |
datacheck.missing |
Not Implemented. |
warning.write |
Not Implemented. |
external.sep |
Not Implemented. |
Nsim |
see |
thin |
see |
burnin |
see |
auto.length |
see |
conv.csmf |
see |
jump.scale |
see |
levels.prior |
see |
levels.strength |
see |
trunc.min |
see |
trunc.max |
see |
subpop |
see |
java_option |
see |
seed |
see |
phy.code |
see |
phy.cat |
see |
phy.unknown |
see |
phy.external |
see |
phy.debias |
see |
exclude.impossible.cause |
Logical indicator of whether to exclude impossible causes. |
impossible.combination |
A matrix of two columns: the first gives the names of symptoms and the second gives the names of causes. Each row corresponds to an impossible combination of a symptom (when present) and a cause. |
indiv.CI |
see |
CondProbTable |
A data frame of two columns: the alphabetic levels used in the CondProb argument and the numerical value corresponding to each level. Used only when conditional probabilities are provided instead of training data. |
... |
not used |
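The symptom coding described for the data, isNumeric, and thre arguments can be illustrated with a small sketch. The data frame `toy`, its column names, and the helper `to_numeric` are hypothetical and not part of the package; the sketch only mirrors the rules stated above (“Y” for present, “” or NA for absent, “.” for missing) and the missing-rate threshold.

```r
# Hypothetical toy input: first column is the death ID, the rest are symptoms.
toy <- data.frame(
  ID    = c("d1", "d2", "d3", "d4"),
  fever = c("Y", "", ".", "Y"),
  cough = c(".", ".", ".", "."),
  rash  = c("", NA, "Y", "."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode one symptom column to 1 / 0 / -1.
to_numeric <- function(x) {
  out <- rep(0L, length(x))          # "" and NA are treated as absent
  out[!is.na(x) & x == "Y"] <- 1L    # present
  out[!is.na(x) & x == "."] <- -1L   # missing (no data)
  out
}

toy_num <- as.data.frame(lapply(toy[-1], to_numeric))

# Fraction of missing values per symptom, compared with thre.
thre <- 0.95
missing_rate <- colMeans(toy_num == -1L)
kept <- names(missing_rate)[missing_rate <= thre]
```

In this toy example `cough` is missing for every death, so it exceeds the threshold and is dropped, while `fever` and `rash` are kept.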
Details
Please see insilico
for more details about choosing chain length and
differences between operating systems. This function implements InSilicoVA with customized
input format and training data.
For more details on the model specification, see the paper at https://arxiv.org/abs/1411.3042.
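To make the “empirical” and “fixed” options of the type argument more concrete, the following sketch computes empirical conditional probabilities P(S|C) from a toy training set and then matches each value to the nearest entry of a level table. The training data, the values in `levels_tab`, and the choice to ignore missing answers when averaging are all assumptions made for illustration; this is not the package's internal implementation, and `levels_tab` is not the actual InterVA table.

```r
# Hypothetical numeric training data: 1 = present, 0 = absent, -1 = missing.
train_num <- data.frame(
  fever = c(1, 1, 0, -1, 0, 0),
  rash  = c(0, 1, 0, 0, 1, 1)
)
cause <- c("A", "A", "A", "B", "B", "B")

# Empirical P(S|C): share of "present" among non-missing answers, per cause.
# Ignoring missing answers is an assumption of this sketch.
emp_prob <- function(x, g) {
  tapply(replace(x, x == -1, NA), g, mean, na.rm = TRUE)
}
p_sc <- sapply(train_num, emp_prob, g = cause)   # rows: causes, cols: symptoms

# Illustrative level table (placeholder values, not the InterVA table).
levels_tab <- c(high = 0.8, medium = 0.5, low = 0.2, rare = 0.05)

# "fixed"-style matching: pick the level whose value is closest to each P(S|C).
nearest_level <- function(p) names(levels_tab)[which.min(abs(levels_tab - p))]
p_level <- apply(p_sc, c(1, 2), nearest_level)
```

Here `p_sc` plays the role of the “empirical” probabilities, while `p_level` shows what matching to the closest table value could look like.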
Value
An insilico object.
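A call might look like the sketch below. It uses only arguments listed under Usage; the wrapper name `fit_with_training`, the data frames `test_df` and `train_df`, and the column name "cause" are hypothetical. The function is only defined here, not run, since fitting requires the InSilicoVA package (and, judging by the java_option argument, a working Java setup).

```r
# Hypothetical wrapper around insilico.train(); nothing is fitted until it is called.
fit_with_training <- function(test_df, train_df, cause_col = "cause") {
  if (!requireNamespace("InSilicoVA", quietly = TRUE)) {
    stop("The InSilicoVA package is required for this sketch.")
  }
  InSilicoVA::insilico.train(
    data   = test_df,     # deaths to classify; first column is the death ID
    train  = train_df,    # same format plus one additional cause column
    cause  = cause_col,   # assumed here to name the cause column in train
    type   = "quantile",  # default listed under Usage
    Nsim   = 4000,
    burnin = 2000,
    thin   = 10,
    seed   = 1
  )
}
```

The chain settings shown are simply the defaults from the Usage section; see insilico for guidance on choosing them.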