| getWeights {phers} | R Documentation |
Calculate phecode-specific weights for phenotype risk scores
Description
This is typically the second step of an analysis using phenotype risk scores.
The next step is getScores().
Usage
getWeights(
  demos,
  phecodeOccurrences,
  method = c("prevalence", "logistic", "cox", "loglinear", "prevalence_precalc"),
  methodFormula = NULL,
  negativeWeights = FALSE,
  dopar = FALSE
)
Arguments
demos |
A data.table having one row per person in the cohort. Must have
a column |
phecodeOccurrences |
A data.table of phecode occurrences for each person
in the cohort. Must have columns |
method |
A string indicating the statistical model for calculating weights. |
methodFormula |
A formula representing the right-hand side of the model
corresponding to |
negativeWeights |
Logical indicating whether to allow negative weights for individuals with no occurrences of a phecode. This option is not required for the "loglinear" method since under this method, individuals with a nonzero phecode occurrence can also have negative weights. |
dopar |
Logical indicating whether to run calculations in parallel if
a parallel backend is already set up, e.g., using
|
Value
A data.table with columns person_id, phecode, pred, and w.
The column pred represents a different quantity depending on method.
Under the "prevalence" method, it is the fraction of the cohort that has
at least one occurrence of the given phecode. The "prevalence_precalc"
method is similar to the "prevalence" method, but pred is calculated
based on EHR data from the Vanderbilt University Medical Center.
Under the "logistic" or "cox" method, it is the predicted probability of
a given individual having a given phecode based on methodFormula.
Under the "loglinear" method, it is the predicted
log2(num_occurrences + 1) of a given phecode for a given individual
based on methodFormula.
The column w is the weight. For the "prevalence", "prevalence_precalc",
"cox", and "logistic" methods, w is calculated as -log10(pred)
when an individual has at least one occurrence of the phecode and as
log10(1 - pred) when an individual has no occurrences. For the "loglinear"
method, w is calculated as the difference between the observed
log2(num_occurrences + 1) and pred.
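The weight formulas above can be checked by hand on toy numbers. The following is a minimal sketch in base R, not the package's implementation: the helper name sketchWeight and all input values are hypothetical, and the choice to return a weight of 0 for individuals with no occurrences when negativeWeights = FALSE is an assumption made for illustration.

```r
# Hypothetical helper illustrating the weight rule described for the
# "prevalence", "prevalence_precalc", "logistic", and "cox" methods.
# ASSUMPTION: when negativeWeights = FALSE, individuals with zero
# occurrences of the phecode get a weight of 0.
sketchWeight = function(pred, hasPhecode, negativeWeights = FALSE) {
  wPresent = -log10(pred)
  wAbsent = if (negativeWeights) log10(1 - pred) else rep(0, length(pred))
  ifelse(hasPhecode, wPresent, wAbsent)
}

# Toy values (made up): a rare phecode (pred = 0.01) and a common one (0.1).
pred = c(0.01, 0.01, 0.1, 0.1)
hasPhecode = c(TRUE, FALSE, TRUE, FALSE)

wDefault = sketchWeight(pred, hasPhecode)
wNegative = sketchWeight(pred, hasPhecode, negativeWeights = TRUE)

# Weight rule described for the "loglinear" method:
# observed log2(num_occurrences + 1) minus the predicted value.
numOccurrences = c(0, 1, 3)
predLoglinear = c(0.5, 0.5, 0.5)
wLoglinear = log2(numOccurrences + 1) - predLoglinear
```

In this sketch, having the rare phecode contributes a weight of 2 and having the common one a weight of 1, so rarer phecodes count for more. Under the "loglinear" rule, an individual with fewer occurrences than predicted receives a negative weight even without setting negativeWeights.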
See Also
getPhecodeOccurrences(), getScores()
Examples
library('data.table')
library('phers')
library('survival')
# map ICD codes to phecodes
phecodeOccurrences = getPhecodeOccurrences(icdSample)
# calculate weights using the prevalence method
weightsPrev = getWeights(demoSample, phecodeOccurrences)
# calculate weights using the prevalence method
# (assign negative weights to those with zero phecode occurrence)
weightsPrevNeg = getWeights(
  demoSample, phecodeOccurrences, negativeWeights = TRUE)
# calculate weights using the logistic method
weightsLogistic = getWeights(
  demoSample, phecodeOccurrences, method = 'logistic', methodFormula = ~ sex)
# calculate weights using the loglinear method
phecodeOccurrences2 = phecodeOccurrences[, .(
  num_occurrences = uniqueN(entry_date)), by = .(person_id, phecode)]
weightsLoglinear = getWeights(
  demoSample, phecodeOccurrences2, method = 'loglinear', methodFormula = ~ sex)
# calculate weights using the cox method
phecodeOccurrences3 = phecodeOccurrences[, .(
  first_occurrence_date = min(entry_date)), by = .(person_id, phecode)]
phecodeOccurrences3 = merge(
  phecodeOccurrences3, demoSample[, .(person_id, dob)], by = 'person_id')
phecodeOccurrences3[,
  occurrence_age := as.numeric((first_occurrence_date - dob)/365.25)]
phecodeOccurrences3[, `:=`(first_occurrence_date = NULL, dob = NULL)]
demoSample3 = demoSample[, .(
  person_id, sex,
  first_age = as.numeric((first_visit_date - dob)/365.25),
  last_age = as.numeric((last_visit_date - dob)/365.25))]
weightsCox = getWeights(
  demoSample3, phecodeOccurrences3, method = 'cox', methodFormula = ~ sex)
# calculate weights using the prevalence_precalc method, in which pred is
# based on EHR data from Vanderbilt University Medical Center
weightsPreCalc = getWeights(
  demoSample, phecodeOccurrences, method = 'prevalence_precalc')