getWeights {phers} | R Documentation |
Calculate phecode-specific weights for phenotype risk scores
Description
This is typically the second step of an analysis using phenotype risk scores;
the next is getScores().
Usage
getWeights(
  demos,
  phecodeOccurrences,
  method = c("prevalence", "logistic", "cox", "loglinear", "prevalence_precalc"),
  methodFormula = NULL,
  negativeWeights = FALSE,
  dopar = FALSE
)
Arguments
demos |
A data.table having one row per person in the cohort. Must have
a column |
phecodeOccurrences |
A data.table of phecode occurrences for each person
in the cohort. Must have columns |
method |
A string indicating the statistical model for calculating weights. |
methodFormula |
A formula representing the right-hand side of the model
corresponding to |
negativeWeights |
Logical indicating whether to allow negative weights for individuals with no occurrences of a phecode. This option is not required for the "loglinear" method since under this method, individuals with a nonzero phecode occurrence can also have negative weights. |
dopar |
Logical indicating whether to run calculations in parallel if
a parallel backend is already set up, e.g., using
|
Value
A data.table with columns person_id, phecode, pred, and w.
The column pred represents a different quantity depending on method.
Under the "prevalence" method, it is the fraction of the cohort that has
at least one occurrence of the given phecode. The "prevalence_precalc"
method is similar to the "prevalence" method, but pred is calculated
based on EHR data from the Vanderbilt University Medical Center.
Under the "logistic" or "cox" method, it is the predicted probability of
a given individual having a given phecode based on methodFormula.
Under the "loglinear" method, it is the predicted
log2(num_occurrences + 1) of a given phecode for a given individual
based on methodFormula.
For the "prevalence", "prevalence_precalc", "cox", and "logistic" methods,
the weight w is calculated as -log10(pred) when an individual has a non-zero
phecode occurrence and log10(1 - pred) when an individual has zero phecode
occurrence. For the "loglinear" method, the weight w is calculated as the
difference between the observed log2(num_occurrences + 1) and pred.
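As a rough illustration of the weight formulas for the "prevalence" method, the following base R sketch computes pred and w by hand on a toy cohort. The data and the names personIds, occ, grid, and has_occ are hypothetical, and the code is not the package's implementation. It applies log10(1 - pred) to every zero-occurrence row, so it mirrors only the case in which negative weights are allowed.

```r
# Hypothetical toy cohort of four people (not data from the package)
personIds = 1:4

# Person-phecode pairs with at least one occurrence
occ = data.frame(
  person_id = c(1, 2, 3, 1),
  phecode = c('250', '250', '250', '401'))

# pred under the "prevalence" method: fraction of the cohort that has
# at least one occurrence of each phecode
pred = tapply(occ$person_id, occ$phecode, function(x) length(unique(x))) /
  length(personIds)

# One row per person-phecode combination
grid = expand.grid(
  person_id = personIds, phecode = names(pred), stringsAsFactors = FALSE)
grid$pred = as.numeric(pred[grid$phecode])
grid$has_occ = paste(grid$person_id, grid$phecode) %in%
  paste(occ$person_id, occ$phecode)

# Weight: -log10(pred) with an occurrence, log10(1 - pred) without one
# (assumption: negative weights are allowed for zero-occurrence rows)
grid$w = ifelse(grid$has_occ, -log10(grid$pred), log10(1 - grid$pred))

print(grid)
```

In this toy cohort the rarer phecode ('401', carried by one person in four) yields a larger positive weight for the person who has it than the common phecode ('250') does, which is the intended effect of weighting by -log10(pred).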
See Also
getPhecodeOccurrences()
, getScores()
Examples
library('data.table')
library('survival')
# map ICD codes to phecodes
phecodeOccurrences = getPhecodeOccurrences(icdSample)
# calculate weights using the prevalence method
weightsPrev = getWeights(demoSample, phecodeOccurrences)
# calculate weights using the prevalence method
# (assign negative weights to those with zero phecode occurrence)
weightsPrevNeg = getWeights(
  demoSample, phecodeOccurrences, negativeWeights = TRUE)
# calculate weights using the logistic method
weightsLogistic = getWeights(
  demoSample, phecodeOccurrences, method = 'logistic', methodFormula = ~ sex)
# calculate weights using the loglinear method
phecodeOccurrences2 = phecodeOccurrences[, .(
  num_occurrences = uniqueN(entry_date)), by = .(person_id, phecode)]
weightsLoglinear = getWeights(
  demoSample, phecodeOccurrences2, method = 'loglinear', methodFormula = ~ sex)
# calculate weights using the cox method
phecodeOccurrences3 = phecodeOccurrences[, .(
  first_occurrence_date = min(entry_date)), by = .(person_id, phecode)]
phecodeOccurrences3 = merge(
  phecodeOccurrences3, demoSample[, .(person_id, dob)], by = 'person_id')
phecodeOccurrences3[,
  occurrence_age := as.numeric((first_occurrence_date - dob) / 365.25)]
phecodeOccurrences3[, `:=`(first_occurrence_date = NULL, dob = NULL)]
demoSample3 = demoSample[, .(
  person_id, sex,
  first_age = as.numeric((first_visit_date - dob) / 365.25),
  last_age = as.numeric((last_visit_date - dob) / 365.25))]
weightsCox = getWeights(
  demoSample3, phecodeOccurrences3, method = 'cox', methodFormula = ~ sex)
# calculate weights using pre-calculated weights based on data from
# Vanderbilt University Medical Center
weightsPreCalc = getWeights(
  demoSample, phecodeOccurrences, method = 'prevalence_precalc')
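
The "loglinear" weights described under Value can also be sketched by hand. The base R example below works on a single phecode and assumes that an ordinary linear model fitted with lm() stands in for the package's fitting step. The toy data and the names toy, y, and fit are hypothetical. It is meant only to show why a person with a non-zero number of occurrences can still receive a negative weight.

```r
# Hypothetical toy data: one row per person for a single phecode
toy = data.frame(
  person_id = 1:6,
  sex = c('female', 'male', 'female', 'male', 'female', 'male'),
  num_occurrences = c(1, 0, 3, 1, 7, 1))

# Response used by the "loglinear" method as described under Value
toy$y = log2(toy$num_occurrences + 1)

# Assumption: a plain linear model with the right-hand side ~ sex
fit = lm(y ~ sex, data = toy)
toy$pred = as.numeric(fitted(fit))

# Weight is the observed value minus the predicted value
toy$w = toy$y - toy$pred

print(toy)
```

Person 1 has one occurrence but fewer than the model predicts for their group, so their weight is negative. This is consistent with the note under negativeWeights that the option is not needed for the "loglinear" method.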