ln_normalize {labNorm}    R Documentation
Normalize lab values to age and sex
Description
Normalize standard laboratory measurements (e.g. hemoglobin or cholesterol levels) according to age and sex, based on the algorithms described in "Personalized lab test models to quantify disease potentials in healthy individuals" doi:10.1038/s41591-021-01468-6.
The "Clalit" reference distributions are based on 2.1B lab measurements taken from 2.8M individuals between 2002 and 2019, filtered to exclude the effects of severe chronic diseases and medications. The resulting normalized value is a quantile between 0 and 1, representing the value's position in the reference distribution.
The "UKBB" reference distributions are based on the UK Biobank, a large-scale population-based cohort study of 500K individuals. The UK Biobank data underwent the same filtering process as the "Clalit" reference distributions.
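To make "a quantile representing the value's position in the reference distribution" concrete, the sketch below uses base R's ecdf() on a simulated reference sample. It is a conceptual illustration only: the reference data are random numbers (not Clalit or UKBB data), a single hypothetical age/sex stratum is assumed, and it is not the model labNorm or the paper actually uses.

```r
# Simulated reference sample for ONE hypothetical age/sex stratum
# (a stand-in for a real reference distribution; not Clalit data)
set.seed(42)
reference_sample <- rnorm(10000, mean = 14, sd = 1.2)  # e.g. hemoglobin, g/dL

# ecdf() returns a function giving the fraction of the reference <= x
ref_cdf <- ecdf(reference_sample)

# Each new value is mapped to its position (0 to 1) in the reference
new_values <- c(11.5, 14, 16.5)
quantiles <- ref_cdf(new_values)
print(round(quantiles, 3))
```

A value near the middle of the reference lands near 0.5, while unusually low or high values land near 0 or 1.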
The list of supported labs can be found below or by running LAB_DETAILS$short_name.
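For example, you can check that lab names match the supported names before calling ln_normalize(). This sketch assumes the labNorm package is installed; "Haemoglobin" is a deliberately misspelled, hypothetical input.

```r
library(labNorm)

# %in% is an exact, case-sensitive comparison against the supported names
wanted <- c("Hemoglobin", "Estradiol", "Haemoglobin")
supported <- wanted %in% LAB_DETAILS$short_name
print(setNames(supported, wanted))
```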
Usage
ln_normalize(
values,
age,
sex,
lab,
units = NULL,
reference = "Clalit",
na.rm = FALSE
)
ln_normalize_multi(labs_df, reference = "Clalit", na.rm = FALSE)
Arguments
values: a vector of lab values.
age: a vector of ages, between 20 and 89 for the "Clalit" reference and between 35 and 80 for "UKBB". Can be a single value if all values share the same age.
sex: a vector of either "male" or "female". Can be a single value if all values share the same sex.
lab: the lab name. See LAB_DETAILS$short_name for the supported labs.
units: the units of the lab values. See …
reference: the reference distribution to use: "Clalit", "UKBB", or "Clalit-demo". Please download the "Clalit" and "UKBB" reference distributions using ln_download_data().
na.rm: if …
labs_df: a data frame with the columns "value", "age", "sex", "units", and "lab". The "lab" column should contain the lab name for each row. See LAB_DETAILS$short_name for the supported labs.
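Because age and sex can be single values, several measurements from one person (or one demographic group) can be normalized without repeating them. A minimal sketch, assuming labNorm is installed; the measurements are made-up values, and the bundled "Clalit-demo" reference is used so that no download is needed.

```r
library(labNorm)

# Three hypothetical hemoglobin measurements from a 50-year-old woman;
# age and sex are given once and apply to every value
q <- ln_normalize(
  c(11, 13, 15),
  age = 50,
  sex = "female",
  lab = "Hemoglobin",
  reference = "Clalit-demo"
)
print(q)
```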
Value
a vector of normalized values (quantiles between 0 and 1). If ln_download_data() was not run, a lower-resolution reference distribution is used, which can have an error of up to 5 percentiles (0.05). Otherwise, the full reference distribution is used. You can check whether the high-resolution data was downloaded using ln_data_downloaded().
You can force the function to use the lower-resolution distribution by setting options(labNorm.use_low_res = TRUE).
If the quantile information is not available (e.g. "Estradiol" for male patients, or labs that are not available in the UKBB data), the function returns NA.
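The two switches above can be combined as follows. This sketch assumes labNorm is installed and uses only the functions and option named in this section; the previous options are restored afterwards so the setting does not leak into later calls.

```r
library(labNorm)

# TRUE if the high-resolution reference distributions were downloaded
print(ln_data_downloaded())

# Temporarily force the low-resolution distribution, then restore options
old_opts <- options(labNorm.use_low_res = TRUE)
q_low <- ln_normalize(
  hemoglobin_data$value,
  hemoglobin_data$age,
  hemoglobin_data$sex,
  "Hemoglobin"
)
options(old_opts)
summary(q_low)
```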
reference distribution
It is highly recommended to use ln_download_data to download the "Clalit" and "UKBB" reference distributions. If you choose not to download the data, the package uses the demo reference distributions included in the package ("Clalit-demo"). These do not include all the labs and have a resolution of 20 quantile bins, so they may have an error of up to 5 percentiles (0.05), particularly at the edges of the distribution.
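The 0.05 bound follows from the bin width: 20 quantile bins store only 21 breakpoints (0, 0.05, ..., 1), so a value can only be located within a bin spanning 0.05 of the distribution. The sketch below illustrates this with a simulated, skewed reference sample and linear interpolation between breakpoints. It is a hypothetical illustration of the error bound, not how labNorm stores or interpolates its demo distributions.

```r
set.seed(7)
# Simulated "full-resolution" reference sample (hypothetical, skewed data)
reference_sample <- rlnorm(10000, meanlog = 0, sdlog = 0.5)
full_cdf <- ecdf(reference_sample)

# Low-resolution table: 20 bins -> 21 breakpoints at probabilities 0, 0.05, ..., 1
probs <- seq(0, 1, by = 0.05)
breaks <- quantile(reference_sample, probs = probs, names = FALSE)

# Approximate a value's quantile by interpolating linearly between breakpoints
low_res_quantile <- function(x) {
  approx(breaks, probs, xout = x, rule = 2, ties = "ordered")$y
}

# Compare against the full empirical CDF across the whole range
x <- seq(min(reference_sample), max(reference_sample), length.out = 2000)
err <- abs(low_res_quantile(x) - full_cdf(x))
cat("max error:", round(max(err), 4), "\n")
```

Both estimates always fall in the same 0.05-wide bin, so the difference cannot exceed one bin width (plus a negligible 1/n term from the finite sample).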
labs
The following labs are supported in the "Clalit" reference (some labs are missing from the UKBB reference):
WBC
RBC
Hemoglobin
Hematocrit
Platelets
MCV
MCH
MCHC
RDW
MPV
Large unstained cells, Abs
Albumin
Total Cholesterol
Triglycerides
BMI
Iron
Transferrin
Ferritin
PDW
MPXI
Total Globulin
PCT
HDW
Fibrinogen
CH
Chloride
Large unstained cells, %
Macrocytic
Microcytic
Hyperchromic
Hypochromic
Lymphocytes, Abs
Lymphocytes, %
Neutrophils, Abs
Neutrophils, %
Monocytes, Abs
Monocytes, %
Eosinophils, Abs
Eosinophils, %
Basophils, Abs
Basophils, %
Microcytic:Hypochromic
Glucose
Urea
Creatinine
Uric Acid
Calcium
Phosphorus
Total Protein
HDL Cholesterol
LDL Cholesterol
Alk. Phosphatase
AST
ALT
GGT
LDH
CPK
Total Bilirubin
Direct Bilirubin
Hemoglobin A1c
Sodium
Potassium
Vitamin D (25-OH)
Microalbumin:Creatinine
Urine Creatinine
Urine Microalbumin
Non-HDL
TSH
T3, Free
T4, Free
Blood Pressure, Systolic
Blood Pressure, Diastolic
Urine Specific Gravity
Urine pH
PT, INR
PT, sec
PT, %
Vitamin B12
PSA
ESR
aPTT, sec
CRP
Amylase
Folic Acid
Total:HDL
Hematocrit:Hemoglobin
Magnesium
aPTT, ratio
Indirect Bilirubin
RDW-SD
RDW-CV
LH
Estradiol
Examples
library(labNorm)
# Normalize Hemoglobin values to age and sex
hemoglobin_data$quantile <- ln_normalize(
hemoglobin_data$value,
hemoglobin_data$age,
hemoglobin_data$sex,
"Hemoglobin"
)
# plot the quantiles vs values for age 50-60
library(ggplot2)
library(dplyr)
hemoglobin_data %>%
filter(age >= 50 & age <= 60) %>%
ggplot(aes(x = value, y = quantile, color = sex)) +
geom_point() +
theme_classic()
# Different units (assuming hemoglobin_data is in g/dL; 1 g/dL = 10 mg/mL)
hemoglobin_diff_units <- hemoglobin_data
hemoglobin_diff_units$value <- hemoglobin_diff_units$value * 10
hemoglobin_diff_units$quantile <- ln_normalize(
hemoglobin_diff_units$value,
hemoglobin_diff_units$age,
hemoglobin_diff_units$sex,
"Hemoglobin",
"mg/mL"
)
# Multiple units (assuming creatinine_data is in mg/dL):
# umol/L = mg/dL / 0.011312 and mmol/L = mg/dL / 11.312
creatinine_diff_units <- creatinine_data
creatinine_diff_units$value <- c(
creatinine_diff_units$value[1:500] / 0.011312,
creatinine_diff_units$value[501:1000] / 11.312
)
creatinine_diff_units$quantile <- ln_normalize(
creatinine_diff_units$value,
creatinine_diff_units$age,
creatinine_diff_units$sex,
"Creatinine",
c(rep("umol/L", 500), rep("mmol/L", 500))
)
# Use UKBB as reference (requires ln_download_data(); UKBB covers ages 35-80)
hemoglobin_data_ukbb <- hemoglobin_data %>% filter(age >= 35 & age <= 80)
hemoglobin_data_ukbb$quantile_ukbb <- ln_normalize(
hemoglobin_data_ukbb$value,
hemoglobin_data_ukbb$age,
hemoglobin_data_ukbb$sex,
"Hemoglobin",
reference = "UKBB"
)
# plot UKBB vs Clalit
hemoglobin_data_ukbb %>%
filter(age >= 50 & age <= 60) %>%
ggplot(aes(x = quantile, y = quantile_ukbb, color = sex)) +
geom_point() +
geom_abline() +
theme_classic()
# Normalize several labs at once with ln_normalize_multi()
multi_labs_df <- bind_rows(
hemoglobin_data %>% mutate(lab = "Hemoglobin"),
creatinine_data %>% mutate(lab = "Creatinine")
)
multi_labs_df$quantile <- ln_normalize_multi(multi_labs_df)
# Inspect the result
head(multi_labs_df)
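The conversion factors in the creatinine example come from creatinine's molar mass (about 113.12 g/mol): 1 mg/dL corresponds to about 88.4 umol/L (0.011312 is 1/88.4), and 1 mmol/L corresponds to 11.312 mg/dL. The hypothetical helper below (not part of labNorm) converts between these units in base R, which can help when checking values before passing units to ln_normalize().

```r
# Hypothetical helper (not part of labNorm): convert creatinine between units
# using the molar mass of creatinine, ~113.12 g/mol
convert_creatinine <- function(x, from, to) {
  # how many mg/dL one unit of each kind corresponds to
  to_mg_dl <- c("mg/dL" = 1, "umol/L" = 0.011312, "mmol/L" = 11.312)
  stopifnot(from %in% names(to_mg_dl), to %in% names(to_mg_dl))
  # convert to mg/dL first, then into the target unit
  x * to_mg_dl[[from]] / to_mg_dl[[to]]
}

convert_creatinine(1, "mg/dL", "umol/L")
convert_creatinine(88.4, "umol/L", "mmol/L")
```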