est_test_subject {tempted}R Documentation

Estimate subject loading of testing data

Description

This function estimates the subject loading of the testing data based on feature and temporal loading from training data, so that both the testing data and training data have the same dimensionality reduction.

Usage

est_test_subject(datlist, res_tempted, mean_svd = NULL)

Arguments

datlist

Testing data formatted into datlist in the same fashion as the training data. The same transformation needs to be used for both training and testing data.

res_tempted

Result from tempted ran on the training data.

mean_svd

Result from svd_centralize ran on the training data.

Value

estimated subject loading of testing data

References

Shi P, Martino C, Han R, Janssen S, Buck G, Serrano M, Owzar K, Knight R, Shenhav L, Zhang AR. (2023) Time-Informed Dimensionality Reduction for Longitudinal Microbiome Studies. bioRxiv. doi: 10.1101/550749. https://www.biorxiv.org/content/10.1101/550749.

Examples

# Take a subset of the samples so the example runs faster

# Here we are taking samples from the odd months
sub_sample <- rownames(meta_table)[(meta_table$day_of_life%/%12)%%2==1]
count_table_sub <- count_table[sub_sample,]
processed_table_sub <- processed_table[sub_sample,]
meta_table_sub <- meta_table[sub_sample,]

# split the example data into training and testing

id_test <- meta_table_sub$studyid=="2"

count_train <- count_table_sub[!id_test,]
meta_train <- meta_table_sub[!id_test,]

count_test <- count_table_sub[id_test,]
meta_test <- meta_table_sub[id_test,]

# run tempted on training data

datlist_train <- format_tempted(count_train,
                                meta_train$day_of_life,
                                meta_train$studyid,
                                threshold=0.95,
                                pseudo=0.5,
                                transform="clr")

mean_svd_train <- svd_centralize(datlist_train, r=1)

res_tempted_train <- tempted(mean_svd_train$datlist,
r=2, smooth=1e-5)

# get the overlapping features

count_test <- count_test[,rownames(datlist_train[[1]])[-1]]

datlist_test <- format_tempted(count_test,
                               meta_test$day_of_life,
                               meta_test$studyid,
                               threshold=1,
                               pseudo=0.5,
                               transform="clr")

# estimate the subject loading of the testing subject

sub_test <- est_test_subject(datlist_test, res_tempted_train, mean_svd_train)

# train logistic regression classifier on training subjects

metauni <- unique(meta_table_sub[,c("studyid", "delivery")])
rownames(metauni) <- metauni$studyid
Atrain <- as.data.frame(res_tempted_train$A_hat)
Atrain$delivery <- metauni[rownames(Atrain),"delivery"]=="Cesarean"
glm_train <- glm(delivery ~ PC1+PC2,
                 data=Atrain, family=binomial(link="logit"))
summary(glm_train)

# predict the label of testing subject, whose true label is "Cesarean"

predict(glm_train, newdata=as.data.frame(sub_test), type="response")


[Package tempted version 0.1.1 Index]