tempted_all {tempted}    R Documentation

Run all major functions of tempted

Description

This function wraps the functions format_tempted, svd_centralize, tempted, ratio_feature, and aggregate_feature.

Usage

tempted_all(
  featuretable,
  timepoint,
  subjectID,
  threshold = 0.95,
  pseudo = NULL,
  transform = "clr",
  r = 3,
  smooth = 1e-06,
  interval = NULL,
  resolution = 51,
  maxiter = 20,
  epsilon = 1e-04,
  r_svd = 1,
  do_ratio = TRUE,
  pct_ratio = 0.05,
  absolute = FALSE,
  pct_aggregate = 1,
  contrast = NULL
)

Arguments

featuretable

A sample by feature matrix. It is an input for format_tempted.

timepoint

The time stamp of each sample, matched with the rows of featuretable. It is an input for format_tempted.

subjectID

The subject ID of each sample, matched with the rows of featuretable. It is an input for format_tempted.

threshold

A threshold for feature filtering for microbiome data. Features whose proportion of zero values is >= threshold will be excluded. Default is 0.95. It is an input for format_tempted.
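The filtering rule described for threshold can be sketched in base R as follows. This is only an illustration on a hypothetical toy matrix (counts, zero_frac, and kept are made-up names), not the code used inside format_tempted.

```r
# Hypothetical sample-by-feature count matrix (4 samples, 3 features).
counts <- matrix(c(0, 0, 0, 0,    # f1: zero in every sample
                   5, 0, 2, 1,    # f2: zero in 1 of 4 samples
                   3, 4, 0, 7),   # f3: zero in 1 of 4 samples
                 nrow = 4,
                 dimnames = list(paste0("s", 1:4), c("f1", "f2", "f3")))

# Proportion of zero values for each feature (column).
zero_frac <- colMeans(counts == 0)

# Exclude features whose zero proportion is >= threshold.
threshold <- 0.95
kept <- counts[, zero_frac < threshold, drop = FALSE]
colnames(kept)
```

Under this rule, threshold = 1 would remove only features that are zero in every sample.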

pseudo

A small number to add to all the counts before normalizing into proportions and log transformation. Default is 1/2 of the smallest non-zero value that is specific for each sample. This pseudo count is added for transform=c("logcomp", "clr", "logit"). It is an input for format_tempted.

transform

The transformation applied to the data. "logcomp" for log of compositions. "comp" for compositions. "ast" for arcsine square root transformation. "clr" for centered log ratio transformation. "logit" for logit transformation. "none" for no transformation. Default transform="clr" is recommended for microbiome data. For data that are already transformed, use transform="none". It is an input for format_tempted.
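As a rough illustration of how the default pseudo count and transform="clr" fit together, the base R sketch below adds half of the smallest non-zero value of each sample, normalizes to proportions, and applies a centered log ratio. The toy matrix and the variable names are assumptions, and the exact steps inside format_tempted may differ.

```r
# Hypothetical sample-by-feature count matrix (2 samples, 3 features).
counts <- matrix(c(10, 0, 5,
                    0, 8, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("f1", "f2", "f3")))

# Sample-specific pseudo count: half of the smallest non-zero value per row.
pseudo <- apply(counts, 1, function(x) min(x[x > 0]) / 2)

# A length-nrow vector recycles down the columns, so pseudo[i] is added to row i.
shifted <- counts + pseudo

# Normalize each sample to proportions, then take the centered log ratio:
# log proportion minus the mean log proportion of the same sample.
comp <- shifted / rowSums(shifted)
clr <- log(comp) - rowMeans(log(comp))
```

Each row of clr sums to zero, which is the defining property of the centered log ratio.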

r

Number of components to decompose into, i.e. rank of the CP type decomposition. Default is set to 3. It is an input for tempted.

smooth

Smoothing parameter for the RKHS norm. Larger values give smoother temporal loading functions. Default is 1e-6. The value can be adjusted for a given dataset by checking the smoothness of the estimated temporal loading functions in a plot. It is an input for tempted.

interval

The range of time points to run the decomposition for. Default is the range of all observed time points. The user can set it to a shorter interval than the observed range. It is an input for tempted.

resolution

Number of time points at which to evaluate the temporal loading function. Default is 51. It does not affect the subject or feature loadings. It is an input for tempted.

maxiter

Maximum number of iterations. Default is 20. It is an input for tempted.

epsilon

Convergence criterion for the difference between iterations. Default is 1e-4. It is an input for tempted.

r_svd

The number of ranks in the mean structure. Default is 1. It is an input for svd_centralize.

do_ratio

Whether to calculate the log ratio of features. Default is TRUE.

pct_ratio

The percent of features to sum up. Default is 0.05, i.e. 5%. It is an input for ratio_feature.

absolute

absolute = TRUE means features are ranked by the absolute value of feature loadings, and the top pct_ratio percent of features are picked. absolute = FALSE means features are ranked by the original value of feature loadings, and the top and bottom pct_ratio percent of features are picked. Then ratio is taken as the abundance of the features with positive loading over the abundance of the features with negative loading. It is an input for ratio_feature.
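The two ranking modes can be illustrated in base R with a hypothetical loading vector and a single sample's abundances. The rank-based selection via ceiling() is an assumption made for this sketch and is not taken from ratio_feature.

```r
# Hypothetical feature loadings for one component.
loading <- c(f1 = 0.9, f2 = -0.7, f3 = 0.1, f4 = -0.05, f5 = 0.4,
             f6 = -0.3, f7 = 0.02, f8 = 0.6, f9 = -0.8, f10 = 0.2)
pct_ratio <- 0.2
n_pick <- ceiling(pct_ratio * length(loading))  # assumed rounding rule

# absolute = FALSE: rank by signed loading, take the top and the bottom.
ord <- order(loading, decreasing = TRUE)
top <- names(loading)[ord[seq_len(n_pick)]]
bottom <- names(loading)[rev(ord)[seq_len(n_pick)]]

# absolute = TRUE: rank by |loading|, then split the picked set by sign.
picked <- names(loading)[order(abs(loading), decreasing = TRUE)[seq_len(n_pick)]]
top_abs <- picked[loading[picked] > 0]
bottom_abs <- picked[loading[picked] < 0]

# Log ratio for one hypothetical sample: summed abundance of the top
# features over summed abundance of the bottom features.
abund <- c(f1 = 30, f2 = 5, f3 = 1, f4 = 1, f5 = 1,
           f6 = 1, f7 = 1, f8 = 10, f9 = 15, f10 = 1)
log_ratio <- log(sum(abund[top]) / sum(abund[bottom]))
```

With absolute = TRUE the numerator and denominator can have different sizes, because the split by sign happens after the features are picked.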

pct_aggregate

The percent of features to aggregate, with features ranked by the absolute value of the feature loading of each component. Default is 1, which means 100% of features are aggregated. Setting pct_aggregate=0.01 means the top 1% of features are aggregated. It is an input for aggregate_feature.

contrast

A matrix choosing how components are combined. Each column is a contrast of length r and is used to calculate a linear combination of the feature loadings of the r components. It is an input for ratio_feature and aggregate_feature.
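A contrast matrix of this shape can be built and applied to a feature loading matrix as sketched below. B_hat here is a small hypothetical matrix, and the matrix product is only meant to show what "linear combination of the feature loadings" means, not how ratio_feature or aggregate_feature are implemented.

```r
# Hypothetical feature-by-r loading matrix with r = 3 components.
B_hat <- matrix(c(0.5, -0.2,  0.1,
                  0.3,  0.4, -0.6),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("f1", "f2"), paste0("PC", 1:3)))

# Two contrasts, one per column, each of length r:
#   column 1: component 1 minus component 2
#   column 2: component 2 plus component 3
contrast <- cbind(c(1, -1, 0),
                  c(0,  1, 1))

# Each column of the result is the combined loading of every feature
# under the corresponding contrast.
combined <- B_hat %*% contrast
```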

Value

A list including all the input and output of functions format_tempted, svd_centralize, tempted, ratio_feature, and aggregate_feature.

input

All the input options of function tempted_all.

datalist_raw

Output of format_tempted with option transform="none".

datlist

Output of format_tempted.

mean_svd

Output of svd_centralize.

A_hat

Subject loading, a subject by r matrix.

B_hat

Feature loading, a feature by r matrix.

Phi_hat

Temporal loading function, a resolution by r matrix.

time_Phi

The time points where the temporal loading function is evaluated.

Lambda

Eigenvalues, a vector of length r.

r_square

Variance explained by each component. This is the R-squared of the linear regression of the vectorized temporal tensor against the vectorized low-rank reconstruction using individual components.

accum_r_square

Variance explained by the first few components accumulated. This is the R-squared of the linear regression of the vectorized temporal tensor against the vectorized low-rank reconstruction using the first few components.
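The R-squared described for r_square and accum_r_square can be illustrated with a toy rank-1 reconstruction in base R. All values below are made up, and the outer-product form of the reconstruction is an assumption about how a single component combines the subject, feature, and temporal loadings.

```r
set.seed(1)

# Hypothetical rank-1 component: 2 subjects, 3 features, 4 time points.
a <- c(0.6, -0.8)            # subject loading
b <- c(0.5, 0.5, -0.7)       # feature loading
phi <- c(0.2, 0.5, 0.9, 1.0) # temporal loading on a grid
lambda <- 4                  # eigenvalue

# Low-rank reconstruction as a subject x feature x time array.
recon <- lambda * outer(outer(a, b), phi)

# Toy "observed" tensor: the reconstruction plus noise.
observed <- recon + rnorm(length(recon), sd = 0.1)

# R-squared of the regression of the vectorized tensor on the
# vectorized reconstruction.
fit <- lm(as.vector(observed) ~ as.vector(recon))
r_square <- summary(fit)$r.squared
```

For a regression with a single predictor and an intercept, this equals the squared correlation between the two vectors.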

metafeature_ratio

The log ratio abundance of the top over bottom ranking features. It is a data.frame with four columns: "value" for the log ratio values, "subID" for the subject ID, "timepoint" for the time points, and "PC" indicating which component was used to construct the meta feature.

toppct_ratio

A matrix of TRUE/FALSE indicating which features are ranked top in each component (and contrast) and used as the numerator of the log ratio.

bottompct_ratio

A matrix of TRUE/FALSE indicating which features are ranked bottom in each component (and contrast) and used as the denominator of the log ratio.

metafeature_aggregate

The meta feature obtained by aggregating the observed temporal tensor. It is a data.frame with four columns: "value" for the meta feature values, "subID" for the subject ID, "timepoint" for the time points, and "PC" indicating which component was used to construct the meta feature.

toppct_aggregate

A matrix of TRUE/FALSE indicating which features are aggregated in each component and contrast.

contrast

The contrast used to linearly combine the components from input.

References

Shi P, Martino C, Han R, Janssen S, Buck G, Serrano M, Owzar K, Knight R, Shenhav L, Zhang AR. (2023) Time-Informed Dimensionality Reduction for Longitudinal Microbiome Studies. bioRxiv. doi: 10.1101/550749. https://www.biorxiv.org/content/10.1101/550749.

Examples

# Take a subset of the samples so the example runs faster

# Here we keep samples from every other 12-day block of day_of_life
sub_sample <- rownames(meta_table)[(meta_table$day_of_life%/%12)%%2==1]
count_table_sub <- count_table[sub_sample,]
processed_table_sub <- processed_table[sub_sample,]
meta_table_sub <- meta_table[sub_sample,]

# for preprocessed data that do not need to be transformed


res.processed <- tempted_all(processed_table_sub,
                             meta_table_sub$day_of_life,
                             meta_table_sub$studyid,
                             threshold=1,
                             transform="none",
                             r=2,
                             smooth=1e-5,
                             do_ratio=FALSE)

# for count data that will have pseudo added and clr transformed

res.count <- tempted_all(count_table_sub,
                         meta_table_sub$day_of_life,
                         meta_table_sub$studyid,
                         threshold=0.95,
                         transform="clr",
                         pseudo=0.5,
                         r=2,
                         smooth=1e-5,
                         pct_ratio=0.1,
                         pct_aggregate=1)

# for proportional data that will have pseudo added and clr transformed

res.proportion <- tempted_all(count_table_sub/rowSums(count_table_sub),
                              meta_table_sub$day_of_life,
                              meta_table_sub$studyid,
                              threshold=0.95,
                              transform="clr",
                              pseudo=NULL,
                              r=2,
                              smooth=1e-5,
                              pct_ratio=0.1,
                              pct_aggregate=1)

# plot the temporal loading and subject trajectories grouped by delivery mode

plot_time_loading(res.proportion, r=2)

group <- unique(meta_table[,c("studyid", "delivery")])

# plot the aggregated features

plot_metafeature(res.proportion$metafeature_aggregate, group, bws=30)


[Package tempted version 0.1.1 Index]