R: Run all major functions of tempted

tempted_all {tempted}

R Documentation

Run all major functions of tempted

Description

This function wraps the functions format_tempted, svd_centralize, tempted, ratio_feature, and aggregate_feature.

Usage

tempted_all(
  featuretable,
  timepoint,
  subjectID,
  threshold = 0.95,
  pseudo = NULL,
  transform = "clr",
  r = 3,
  smooth = 1e-06,
  interval = NULL,
  resolution = 51,
  maxiter = 20,
  epsilon = 1e-04,
  r_svd = 1,
  do_ratio = TRUE,
  pct_ratio = 0.05,
  absolute = FALSE,
  pct_aggregate = 1,
  contrast = NULL
)

Arguments

`featuretable`	A sample by feature matrix. It is an input for `format_tempted`.
`timepoint`	The time stamp of each sample, matched with the rows of `featuretable`. It is an input for `format_tempted`.
`subjectID`	The subject ID of each sample, matched with the rows of `featuretable`. It is an input for `format_tempted`.
`threshold`	A threshold for feature filtering for microbiome data. Features with zero value percentage >= threshold will be excluded. Default is 0.95. It is an input for `format_tempted`.
`pseudo`	A small number to add to all the counts before normalizing into proportions and log transformation. Default is 1/2 of the smallest non-zero value that is specific for each sample. This pseudo count is added for `transform=c("logcomp", "clr", "logit")`. It is an input for `format_tempted`.
`transform`	The transformation applied to the data. `"logcomp"` for log of compositions. `"comp"` for compositions. `"ast"` for arcsine square root transformation. `"clr"` for centered log ratio transformation. `"logit"` for logit transformation. `"none"` for no transformation. Default `transform="clr"` is recommended for microbiome data. For data that are already transformed, use `transform="none"`. It is an input for `format_tempted`.
`r`	Number of components to decompose into, i.e. rank of the CP type decomposition. Default is set to 3. It is an input for `tempted`.
`smooth`	Smoothing parameter for the RKHS norm. Larger values give smoother temporal loading functions. Default is 1e-6. The value can be adjusted for each dataset by checking the smoothness of the estimated temporal loading function in a plot. It is an input for `tempted`.
`interval`	The range of time points to run the decomposition for. Default is the range of all observed time points. The user can set it to a shorter interval than the observed range. It is an input for `tempted`.
`resolution`	Number of time points at which to evaluate the temporal loading function. Default is 51. It does not affect the subject or feature loadings. It is an input for `tempted`.
`maxiter`	Maximum number of iterations. Default is 20. It is an input for `tempted`.
`epsilon`	Convergence criterion for the difference between iterations. Default is 1e-4. It is an input for `tempted`.
`r_svd`	The number of ranks in the mean structure. Default is 1. It is an input for `svd_centralize`.
`do_ratio`	Whether to calculate the log ratio of features. Default is TRUE.
`pct_ratio`	The percent of features to sum up. Default is 0.05, i.e. 5%. It is an input for `ratio_feature`.
`absolute`	`absolute = TRUE` means features are ranked by the absolute value of feature loadings, and the top `pct_ratio` percent of features are picked. `absolute = FALSE` means features are ranked by the original value of feature loadings, and the top and bottom `pct_ratio` percent of features are picked. Then ratio is taken as the abundance of the features with positive loading over the abundance of the features with negative loading. It is an input for `ratio_feature`.
`pct_aggregate`	The percent of features to aggregate, with features ranked by the absolute value of the feature loading of each component. Default is 1, which means 100% of features are aggregated. Setting `pct_aggregate=0.01` means the top 1% of features are aggregated. It is an input for `aggregate_feature`.
`contrast`	A matrix choosing how components are combined. Each column is a contrast of length r and is used to calculate a linear combination of the feature loadings of the r components. It is an input for `ratio_feature` and `aggregate_feature`.
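
The preprocessing arguments `threshold`, `pseudo`, and `transform = "clr"` can be illustrated with a minimal base R sketch. This is not the implementation of `format_tempted`. The helper names `filter_features` and `clr_transform` and the toy count matrix are hypothetical, and the sketch assumes the per-sample default pseudo count described above (half of the smallest non-zero value in each sample).

```r
# Toy sample-by-feature count matrix (hypothetical data)
counts <- matrix(c(10, 0, 5, 0,
                    0, 0, 8, 2,
                    4, 0, 0, 6),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("s", 1:3), paste0("f", 1:4)))

# Hypothetical helper: drop features whose fraction of zeros is >= threshold
filter_features <- function(x, threshold = 0.95) {
  zero_frac <- colMeans(x == 0)
  x[, zero_frac < threshold, drop = FALSE]
}

# Hypothetical helper: centered log ratio with a per-sample pseudo count
clr_transform <- function(x, pseudo = NULL) {
  t(apply(x, 1, function(row) {
    # Assumed default: half of the smallest non-zero value in this sample
    p <- if (is.null(pseudo)) min(row[row > 0]) / 2 else pseudo
    comp <- (row + p) / sum(row + p)  # proportions after adding the pseudo count
    log(comp) - mean(log(comp))       # subtract the log geometric mean
  }))
}

filtered <- filter_features(counts, threshold = 0.95)  # f2 is all zeros and is dropped
clr <- clr_transform(filtered)                         # each row sums to zero
```

In this sketch, passing a number as `pseudo` applies the same pseudo count to every sample, while leaving it as `NULL` lets each sample use its own value.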

Value

A list including all the input and output of functions format_tempted, svd_centralize, tempted, ratio_feature, and aggregate_feature.

input: All the input options of function tempted_all.
datalist_raw: Output of format_tempted with option transform="none".
datalist: Output of format_tempted.
mean_svd: Output of svd_centralize.
A_hat: Subject loading, a subject by r matrix.
B_hat: Feature loading, a feature by r matrix.
Phi_hat: Temporal loading function, a resolution by r matrix.
time_Phi: The time points where the temporal loading function is evaluated.
Lambda: Eigenvalue, a length r vector.
r_square: Variance explained by each component. This is the R-squared of the linear regression of the vectorized temporal tensor against the vectorized low-rank reconstruction using individual components.
accum_r_square: Variance explained by the first few components accumulated. This is the R-squared of the linear regression of the vectorized temporal tensor against the vectorized low-rank reconstruction using the first few components.
metafeature_ratio: The log ratio abundance of the top over bottom ranking features. It is a data.frame with four columns: "value" for the log ratio values, "subID" for the subject ID, "timepoint" for the time points, and "PC" indicating which component was used to construct the meta feature.
toppct_ratio: A matrix of TRUE/FALSE indicating which features are ranked top in each component (and contrast) and used as the numerator of the log ratio.
bottompct_ratio: A matrix of TRUE/FALSE indicating which features are ranked bottom in each component (and contrast) and used as the denominator of the log ratio.
metafeature_aggregate: The meta feature obtained by aggregating the observed temporal tensor. It is a data.frame with four columns: "value" for the meta feature values, "subID" for the subject ID, "timepoint" for the time points, and "PC" indicating which component was used to construct the meta feature.
toppct_aggregate: A matrix of TRUE/FALSE indicating which features are aggregated in each component and contrast.
contrast: The contrast used to linearly combine the components from input.
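
To make `metafeature_ratio`, `toppct_ratio`, and `bottompct_ratio` more concrete, the following base R sketch builds a log ratio meta feature for a single component with `absolute = FALSE`. It is an illustration under stated assumptions, not the code of `ratio_feature`: the loading vector `b`, the abundance matrix `abund`, and the use of quantile cutoffs to select the top and bottom features are all hypothetical.

```r
set.seed(1)
n_feature <- 20

# Hypothetical feature loadings for one component
b <- setNames(seq(-0.95, 0.95, length.out = n_feature),
              paste0("f", seq_len(n_feature)))

# Hypothetical strictly positive abundances, samples in rows
abund <- matrix(rexp(5 * n_feature) + 0.1, nrow = 5,
                dimnames = list(paste0("s", 1:5), names(b)))

pct_ratio <- 0.1

# Assumed selection rule: quantile cutoffs on the signed loadings
top    <- b > quantile(b, 1 - pct_ratio) & b > 0  # numerator features
bottom <- b < quantile(b, pct_ratio) & b < 0      # denominator features

# Log of summed top abundance over summed bottom abundance, one value per sample
log_ratio <- log(rowSums(abund[, top, drop = FALSE])) -
  log(rowSums(abund[, bottom, drop = FALSE]))
```

Here `top` and `bottom` play the role of one column of the TRUE/FALSE matrices, and `log_ratio` corresponds to the "value" column for that component.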

References

Shi P, Martino C, Han R, Janssen S, Buck G, Serrano M, Owzar K, Knight R, Shenhav L, Zhang AR. (2023) Time-Informed Dimensionality Reduction for Longitudinal Microbiome Studies. bioRxiv. doi: 10.1101/550749. https://www.biorxiv.org/content/10.1101/550749.

Examples

# Take a subset of the samples so the example runs faster

# Here we keep samples from every other 12-day window of day_of_life
sub_sample <- rownames(meta_table)[(meta_table$day_of_life%/%12)%%2==1]
count_table_sub <- count_table[sub_sample,]
processed_table_sub <- processed_table[sub_sample,]
meta_table_sub <- meta_table[sub_sample,]

# for preprocessed data that do not need to be transformed

res.processed <- tempted_all(processed_table_sub,
                             meta_table_sub$day_of_life,
                             meta_table_sub$studyid,
                             threshold=1,
                             transform="none",
                             r=2,
                             smooth=1e-5,
                             do_ratio=FALSE)

# for count data that will have pseudo added and clr transformed

res.count <- tempted_all(count_table_sub,
                         meta_table_sub$day_of_life,
                         meta_table_sub$studyid,
                         threshold=0.95,
                         transform="clr",
                         pseudo=0.5,
                         r=2,
                         smooth=1e-5,
                         pct_ratio=0.1,
                         pct_aggregate=1)

# for proportional data that will have pseudo added and clr transformed

res.proportion <- tempted_all(count_table_sub/rowSums(count_table_sub),
                              meta_table_sub$day_of_life,
                              meta_table_sub$studyid,
                              threshold=0.95,
                              transform="clr",
                              pseudo=NULL,
                              r=2,
                              smooth=1e-5,
                              pct_ratio=0.1,
                              pct_aggregate=1)

# plot the temporal loading and subject trajectories grouped by delivery mode

plot_time_loading(res.proportion, r=2)

group <- unique(meta_table[,c("studyid", "delivery")])

# plot the aggregated features

plot_metafeature(res.proportion$metafeature_aggregate, group, bws=30)

[Package tempted version 0.1.1 Index]