TWLsample {twl} | R Documentation |
Main function to obtain posterior samples from a TWL model.
Description
Main function to obtain posterior samples from a TWL model.
Usage
TWLsample(full_dat_mat, full_dat, alpha_re = 7, beta_re = 0.4,
num_its = 5000, num_all_clus = 30, output_every = 20, manip = TRUE,
sav_inter = FALSE)
Arguments
full_dat_mat |
list of matrices of the different data types. |
full_dat |
list of data.tables with a single column labelled 'nam', denoting sample annotation. A consistent naming convention of samples must be used across data types. |
alpha_re |
Hyperparameter for the Dirichlet prior model within each data type, influencing the sparsity of the clusterings. A smaller value encourages fewer clusters. Defaults to 7 and should be chosen as a function of sample size. |
beta_re |
Hyperparameter for the Dirichlet prior model across data types within each sample, influencing the degree to which each data type's sample cluster labels affect those of the other data types. Defaults to 0.4 and should be chosen as a function of the total number of data types being integrated in the analysis. |
num_its |
Number of iterations. Defaults to 5000. |
num_all_clus |
Ceiling on the number of clusters. Defaults to 30. Should be set some factor larger (for example, 5 times) than the maximum number of hypothesized clusters in any of the data types. |
output_every |
Frequency of sampling log statistics, reporting mixing, cluster distribution, and proportion of cluster sharing across data types. Defaults to once every 20 iterations. |
manip |
A logical indicating whether likelihood manipulation should be used to improve mixing in situations where cluster means are far from one another in Euclidean distance. This should influence neither the identified clusters nor the parameters associated with them. Defaults to TRUE. |
sav_inter |
A logical indicating whether a temporary file of the samples should be written out in the working directory every 50 iterations. Allows for restarts when sampling is interrupted, and defaults to FALSE. |
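As a rough illustration of the two list inputs described above, the sketch below builds `full_dat_mat` and `full_dat` from toy data. The matrix orientation (one row per sample), the sample IDs, and the hypothesized cluster count are assumptions made for illustration only; inspect `misaligned_mat` and `misaligned` from `data(data_and_output)` for the layout the package actually uses.

```r
library(data.table)

set.seed(1)

# Hypothetical sample IDs; the same naming convention is used for every data type.
ids <- paste0("sample", 1:6)

# Toy matrices for two data types: one row per sample (assumed orientation),
# one column per feature.
mat_a <- matrix(rnorm(6 * 4), nrow = 6, dimnames = list(ids, NULL))
mat_b <- matrix(rnorm(6 * 3), nrow = 6, dimnames = list(ids, NULL))

# full_dat_mat: list of matrices, one per data type.
full_dat_mat <- list(mat_a, mat_b)

# full_dat: list of data.tables, each with a single column 'nam' of sample annotation.
full_dat <- list(data.table(nam = ids), data.table(nam = ids))

# Sanity checks: one annotation table per matrix, one annotation row per sample.
stopifnot(length(full_dat_mat) == length(full_dat))
for (d in seq_along(full_dat)) {
  stopifnot(nrow(full_dat_mat[[d]]) == nrow(full_dat[[d]]))
  stopifnot(identical(names(full_dat[[d]]), "nam"))
}

# num_all_clus as a factor of 5 above a hypothesized maximum of 6 clusters
# (the value 6 is illustrative only).
max_hyp_clus <- 6
num_all_clus <- 5 * max_hyp_clus
```

With a hypothesized maximum of 6 clusters, the factor-of-5 guideline yields the default ceiling of 30.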
Value
A list of lists of data.tables. The list length is the number of iterations. The length of each element is the number of data types. The data.tables have 2 columns, sample annotation called 'nam' and cluster assignment called 'clus'.
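To make the nesting concrete, the sketch below builds a mock object with the documented shape (iterations, then data types, then a data.table with 'nam' and 'clus') and indexes into it. The mock labels are random placeholders, not TWLsample output, and the burn-in of 5 iterations and the sample names are hypothetical. Co-clustering of a pair of samples is summarised rather than raw labels, on the assumption that raw cluster labels may not be comparable across iterations.

```r
library(data.table)

set.seed(1)
ids <- paste0("sample", 1:4)   # hypothetical sample annotation
n_its <- 10                    # number of iterations in the mock
n_types <- 2                   # number of data types in the mock

# Mock of the documented structure: list (iterations) of lists (data types)
# of data.tables with columns 'nam' and 'clus'.
mock_save <- lapply(seq_len(n_its), function(i) {
  lapply(seq_len(n_types), function(d) {
    data.table(nam = ids, clus = sample(1:3, length(ids), replace = TRUE))
  })
})

# Cluster assignments for data type 2 at iteration 7.
it7_type2 <- mock_save[[7]][[2]]

# Discard a hypothetical burn-in of the first 5 iterations.
kept <- mock_save[-seq_len(5)]

# For data type 1: in which retained iterations do sample1 and sample2
# share a cluster label?
same_clus <- vapply(kept, function(it) {
  dt <- it[[1]]
  dt[nam == "sample1", clus] == dt[nam == "sample2", clus]
}, logical(1))

# Proportion of retained iterations in which the pair co-clusters.
co_clus_prop <- mean(same_clus)
```

The outer index selects the iteration and the inner index selects the data type, so `clus_save[[i]][[d]]` would be the analogous access on real output.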
Examples
data(data_and_output)
## Not run:
clus_save <- TWLsample(misaligned_mat,misaligned,output_every=50,num_its=5000,manip=FALSE)
outpu_new <- pairwise_clus(clus_save,BURNIN=2000)
## End(Not run)
post_analy_cor(outpu_new,c("title1","title2","title3","title4","title5"),
tempfile(),ords='none')
clus_labs <- post_analy_clus(outpu_new,clus_save,c(2:6),rep(0.6,5),c("title1","title2",
"title3","title4","title5"),tempfile())
output_nest <- cross_dat_analy(clus_save,4900)