sparsedc_cluster {SparseDC} | R Documentation |
Sparse Differential Clustering
Description
The main SparseDC function. It clusters the samples from the two conditions, links the clusters across the conditions, and identifies marker genes for each cluster. SparseDC identifies three types of marker gene; see the original manuscript for further details.
Usage
sparsedc_cluster(pdat1, pdat2, ncluster, lambda1, lambda2, nitter = 20,
nstarts = 50, init_iter = 5)
Arguments
pdat1 |
The centered data from condition 1. Columns should be samples (cells) and rows should be features (genes). |
pdat2 |
The centered data from condition 2. Columns should be samples (cells) and rows should be features (genes). The number of genes should be the same as |
ncluster |
The number of clusters present in the data. |
lambda1 |
The lambda 1 value to use in the SparseDC function. This value controls the number of marker genes detected for each of the clusters in the final result. This can be calculated using the "lambda1_calculator" function or supplied by the user. |
lambda2 |
The lambda 2 value to use in the SparseDC function. This value controls the number of genes that show condition-dependent expression within each cell type. This can be calculated using the "lambda2_calculator" function or supplied by the user. |
nitter |
The maximum number of iterations for each of the start values. The default value is 20. |
nstarts |
The number of start values to use for SparseDC. The default value is 50. |
init_iter |
The number of iterations used to generate the starting center values. The default value is 5. |
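The layout requirements on pdat1 and pdat2 (genes in rows, cells in columns, the same number of genes in both) can be checked before clustering. The following is a minimal base R sketch, not part of SparseDC: check_sparsedc_input is a hypothetical helper, and the toy matrices are invented for illustration.

```r
# Hypothetical helper (not part of SparseDC): checks that both inputs are
# numeric matrices and that they have the same number of rows (genes).
check_sparsedc_input <- function(pdat1, pdat2) {
  stopifnot(is.matrix(pdat1), is.matrix(pdat2))
  stopifnot(is.numeric(pdat1), is.numeric(pdat2))
  if (nrow(pdat1) != nrow(pdat2)) {
    stop("pdat1 and pdat2 must have the same number of rows (genes)")
  }
  invisible(TRUE)
}

# Toy data: 4 genes, 3 cells in condition 1 and 5 cells in condition 2
set.seed(1)
toy_1 <- matrix(rnorm(12), nrow = 4)
toy_2 <- matrix(rnorm(20), nrow = 4)
check_sparsedc_input(toy_1, toy_2)
```

The check does not verify that the data are centered, since the centering is assumed to have been done beforehand (for example by pre_proc_data, as in the Examples).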
Value
A list containing the clustering solution (elements clusters1 and clusters2), the cluster centers (elements centers1 and centers2) and the score of each of the starts.
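Because the clusters are linked across the conditions, it can be useful to compare cluster sizes side by side. The sketch below uses a mock list that only borrows the element names clusters1 and clusters2 from the Examples; the values are invented, and it assumes that cluster labels are the integers 1 to ncluster.

```r
# Mock result for illustration only; values are invented and the labels
# are assumed to be integers from 1 to ncluster.
mock_res <- list(
  clusters1 = c(1, 1, 2, 3, 3, 3),
  clusters2 = c(2, 2, 2, 3)
)
ncluster <- 3

# tabulate() with a fixed number of bins reports empty clusters as 0
sizes <- rbind(
  condition1 = tabulate(mock_res$clusters1, nbins = ncluster),
  condition2 = tabulate(mock_res$clusters2, nbins = ncluster)
)
colnames(sizes) <- paste0("cluster", seq_len(ncluster))
sizes
```

Fixing the number of bins keeps the columns aligned between conditions, which summary(as.factor(...)) does not guarantee when a label is absent from one condition.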
See Also
lambda1_calculator
lambda2_calculator
update_c
update_mu
Examples
set.seed(10)
# Select small dataset for example
data_test <- data_biase[1:100,]
# Split data into conditions 1 and 2
data_1 <- data_test[ , which(condition_biase == "A")]
data_2 <- data_test[ , which(condition_biase == "B")]
# Preprocess data (log transform and center)
pre_data <- pre_proc_data(data_1, data_2, norm = FALSE, log = TRUE,
                          center = TRUE)
# Calculate lambda 1 parameter
lambda1 <- lambda1_calculator(pdat1 = pre_data[[1]], pdat2 = pre_data[[2]],
                              ncluster = 3, alpha1 = 0.5, nboot1 = 1000)
# Calculate lambda 2 parameter
lambda2 <- lambda2_calculator(pdat1 = pre_data[[1]], pdat2 = pre_data[[2]],
                              ncluster = 3, alpha2 = 0.5, nboot2 = 1000)
# Run sparse DC
sdc_res <- sparsedc_cluster(pdat1 = pre_data[[1]], pdat2 = pre_data[[2]],
                            ncluster = 3, lambda1 = lambda1, lambda2 = lambda2,
                            nitter = 20, nstarts = 50)
# Extract results
clusters_1 <- sdc_res$clusters1 # Clusters for condition 1 data
clusters_2 <- sdc_res$clusters2 # Clusters for condition 2 data
centers_1 <- sdc_res$centers1 # Centers for condition 1 data
centers_2 <- sdc_res$centers2 # Centers for condition 2 data
# View clusters
summary(as.factor(clusters_1))
summary(as.factor(clusters_2))
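# View marker genes
gene_names <- row.names(data_test)
# Cluster 1: genes with positive centers in condition 1 and condition 2
m_gene_c1_up1 <- gene_names[which(centers_1[,1] > 0)]
m_gene_c1_up2 <- gene_names[which(centers_2[,1] > 0)]
# Cluster 1: genes with negative centers in condition 1 and condition 2
m_gene_c1_down1 <- gene_names[which(centers_1[,1] < 0)]
m_gene_c1_down2 <- gene_names[which(centers_2[,1] < 0)]
# Cluster 2: genes whose centers differ between the conditions
m_gene_c2_cond <- gene_names[which(centers_1[,2] != centers_2[,2])]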
# Alternatively, store the preprocessed matrices in named variables first
pre_data <- pre_proc_data(data_1, data_2, norm = FALSE, log = TRUE,
                          center = TRUE)
pdata_A <- pre_data[[1]]
pdata_B <- pre_data[[2]]
lambda1 <- lambda1_calculator(pdat1 = pdata_A, pdat2 = pdata_B,
                              ncluster = 3, alpha1 = 0.5, nboot1 = 1000)
lambda2 <- lambda2_calculator(pdat1 = pdata_A, pdat2 = pdata_B,
                              ncluster = 3, alpha2 = 0.5, nboot2 = 1000)
# Run sparse DC
sdc_res <- sparsedc_cluster(pdat1 = pdata_A, pdat2 = pdata_B, ncluster = 3,
                            lambda1 = lambda1, lambda2 = lambda2,
                            nitter = 20, nstarts = 50)
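The marker-gene extraction shown earlier in the examples can be wrapped in a small function so that it is easy to repeat for any cluster. This is a sketch in base R rather than part of SparseDC: marker_summary is a hypothetical helper, and the toy center matrices are invented so that the code runs without the package.

```r
# Hypothetical helper (not part of SparseDC): summarises one cluster's
# marker genes from two genes-by-clusters center matrices.
marker_summary <- function(centers1, centers2, cluster, gene_names) {
  c1 <- centers1[, cluster]
  c2 <- centers2[, cluster]
  list(
    up_cond1 = gene_names[c1 > 0],        # positive center, condition 1
    up_cond2 = gene_names[c2 > 0],        # positive center, condition 2
    down_cond1 = gene_names[c1 < 0],      # negative center, condition 1
    down_cond2 = gene_names[c2 < 0],      # negative center, condition 2
    cond_dependent = gene_names[c1 != c2] # centers differ between conditions
  )
}

# Toy centers: 4 genes x 2 clusters, values invented for illustration
genes <- paste0("gene", 1:4)
toy_c1 <- matrix(c(1.2, 0, -0.5, 0,   0, 0.8, 0, 0), nrow = 4)
toy_c2 <- matrix(c(1.2, 0, -0.5, 0.7, 0, 0.8, 0, -0.3), nrow = 4)
marker_summary(toy_c1, toy_c2, cluster = 1, gene_names = genes)
```

With real output, the same call would take sdc_res$centers1, sdc_res$centers2 and row.names(data_test) as its arguments.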