R: dcem_predict: Part of DCEM package.

dcem_predict {DCEM}

R Documentation

dcem_predict: Part of DCEM package.

Description

Predict the cluster membership of test data based on the learned parameters i.e, output from dcem_train or dcem_star_train.

Usage

dcem_predict(param_list, data)

Arguments

`param_list`	(list): List of distribution parameters. The list contains the learned parameteres of the distribution.
`data`	(vector or dataframe): A vector of data for univariate data. A dataframe (rows represent the data and columns represent the features) for multivariate data.

Value

A list containing the cluster membership for the test data.

References

Parichit Sharma, Hasan Kurban, Mehmet Dalkilic DCEM: An R package for clustering big data via data-centric modification of Expectation Maximization, SoftwareX, 17, 100944 URL https://doi.org/10.1016/j.softx.2021.100944

Examples

# Simulating a mixture of univariate samples from three distributions
# with meu as 20, 70 and 100 and standard deviation as 10, 100 and 40 respectively.
sample_uv_data = as.data.frame(c(rnorm(100, 20, 5), rnorm(70, 70, 1), rnorm(50, 100, 2)))

# Select first few points from each distribution as test data
test_data = as.vector(sample_uv_data[c(1:5, 101:105, 171:175),])

# Remove the test data from the training set
sample_uv_data = as.data.frame(sample_uv_data[-c(1:5, 101:105, 171:175), ])

# Randomly shuffle the samples.
sample_uv_data = as.data.frame(sample_uv_data[sample(nrow(sample_uv_data)),])

# Calling the dcem_train() function on the simulated data with threshold of
# 0.000001, iteration count of 1000 and random seeding respectively.
sample_uv_out = dcem_train(sample_uv_data, num_clusters = 3, iteration_count = 100,
threshold = 0.001)

# Predict the membership for test data
test_data_membership <- dcem_predict(sample_uv_out, test_data)

# Access the output
print(test_data_membership)

[Package DCEM version 2.0.5 Index]