dir_mult_GD {deepMOU} | R Documentation |

Estimates the parameters of the Dirichlet-Multinomial mixture model by means of a gradient descent algorithm and allocates the documents to clusters.

```
dir_mult_GD(
x,
k,
n_it = 100,
eps = 1e-05,
seed_choice = 1,
KK = 20,
min_iter = 2,
init = NULL
)
```

`x` |
Document-term matrix describing the frequency of terms that occur in a collection of documents. Rows correspond to documents in the collection and columns correspond to terms. |

`k` |
Number of clusters/groups. |

`n_it` |
Maximum number of gradient descent steps. |

`eps` |
Tolerance level for the convergence of the algorithm. Default is `1e-05`. |

`seed_choice` |
Set seed for reproducible results. |

`KK` |
Maximum number of iterations allowed for the `nlminb` function. |

`min_iter` |
Minimum number of gradient descent steps. |

`init` |
Vector containing the initial document allocations used to initialize the algorithm. If `NULL` (default), initialization is carried out via spherical k-means (skmeans). |
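As an illustration of what an `init` vector might look like, the sketch below builds one label per document using base R only. It assumes that `init` holds integer labels in `1..k`, one per row of `x`, which this page does not state explicitly. Ordinary `kmeans` on length-normalised rows is used here as a rough stand-in for spherical k-means, and the toy matrix is hypothetical.

```r
set.seed(1)

# Toy document-term matrix (hypothetical): 6 documents, 4 terms.
x <- rbind(
  c(4, 3, 0, 0), c(5, 2, 1, 0), c(3, 4, 0, 1),
  c(0, 1, 5, 4), c(1, 0, 4, 5), c(0, 0, 3, 6)
)

# Scale every document to unit Euclidean length so that only the
# direction of the term-frequency vector matters.
x_unit <- x / sqrt(rowSums(x^2))

# Ordinary k-means on the normalised rows; an approximation of
# spherical k-means, not the routine used by the package.
init <- kmeans(x_unit, centers = 2, nstart = 10)$cluster
init

# Assumption: the labels can then be supplied as
# dir_mult_GD(x = x, k = 2, init = init)
```

Supplying allocations in this way may be useful when a reproducible or domain-informed starting point is preferred over the default.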

Starting from the data given by `x`, the Dirichlet-Multinomial mixture model is fitted and `k` clusters are obtained.
The parameters are estimated by gradient descent.
In particular, the function assigns initial values to the weights of the Dirichlet-Multinomial distribution for each cluster
and initial weights to the elements of the mixture. The estimates are obtained after at most `n_it` steps of the
descent algorithm, or earlier if the tolerance level `eps` is reached. Using the posterior distribution
of the latent variable z, each document is then allocated to the cluster that maximizes its
posterior probability.
For further details see the references.
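The allocation step described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names `ddirmult_log` and `allocate_documents` are hypothetical, the standard Dirichlet-Multinomial log-density is used, and the parameter values are invented toy numbers rather than fitted estimates.

```r
# Log-density of one document (count vector x) under a
# Dirichlet-Multinomial with positive parameter vector theta.
ddirmult_log <- function(x, theta) {
  n <- sum(x)
  lgamma(n + 1) - sum(lgamma(x + 1)) +
    lgamma(sum(theta)) - lgamma(n + sum(theta)) +
    sum(lgamma(x + theta) - lgamma(theta))
}

# Posterior probabilities of the latent variable z and hard allocation.
# x: documents-by-terms count matrix; Theta: k-by-p parameter matrix;
# pi_hat: mixture weights of length k.
allocate_documents <- function(x, Theta, pi_hat) {
  k <- nrow(Theta)
  log_joint <- sapply(seq_len(k), function(j) {
    log(pi_hat[j]) + apply(x, 1, ddirmult_log, theta = Theta[j, ])
  })
  # Keep a documents-by-clusters matrix even for a single document.
  log_joint <- matrix(log_joint, nrow = nrow(x))
  # Subtract the row maximum before exponentiating for numerical stability.
  f_z_x <- exp(log_joint - apply(log_joint, 1, max))
  f_z_x <- f_z_x / rowSums(f_z_x)
  list(f_z_x = f_z_x, clusters = max.col(f_z_x, ties.method = "first"))
}

# Toy inputs (hypothetical values).
x <- rbind(c(5, 1, 0), c(4, 2, 0), c(0, 1, 6))
Theta <- rbind(c(5, 1, 0.5), c(0.5, 1, 5))
pi_hat <- c(0.5, 0.5)

fit <- allocate_documents(x, Theta, pi_hat)
fit$clusters
```

Working on the log scale avoids underflow when documents contain many terms, which is why the row maximum is subtracted before normalising.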

A list containing the following elements:

`x` |
The data matrix. |

`clusters` |
The clustering labels. |

`k` |
The number of clusters. |

`numobs` |
The sample size. |

`p` |
The vocabulary size. |

`likelihood` |
Vector containing the likelihood values at each iteration. |

`pi_hat` |
Estimated probabilities of belonging to the `k` clusters. |

`Theta` |
Matrix containing the estimates of the Theta parameters for each cluster. |

`f_z_x` |
Matrix containing the posterior probabilities of belonging to each cluster. |

`AIC` |
Akaike Information Criterion (AIC) value of the fitted model. |

`BIC` |
Bayesian Information Criterion (BIC) value of the fitted model. |
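To show how some of these elements relate to one another, the sketch below works on a hypothetical stand-in for the returned list. Only the element names follow the table above; the values are invented, and it is assumed that `clusters` holds integer labels in `1..k` and that the columns of `f_z_x` are ordered by cluster.

```r
# Hypothetical stand-in for the list returned by dir_mult_GD();
# the numbers are invented for illustration.
fit <- list(
  clusters = c(1, 1, 2, 2),
  k = 2,
  f_z_x = rbind(c(0.9, 0.1), c(0.7, 0.3), c(0.2, 0.8), c(0.4, 0.6))
)

# Cluster sizes.
table(fit$clusters)

# The posterior probabilities of each document should sum to one ...
rowSums(fit$f_z_x)

# ... and each label should be the column with the largest posterior.
hard_labels <- max.col(fit$f_z_x, ties.method = "first")
all(hard_labels == fit$clusters)

# Documents whose largest posterior is below 0.75 are allocated
# with less certainty and may deserve a closer look.
uncertain <- which(apply(fit$f_z_x, 1, max) < 0.75)
uncertain
```

The 0.75 threshold is arbitrary and only serves to show how the posterior matrix can be used to flag borderline allocations.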

Anderlucci L, Viroli C (2020). "Mixtures of Dirichlet-Multinomial distributions for supervised and unsupervised classification of short text data." *Advances in Data Analysis and Classification*, **14**, 759-770. doi: 10.1007/s11634-020-00399-3.

```
# Load the CNAE2 dataset
data("CNAE2")
# Perform parameter estimation and clustering; very
# few iterations are used for this example
dir_CNAE2 <- dir_mult_GD(x = CNAE2, k = 2, n_it = 2)
# Show the cluster labels assigned to the documents
dir_CNAE2$clusters
```

[Package *deepMOU* version 0.1.1 Index]