make.heldout {stm} | R Documentation |
Heldout Likelihood by Document Completion
Description
Tools for making and evaluating heldout datasets.
Usage
make.heldout(
  documents,
  vocab,
  N = floor(0.1 * length(documents)),
  proportion = 0.5,
  seed = NULL
)
Arguments
documents |
the documents to be modeled (see stm for format) |
vocab |
the vocabulary, a character vector of the terms indexed by documents |
N |
number of documents to be partially held out; defaults to 10% of the documents |
proportion |
proportion of the words in each of the N selected documents to be held out |
seed |
the seed, set for replicability |
Details
These functions create heldout datasets and evaluate heldout likelihood using the document completion method. The basic idea is to hold out some fraction of the words in a subset of the documents, train the model on the remaining words, and then use the document-level latent variables to evaluate the probability of the heldout portion. See the example for the basic workflow.
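The hold-out step described above can be sketched in base R. This is a minimal illustration, not the package's implementation: split_document is a hypothetical helper, and it assumes a document is stored as a two-row integer matrix with vocabulary indices in row 1 and term counts in row 2.

```r
# Hypothetical helper, not part of stm: split one document into an observed
# part (used for training) and a held-out part (used for evaluation).
# Assumed format: row 1 = vocabulary indices, row 2 = counts of each term.
split_document <- function(doc, proportion = 0.5) {
  tokens <- rep(doc[1, ], times = doc[2, ])        # expand counts into tokens
  n_held <- floor(proportion * length(tokens))     # number of tokens to hold out
  held <- seq_along(tokens) %in% sample.int(length(tokens), n_held)

  # Collapse a token vector back into the two-row index/count format
  to_matrix <- function(tok) {
    idx <- sort(unique(tok))
    counts <- tabulate(match(tok, idx), nbins = length(idx))
    matrix(c(idx, counts), nrow = 2, byrow = TRUE)
  }

  list(observed = to_matrix(tokens[!held]),
       heldout  = to_matrix(tokens[held]))
}

set.seed(1)
doc <- matrix(c(2L, 5L, 9L,     # vocabulary indices
                3L, 1L, 4L),    # counts: 8 tokens in total
              nrow = 2, byrow = TRUE)
parts <- split_document(doc, proportion = 0.5)
sum(parts$heldout[2, ])    # 4 tokens held out
sum(parts$observed[2, ])   # 4 tokens kept for training
```

Sampling individual tokens rather than whole terms means a term can appear in both the observed and the held-out part, so the model still sees some evidence about each document it is later scored on.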
Examples
prep <- prepDocuments(poliblog5k.docs, poliblog5k.voc,
                      poliblog5k.meta, subsample = 500,
                      lower.thresh = 20, upper.thresh = 200)
heldout <- make.heldout(prep$documents, prep$vocab)
documents <- heldout$documents
vocab <- heldout$vocab
meta <- prep$meta
stm1 <- stm(documents, vocab, 5,
            prevalence = ~ rating + s(day),
            init.type = "Random",
            data = meta, max.em.its = 5)
eval.heldout(stm1, heldout$missing)
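
The scoring step can also be illustrated without fitting a model. The sketch below assumes that the heldout likelihood of a document is the mean log probability of its held-out tokens, where each word's probability mixes the topic-word distributions by the document's topic proportions. The objects theta, beta and heldout_doc are toy values invented for this illustration, not objects returned by stm.

```r
# Toy values, assumed for illustration only
theta <- c(0.7, 0.3)                      # topic proportions for one document
beta  <- rbind(c(0.5, 0.3, 0.2),          # topic 1: distribution over 3 terms
               c(0.1, 0.1, 0.8))          # topic 2: distribution over 3 terms
heldout_doc <- matrix(c(1L, 3L,           # row 1: held-out vocabulary indices
                        2L, 1L),          # row 2: held-out counts
                      nrow = 2, byrow = TRUE)

# Probability of each held-out term under the document's topic mixture
word_prob <- as.vector(theta %*% beta[, heldout_doc[1, ], drop = FALSE])

# Repeat each probability once per held-out token, then average the logs
token_prob <- rep(word_prob, times = heldout_doc[2, ])
doc_heldout <- mean(log(token_prob))
doc_heldout   # equals log(0.38), since both terms have mixture probability 0.38
```

Values closer to zero indicate that the model assigns higher probability to the words it did not see during training.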
[Package stm version 1.3.7 Index]