prep_cpts {LDATS}R Documentation

Initialize and update the change point matrix used in the ptMCMC algorithm

Description

Each of the chains is initialized by prep_cpts using a draw from the available times (i.e. assuming a uniform prior), the best fit (by likelihood) draw is put in the focal chain with each subsequently worse fit placed into the subsequently hotter chain. update_cpts updates the change points after every iteration in the ptMCMC algorithm.

Usage

prep_cpts(data, formula, nchangepoints, timename, weights, control = list())

update_cpts(cpts, swaps)

Arguments

data

data.frame including [1] the time variable (indicated in timename), [2] the predictor variables (required by formula) and [3], the multinomial response variable (indicated in formula) as verified by check_timename and check_formula. Note that the response variables should be formatted as a data.frame object named as indicated by the response entry in the control list, such as gamma for a standard TS analysis on LDA output.

formula

formula defining the regression relationship between the change points, see formula. Any predictor variable included must also be a column in data and any (multinomial) response variable must be a set of columns in data, as verified by check_formula.

nchangepoints

integer corresponding to the number of change points to include in the model. 0 is a valid input (corresponding to no change points, so a singular time series model), and the current implementation can reasonably include up to 6 change points. The number of change points is used to dictate the segmentation of the data for each continuous model and each LDA model.

timename

character element indicating the time variable used in the time series. Defaults to "time". The variable must be integer-conformable or a Date. If the variable named is a Date, the input is converted to an integer, resulting in the timestep being 1 day, which is often not desired behavior.

weights

Optional class numeric vector of weights for each document. Defaults to NULL, translating to an equal weight for each document. When using multinom_TS in a standard LDATS analysis, it is advisable to weight the documents by their total size, as the result of LDA is a matrix of proportions, which does not account for size differences among documents. For most models, a scaling of the weights (so that the average is 1) is most appropriate, and this is accomplished using document_weights.

control

A list of parameters to control the fitting of the Time Series model including the parallel tempering Markov Chain Monte Carlo (ptMCMC) controls. Values not input assume defaults set by TS_control.

cpts

The existing matrix of change points.

swaps

Chain configuration after among-temperature swaps.

Value

list of [1] matrix of change points (rows) for each temperature (columns) and [2] vector of log-likelihood values for each of the chains.

Examples


  data(rodents)
  document_term_table <- rodents$document_term_table
  document_covariate_table <- rodents$document_covariate_table
  LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
  data <- document_covariate_table
  data$gamma <- LDA_models@gamma
  weights <- document_weights(document_term_table)
  data <- data[order(data[,"newmoon"]), ]
  saves <- prep_saves(1, TS_control())
  inputs <- prep_ptMCMC_inputs(data, gamma ~ 1, 1, "newmoon", weights,
                               TS_control())
  cpts <- prep_cpts(data, gamma ~ 1, 1, "newmoon", weights, TS_control())
  ids <- prep_ids(TS_control())
  for(i in 1:TS_control()$nit){
    steps <- step_chains(i, cpts, inputs)
    swaps <- swap_chains(steps, inputs, ids)
    saves <- update_saves(i, saves, steps, swaps)
    cpts <- update_cpts(cpts, swaps)
    ids <- update_ids(ids, swaps)
  }



[Package LDATS version 0.3.0 Index]