R: Surrogate-guided ensemble Latent Dirichlet Allocation

sureLDA {sureLDA}

R Documentation

Surrogate-guided ensemble Latent Dirichlet Allocation

Description

Surrogate-guided ensemble Latent Dirichlet Allocation

Usage

sureLDA(
  X,
  ICD,
  NLP,
  HU,
  filter,
  prior = "PheNorm",
  weight = "beta",
  nEmpty = 20,
  alpha = 100,
  beta = 100,
  burnin = 50,
  ITER = 150,
  phi = NULL,
  nCores = 1,
  labeled = NULL,
  verbose = FALSE
)

Arguments

`X`	nPatients x nFeatures matrix of EHR feature counts
`ICD`	nPatients x nPhenotypes matrix of main ICD surrogate counts
`NLP`	nPatients x nPhenotypes matrix of main NLP surrogate counts
`HU`	nPatients-dimensional vector containing the healthcare utilization feature
`filter`	nPatients x nPhenotypes binary matrix indicating filter-positives
`prior`	'PheNorm', 'MAP', or nPatients x nPhenotypes matrix of prior probabilities (defaults to PheNorm)
`weight`	'beta', 'uniform', or nPhenotypes x nFeatures matrix of feature weights (defaults to beta)
`nEmpty`	Number of 'empty' topics to include in LDA step (defaults to 20)
`alpha`	LDA Dirichlet hyperparameter for patient-topic distribution (defaults to 100)
`beta`	LDA Dirichlet hyperparameter for topic-feature distribution (defaults to 100)
`burnin`	Number of burn-in Gibbs iterations (defaults to 50)
`ITER`	Number of subsequent Gibbs iterations used for inference (defaults to 150)
`phi`	(optional) nPhenotypes x nFeatures pre-trained topic-feature distribution matrix
`nCores`	(optional) Number of parallel cores to use; applies only if phi is provided (defaults to 1)
`labeled`	(optional) nPatients x nPhenotypes matrix of a priori labels (set missing entries to NA)
`verbose`	(optional) Logical indicating whether to output verbose progress updates (defaults to FALSE)
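
The dimension requirements listed above can be checked on synthetic data before calling the function. The following is a minimal sketch, not taken from the package: the toy sizes, the Poisson-simulated counts, the rule used to build `filter`, and the helper name `check_sureLDA_inputs` are all assumptions made for illustration. The call to `sureLDA()` itself is shown as not run.

```r
set.seed(1)

# Toy sizes (hypothetical, for illustration only)
nPatients   <- 50
nFeatures   <- 12
nPhenotypes <- 2

# Simulated inputs with the shapes listed in the Arguments section
X      <- matrix(rpois(nPatients * nFeatures, lambda = 2), nPatients, nFeatures)
ICD    <- matrix(rpois(nPatients * nPhenotypes, lambda = 1), nPatients, nPhenotypes)
NLP    <- matrix(rpois(nPatients * nPhenotypes, lambda = 1), nPatients, nPhenotypes)
HU     <- rpois(nPatients, lambda = 10) + 1
filter <- (ICD > 0 | NLP > 0) * 1   # assumed rule: any surrogate count marks a filter-positive

# Optional a priori labels: NA marks patients without a label
labeled <- matrix(NA_real_, nPatients, nPhenotypes)
labeled[1:5, 1] <- c(1, 0, 1, 0, 1)

# Hypothetical helper (not part of sureLDA): verify that dimensions agree
check_sureLDA_inputs <- function(X, ICD, NLP, HU, filter, labeled = NULL) {
  n <- nrow(X)
  k <- ncol(ICD)
  ok <- nrow(ICD) == n && length(HU) == n &&
    all(dim(NLP) == c(n, k)) && all(dim(filter) == c(n, k)) &&
    all(filter %in% c(0, 1))
  if (!is.null(labeled)) ok <- ok && all(dim(labeled) == c(n, k))
  ok
}

inputs_ok <- check_sureLDA_inputs(X, ICD, NLP, HU, filter, labeled)

## Not run:
## fit <- sureLDA::sureLDA(X, ICD, NLP, HU, filter, labeled = labeled)
```

Running the check first gives a clearer failure than a dimension mismatch surfacing partway through the Gibbs iterations.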

Value

`scores`	nPatients x nPhenotypes matrix of weighted patient-phenotype assignment counts from LDA step
`probs`	nPatients x nPhenotypes matrix of patient-phenotype posterior probabilities
`ensemble`	Mean of sureLDA posterior and PheNorm/MAP prior
`prior`	nPatients x nPhenotypes matrix of PheNorm/MAP phenotype probability estimates
`phi`	nPhenotypes x nFeatures topic distribution matrix from LDA step
`weights`	nPhenotypes x nFeatures matrix of topic-feature weights
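
The `ensemble` entry is described as the mean of the sureLDA posterior and the prior. The sketch below illustrates that relationship on toy matrices. It assumes a simple elementwise arithmetic mean and a list-like return value carrying the component names above; neither detail is confirmed by this page, and all numbers and the 0.5 cutoff are invented.

```r
# Toy posterior and prior probabilities (3 patients x 2 phenotypes); values are invented
probs <- matrix(c(0.9, 0.2, 0.6,
                  0.1, 0.8, 0.4), nrow = 3)
prior <- matrix(c(0.7, 0.4, 0.6,
                  0.3, 0.6, 0.2), nrow = 3)

# Assumed: elementwise arithmetic mean of posterior and prior
ensemble <- (probs + prior) / 2

# Mock result mimicking three of the documented components
res <- list(probs = probs, prior = prior, ensemble = ensemble)

# Patients whose ensemble probability for phenotype 1 exceeds a hypothetical cutoff
flagged <- which(res$ensemble[, 1] > 0.5)
```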

[Package sureLDA version 0.1.0-1 Index]