get.fit {Karen}R Documentation

Fit the state-space model to a clonal tracking dataset

Description

This function fits a state-space model to a clonal tracking dataset using an extended Kalman filter approach.

Usage

get.fit(
  rct.lst,
  constr.lst = NULL,
  latSts.lst,
  ct.lst,
  Y,
  m0,
  P0,
  cl = getDefaultCluster(),
  control = list(nLQR = 3, lmm = 25, pgtol = 0, relErrfct = 1e-05, tol = 1e-09, maxit =
    1000, maxitEM = 10, trace = 1, verbose = TRUE, FORCEP = TRUE)
)

Arguments

rct.lst

A list of biochemical reactions defining the cell differentiation network. A differentiation move from cell type "A" to cell type "B" must be coded as "A->B" Duplication of cell "A" must be coded as "A->1" Death of cell "A" must be coded as "A->0".

constr.lst

(defaults to NULL, when no constraints are needed) List of linear constraints that must be applied to the biochemical reactions. For example, if we need the constraint "A->B = B->C + B->D", this must be coded using the following syntax c("theta\[\'A->B\'\]=(theta\[\'B->C\'\] + theta\[\'B->D\'\])").

latSts.lst

List of the latent cell types. If for example counts are not available for cell types "A" and "B", then latSts.lst = c("A", "B").

ct.lst

List of all the cell types involved in the network formulation. For example, if the network is defined by the biochemical reactions are A->B" and "A->C", then ct.lst = c("A", "B", "C").

Y

A 3-dimensional array whose dimensions are the time, the cell type and the clone respectively.

m0

mean vector of the initial condition x_0

P0

covariance matrix of the initial condition x_0

cl

An object of class "cluster" specifying the cluster to be used for parallel execution. See makeCluster for more information. If the argument is not specified, the default cluster is used. See setDefaultCluster for information on how to set up a default cluster.

control

A a list of control parameters for the optimization routine:

  • "nLQR"(defaults to 3) is an integer giving the order of the Gauss-Legendre approximation for integrals.

  • "lmm"(defaults to 25) is an integer giving the number of BFGS updates retained in the "L-BFGS-B" method.

  • "pgtol"(defaults to 0 when check is suppressed) is a tolerance on the projected gradient in the current search direction of the "L-BFGS-B" method.

  • "relErrfct"(defaults to 1e-5) is the relative error on the function value for the "L-BFGS-B" optimization. That is, the parameter "factr" of the optim() function is set to relErrfct/.Machine$double.eps.

  • "tol"(defaults to 1e-9) is the relative error tolerance for the expectation-maximization algorithm of the extended Kalman filter optimization. That is, the optimization is run until the relative error of the function and of the parameter vector are lower than tol.

  • "maxit"(defaults to 1000) The maximum number of iterations for the "L-BFGS-B" optimization.

  • "maxitEM"(defaults to 10) The maximum number of iterations for the expectation-maximization algorithm.

  • "trace"(defaults to 1) Non-negative integer. If positive, tracing information on the progress of the optimization is produced. This parameter is also passed to the optim() function. Higher values may produce more tracing information: for method "L-BFGS-B" there are six levels of tracing. (To understand exactly what these do see the source code: higher levels give more detail.)

  • "verbose"(defaults to TRUE) Logical value. If TRUE, then information messages on the progress of the filtering/smoothing algorithm are printed to the console.

  • "FORCEP"(defaults to TRUE) Logical value. If TRUE, then all the covariance matrices involved in the algorithm are forced to be positive-definite and it helps the convergence of the optimization.

Value

A list containing the following:

Examples

rcts <- c("HSC->T", ## reactions
          "HSC->M",
          "T->0",
          "M->0")

cnstr <- c("theta\\[\\'HSC->T\\'\\]=(theta\\[\\'T->0\\'\\])",
           "theta\\[\\'HSC->M\\'\\]=(theta\\[\\'M->0\\'\\])")
latsts <- "HSC" ## latent cell types

ctps <- unique(setdiff(c(sapply(rcts, function(r){ ## all cell types
  as.vector(unlist(strsplit(r, split = "->", fixed = TRUE)))
}, simplify = "array")), c("0", "1")))



Y0 <- Y_CT$WAS[,setdiff(ctps,"HSC"),]
topClones <- 2
Y0 <- Y0[,,names(head(sort(apply(Y0!=0, 3, sum), decreasing = TRUE), topClones)),drop=FALSE]


## cluster parameters:
cl <- parallel::makeCluster(2, type = "PSOCK")

## initial condition:
X0 <- rep(0, length(ctps))
names(X0) <- ctps
X0["HSC"] <- 1

## mean vector and covariance matrix of X0:
m_0 <- replicate(dim(Y0)[3], X0, simplify = "array")
colnames(m_0) <- dimnames(Y0)[[3]]
P_0 <- Matrix::Diagonal(length(ctps) * dim(Y0)[3], 10)
rownames(P_0) <- colnames(P_0) <- rep(dimnames(Y0)[[3]], each = length(ctps))

## fit Karen on data:
res.fit <- get.fit(rct.lst = rcts,
                   constr.lst = cnstr,
                   latSts.lst = latsts,
                   ct.lst = ctps,
                   Y = Y0,
                   m0 = m_0,
                   P0 = P_0,
                   cl = cl,
                   list(nLQR = 1,
                        lmm = 0, ## needs to be >=5 for real applications
                        pgtol = 0,
                        relErrfct = 1e-5,
                        tol = 1e-3,
                        maxit = 0, ## needs to be increased for real applications
                        maxitEM = 1, ## needs to be increased for real applications
                        trace = 1,
                        verbose = TRUE,
                        FORCEP = FALSE))

[Package Karen version 1.0 Index]