gof {TDLM}    R Documentation

Compute goodness-of-fit measures between observed and simulated OD matrices

Description

This function returns a data.frame in which each row provides one or several goodness-of-fit measures between a simulated and an observed Origin-Destination (OD) matrix.

Usage

gof(
  sim,
  obs,
  measures = "all",
  distance = NULL,
  bin_size = 2,
  use_proba = FALSE,
  check_names = FALSE
)

Arguments

sim

an object of class TDLM (output of run_law_model(), run_law() or run_model()). A matrix or a list of matrices can also be used (see Note).

obs

a square matrix representing the observed mobility flows.

measures

a vector of string(s) indicating which goodness-of-fit measure(s) to choose (see Details). If "all" is specified, all measures are calculated.

distance

a square matrix representing the distance between locations. Only required for the distance-based measures (CPC_d and KS).

bin_size

a numeric value indicating the bin size used to discretize the distance distribution when computing CPC_d (2 "km" by default).

use_proba

a boolean indicating whether the proba matrix should be used instead of the simulated OD matrix to compute the measure(s). Only valid for outputs of run_law_model() obtained with write_proba = TRUE (see Note).

check_names

a boolean indicating whether the location IDs are used as matrix rownames and colnames and should be checked (see Note).

Details

With \(n\) the number of locations, \(T_{ij}\) the observed flow between location \(i\) and location \(j\) (argument obs), \(\tilde{T}_{ij}\) a simulated flow between location \(i\) and location \(j\) (a matrix from argument sim), \(N=\sum_{i,j=1}^n T_{ij}\) the sum of observed flows and \(\tilde{N}=\sum_{i,j=1}^n \tilde{T}_{ij}\) the sum of simulated flows.

Six goodness-of-fit measures are available through measures = c("CPC", "NRMSE", "KL", "CPL", "CPC_d", "KS"): the Common Part of Commuters (Gargiulo et al. 2012; Lenormand et al. 2012; Lenormand et al. 2016),

\(\displaystyle CPC(T,\tilde{T}) = \frac{2\cdot\sum_{i,j=1}^n \min(T_{ij},\tilde{T}_{ij})}{N + \tilde{N}}\)
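To make the CPC formula concrete, here is a minimal base-R sketch. The helper name cpc_sketch() is hypothetical and is not part of TDLM; it assumes obs and sim are numeric matrices of identical dimensions and only mirrors the formula above, not the package's own code.

```r
# Hypothetical helper, not part of TDLM: Common Part of Commuters (CPC)
cpc_sketch <- function(obs, sim) {
  stopifnot(identical(dim(obs), dim(sim)))
  # 2 * sum of element-wise minima, divided by N + N~
  2 * sum(pmin(obs, sim)) / (sum(obs) + sum(sim))
}

# Toy 3 x 3 observed and simulated OD matrices (illustrative values only)
obs <- matrix(c(0, 10, 5,
                8,  0, 2,
                3,  7, 0), nrow = 3, byrow = TRUE)
sim <- matrix(c(0, 6, 9,
                8, 0, 1,
                4, 5, 0), nrow = 3, byrow = TRUE)
cpc_sketch(obs, sim)  # 2 * 28 / (35 + 33) = 56 / 68, about 0.824
```

CPC equals 1 when both matrices are identical and 0 when they share no common flow.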

the Normalized Root Mean Square Error (NRMSE),

\(\displaystyle NRMSE(T,\tilde{T}) = \sqrt{\frac{\sum_{i,j=1}^n (T_{ij}-\tilde{T}_{ij})^2}{N}}\)
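The NRMSE as written above can be sketched in base R as follows. nrmse_sketch() is a hypothetical helper, not part of TDLM, and it follows the formula exactly as stated here (squared errors summed and divided by the total observed flow N).

```r
# Hypothetical helper, not part of TDLM: NRMSE as defined above,
# i.e. the square root of the summed squared errors divided by N
nrmse_sketch <- function(obs, sim) {
  stopifnot(identical(dim(obs), dim(sim)))
  sqrt(sum((obs - sim)^2) / sum(obs))
}

# Toy 2 x 2 matrices (illustrative values only); N = 16
obs <- matrix(c(0, 10,
                6,  0), nrow = 2, byrow = TRUE)
sim <- matrix(c(0, 8,
                8, 0), nrow = 2, byrow = TRUE)
nrmse_sketch(obs, sim)  # sqrt((4 + 4) / 16) = sqrt(0.5)
```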

the Kullback–Leibler divergence (Kullback and Leibler 1951),

\(\displaystyle KL(T,\tilde{T}) = \sum_{i,j=1}^n \frac{T_{ij}}{N}\log\left(\frac{T_{ij}}{N}\frac{\tilde{N}}{\tilde{T}_{ij}}\right)\)
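A minimal base-R sketch of the KL divergence above follows. kl_sketch() is hypothetical and not part of TDLM; it assumes the usual convention that terms with \(T_{ij} = 0\) contribute 0. How gof() itself treats zero flows is not stated here.

```r
# Hypothetical helper, not part of TDLM: Kullback-Leibler divergence between
# the normalized observed and simulated flows
kl_sketch <- function(obs, sim) {
  stopifnot(identical(dim(obs), dim(sim)))
  p <- obs / sum(obs)  # T_ij / N
  q <- sim / sum(sim)  # T~_ij / N~
  keep <- p > 0        # assumed convention: terms with T_ij = 0 contribute 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Toy 2 x 2 matrices (illustrative values only)
obs <- matrix(c(0, 1,
                1, 0), nrow = 2, byrow = TRUE)
sim <- matrix(c(0, 3,
                1, 0), nrow = 2, byrow = TRUE)
kl_sketch(obs, sim)  # 0.5 * log(2) + 0.5 * log(2/3) = 0.5 * log(4/3)
```

With this convention, a simulated zero where the observed flow is positive makes the divergence infinite, which follows directly from the formula.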

the Common Part of Links (CPL) (Lenormand et al. 2016),

\(\displaystyle CPL(T,\tilde{T}) = \frac{2\cdot\sum_{i,j=1}^n 1_{T_{ij}>0} \cdot 1_{\tilde{T}_{ij}>0}}{\sum_{i,j=1}^n 1_{T_{ij}>0} + \sum_{i,j=1}^n 1_{\tilde{T}_{ij}>0}}\)
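The CPL only compares which links exist, not their volumes. The sketch below is a hypothetical base-R helper (cpl_sketch(), not part of TDLM) that applies the indicator functions of the formula directly.

```r
# Hypothetical helper, not part of TDLM: Common Part of Links (CPL)
cpl_sketch <- function(obs, sim) {
  stopifnot(identical(dim(obs), dim(sim)))
  # Logical indicator matrices 1_{T_ij > 0} and 1_{T~_ij > 0}
  link_obs <- obs > 0
  link_sim <- sim > 0
  2 * sum(link_obs & link_sim) / (sum(link_obs) + sum(link_sim))
}

# Toy 3 x 3 matrices (illustrative values only): 4 observed links,
# 4 simulated links, 3 of them in common
obs <- matrix(c(0, 3, 0,
                1, 0, 2,
                0, 4, 0), nrow = 3, byrow = TRUE)
sim <- matrix(c(0, 2, 1,
                0, 0, 5,
                0, 1, 0), nrow = 3, byrow = TRUE)
cpl_sketch(obs, sim)  # 2 * 3 / (4 + 4) = 0.75
```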

the Common Part of Commuters based on distance (Lenormand et al. 2016), noted CPC_d. Let \(N_k\) (respectively \(\tilde{N}_k\)) denote the sum of observed (respectively simulated) flows between locations whose distance lies in the half-open bin [bin_size*(k-1), bin_size*k).

\(\displaystyle CPC_d(T,\tilde{T}) = \frac{2\cdot\sum_{k=1}^{\infty} \min(N_{k},\tilde{N}_{k})}{N+\tilde{N}}\)
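The binning step behind CPC_d can be sketched in base R as below. cpcd_sketch() is a hypothetical helper, not part of TDLM; it assumes distance is a square matrix aligned with obs and sim, and maps a distance d to bin k = floor(d / bin_size) + 1 so that bin k covers [bin_size*(k-1), bin_size*k).

```r
# Hypothetical helper, not part of TDLM: CPC based on distance bins (CPC_d)
cpcd_sketch <- function(obs, sim, distance, bin_size = 2) {
  stopifnot(identical(dim(obs), dim(sim)), identical(dim(obs), dim(distance)))
  # Bin index k for every OD pair: bin k covers [bin_size*(k-1), bin_size*k)
  k <- floor(as.vector(distance) / bin_size) + 1
  # Observed and simulated flows summed per bin (N_k and N~_k); both use the
  # same bins, and empty bins are simply absent (they would contribute 0)
  n_obs <- tapply(as.vector(obs), k, sum)
  n_sim <- tapply(as.vector(sim), k, sum)
  2 * sum(pmin(n_obs, n_sim)) / (sum(obs) + sum(sim))
}

# Toy example (illustrative values only); distances 1, 3, 5 fall in bins 1, 2, 3
distance <- matrix(c(0, 1, 5,
                     1, 0, 3,
                     5, 3, 0), nrow = 3, byrow = TRUE)
obs <- matrix(c(0, 10, 2,
                8,  0, 4,
                1,  6, 0), nrow = 3, byrow = TRUE)
sim <- matrix(c(0, 7, 3,
                9, 0, 2,
                2, 8, 0), nrow = 3, byrow = TRUE)
cpcd_sketch(obs, sim, distance, bin_size = 2)  # 2 * (16 + 10 + 3) / 62 = 58 / 62
```

Because the minimum of two sums is never smaller than the sum of element-wise minima, CPC_d is always at least as large as CPC for the same pair of matrices (here 58/62 against 52/62).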

and the Kolmogorov-Smirnov statistic and p-value (Massey 1951), noted KS. It compares the observed and simulated distributions of flow distances and is computed with the ks_test function from the Ecume package.
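The KS statistic can be illustrated with a base-R sketch of the two-sample statistic on flow-weighted distance distributions. This is an assumption about how the distributions are built (each flow \(T_{ij}\) weighting its distance \(d_{ij}\)); ks_stat_sketch() is hypothetical, is neither TDLM's nor Ecume's implementation, and does not reproduce the p-value reported by gof().

```r
# Hypothetical helper, not part of TDLM or Ecume: two-sample KS statistic
# between flow-weighted distance distributions (assumed weighting scheme)
ks_stat_sketch <- function(obs, sim, distance) {
  stopifnot(identical(dim(obs), dim(distance)), identical(dim(sim), dim(distance)))
  d <- as.vector(distance)
  grid <- sort(unique(d))
  # Weighted empirical CDF: share of flows travelling a distance <= x,
  # evaluated at every jump point (where the supremum is attained)
  weighted_cdf <- function(w) sapply(grid, function(x) sum(w[d <= x])) / sum(w)
  max(abs(weighted_cdf(as.vector(obs)) - weighted_cdf(as.vector(sim))))
}

# Toy example (illustrative values only)
distance <- matrix(c(0, 1, 5,
                     1, 0, 3,
                     5, 3, 0), nrow = 3, byrow = TRUE)
obs <- matrix(c(0, 10, 2,
                8,  0, 4,
                1,  6, 0), nrow = 3, byrow = TRUE)
sim <- matrix(c(0, 7, 3,
                9, 0, 2,
                2, 8, 0), nrow = 3, byrow = TRUE)
ks_stat_sketch(obs, sim, distance)  # largest CDF gap: 2 / 31
```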

Value

A data.frame providing one or several goodness-of-fit measure(s) between the simulated OD matrix (or matrices) and an observed OD matrix. Each row corresponds to one simulated matrix; rows follow the order of the list (or list of lists) elements, and list names are used if provided.
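The one-row-per-matrix layout can be pictured with the hypothetical base-R sketch below. gof_table_sketch() and its id column are illustrative assumptions, not gof()'s actual column names or code; it computes CPC and CPL for each matrix of a flat list against a single observed matrix.

```r
# Hypothetical sketch, not TDLM's code: one row per simulated matrix, rows in
# list order, identified by list names when they exist
gof_table_sketch <- function(sim_list, obs) {
  cpc <- function(o, s) 2 * sum(pmin(o, s)) / (sum(o) + sum(s))
  cpl <- function(o, s) 2 * sum(o > 0 & s > 0) / (sum(o > 0) + sum(s > 0))
  ids <- if (is.null(names(sim_list))) seq_along(sim_list) else names(sim_list)
  data.frame(
    id  = ids,
    CPC = vapply(sim_list, cpc, numeric(1), o = obs),
    CPL = vapply(sim_list, cpl, numeric(1), o = obs),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Toy example (illustrative values only)
obs <- matrix(c(0, 6,
                4, 0), nrow = 2, byrow = TRUE)
sims <- list(
  run1 = obs,
  run2 = matrix(c(0,  0,
                  10, 0), nrow = 2, byrow = TRUE)
)
gof_table_sketch(sims, obs)  # run1: CPC = 1, CPL = 1; run2: CPC = 0.4, CPL = 2/3
```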

Note

By default, if sim is an output of run_law_model(), the measure(s) are computed only for the simulated OD matrices and not for the proba matrix (included in the output when write_proba = TRUE). Set use_proba = TRUE to compute the measure(s) on the proba matrix instead of the simulated OD matrix; in this case, obs should also be a proba matrix.

All inputs should be based on the same number of locations, sorted in the same order. It is recommended to use the location IDs as matrix rownames and colnames and to set check_names = TRUE to verify that everything is in order before running this function (check_names = FALSE by default). The function check_format_names() can also be used to check the validity of all inputs before running the package's main functions.
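The kind of consistency that check_names = TRUE is meant to enforce can be illustrated with the hypothetical base-R helper below. same_location_ids() is not part of TDLM, and the package's actual checks (in gof() or check_format_names()) may differ; the sketch only tests that every matrix carries the same location IDs, in the same order, as rownames and colnames.

```r
# Hypothetical helper, not part of TDLM: do all matrices share the same
# location IDs, in the same order, as rownames and colnames?
same_location_ids <- function(...) {
  mats <- list(...)
  ref <- rownames(mats[[1]])
  if (is.null(ref)) return(FALSE)  # no IDs to compare
  all(vapply(mats, function(m) {
    identical(rownames(m), ref) && identical(colnames(m), ref)
  }, logical(1)))
}

# Toy example (illustrative values only)
ids <- c("A", "B", "C")
obs <- matrix(1:9, nrow = 3, dimnames = list(ids, ids))
sim <- obs
same_location_ids(obs, sim)            # TRUE
same_location_ids(obs, sim[3:1, 3:1])  # FALSE: same IDs, different order
```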

Author(s)

Maxime Lenormand (maxime.lenormand@inrae.fr)

References

Lenormand M, Bassolas A, Ramasco JJ (2016). “Systematic comparison of trip distribution laws and models.” Journal of Transport Geography, 51, 158–169.

Gargiulo F, Lenormand M, Huet S, Baqueiro Espinosa O (2012). “Commuting network model: getting to the essentials.” Journal of Artificial Societies and Social Simulation, 15(2), 13.

Lenormand M, Huet S, Gargiulo F, Deffuant G (2012). “A Universal Model of Commuting Networks.” PLoS ONE, 7, e45985.

Kullback S, Leibler RA (1951). “On Information and Sufficiency.” The Annals of Mathematical Statistics, 22(1), 79–86.

Massey FJ (1951). “The Kolmogorov-Smirnov test for goodness of fit.” Journal of the American Statistical Association, 46(253), 68–78.

See Also

run_law_model() run_law() run_model() check_format_names()

Examples

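library(TDLM)

# Example data shipped with TDLM
data(mass)
data(distance)
data(od)

# Location masses (column 1) and observed outgoing/incoming trips (columns 2 and 3)
mi <- as.numeric(mass[, 1])
mj <- mi
Oi <- as.numeric(mass[, 2])
Dj <- as.numeric(mass[, 3])

# Simulate an OD matrix with the exponential gravity law (GravExp)
# and the doubly constrained model (DCM)
res <- run_law_model(
  law = "GravExp", mass_origin = mi, mass_destination = mj,
  distance = distance, opportunity = NULL, param = 0.01,
  model = "DCM", nb_trips = NULL, out_trips = Oi, in_trips = Dj,
  average = FALSE, nbrep = 1, maxiter = 50, mindiff = 0.01,
  write_proba = FALSE,
  check_names = FALSE
)

# Common Part of Commuters between the simulated and observed OD matrices
gof(
  sim = res, obs = od, measures = "CPC", distance = NULL, bin_size = 2,
  use_proba = FALSE,
  check_names = FALSE
)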



[Package TDLM version 1.0.0 Index]