gof {TDLM} | R Documentation |
Compute goodness-of-fit measures between observed and simulated OD matrices
Description
This function returns a data.frame where each row provides one or several goodness-of-fit measures between a simulated and an observed Origin-Destination matrix.
Usage
gof(
sim,
obs,
measures = "all",
distance = NULL,
bin_size = 2,
use_proba = FALSE,
check_names = FALSE
)
Arguments
sim |
an object of class |
obs |
a square matrix representing the observed mobility flows. |
measures |
a vector of string(s) indicating which goodness-of-fit
measure(s) to choose (see Details). If |
distance |
a square matrix representing the distance between locations. Only necessary for the distance-based measures. |
bin_size |
a numeric value indicating the size of the bins used to discretize the distance distribution when computing CPC_d (2 "km" by default). |
use_proba |
a boolean indicating if the |
check_names |
a boolean indicating whether the location IDs are used as matrix rownames and colnames and whether they should be checked (see Note). |
Details
Let \(n\) be the number of locations, \(T_{ij}\) the
observed flow between location \(i\) and location \(j\)
(argument obs), \(\tilde{T}_{ij}\) a simulated flow
between location \(i\) and location \(j\) (a matrix from
argument sim), \(N=\sum_{i,j=1}^n T_{ij}\) the
sum of observed flows and
\(\tilde{N}=\sum_{i,j=1}^n \tilde{T}_{ij}\)
the sum of simulated flows.
Several goodness-of-fit measures are available, selected with
measures = c("CPC", "NRMSE", "KL", "CPL", "CPC_d", "KS"):
the Common Part of Commuters (CPC)
(Gargiulo et al. 2012; Lenormand et al. 2012; Lenormand et al. 2016),
\(\displaystyle CPC(T,\tilde{T}) = \frac{2\cdot\sum_{i,j=1}^n \min(T_{ij},\tilde{T}_{ij})}{N + \tilde{N}}\)
the Normalized Root Mean Square Error (NRMSE),
\(\displaystyle NRMSE(T,\tilde{T}) = \sqrt{\frac{\sum_{i,j=1}^n (T_{ij}-\tilde{T}_{ij})^2}{N}}\)
the Kullback–Leibler divergence (Kullback and Leibler 1951),
\(\displaystyle KL(T,\tilde{T}) = \sum_{i,j=1}^n \frac{T_{ij}}{N}\log\left(\frac{T_{ij}}{N}\frac{\tilde{N}}{\tilde{T}_{ij}}\right)\)
the Common Part of Links (CPL) (Lenormand et al. 2016),
\(\displaystyle CPL(T,\tilde{T}) = \frac{2\cdot\sum_{i,j=1}^n 1_{T_{ij}>0} \cdot 1_{\tilde{T}_{ij}>0}}{\sum_{i,j=1}^n 1_{T_{ij}>0} + \sum_{i,j=1}^n 1_{\tilde{T}_{ij}>0}}\)
the Common Part of Commuters based on the distance
(Lenormand et al. 2016), noted CPC_d. Let
\(N_k\) (and \(\tilde{N}_k\)) be the
sum of observed (and simulated) flows at a distance falling in the bin
[bin_size*k - bin_size, bin_size*k[.
\(\displaystyle CPC_d(T,\tilde{T}) = \frac{2\cdot\sum_{k=1}^{\infty} \min(N_{k},\tilde{N}_{k})}{N+\tilde{N}}\)
and the Kolmogorov-Smirnov statistic and p-value (Massey 1951), noted KS. It is based on the observed and simulated flow distance distributions and computed with the ks_test function from the Ecume package.
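As an illustration of the formulas above, the following base R sketch computes CPC, NRMSE, KL, CPL and CPC_d by hand on small toy matrices. It is not the package's internal implementation: the matrices, the distances and all variable names are hypothetical, and two conventions are assumed here, namely that KL terms with a zero observed flow contribute zero and that bin k is obtained as floor(d / bin_size) + 1.

```r
# Toy observed and simulated OD matrices (hypothetical values)
obs <- matrix(c(0, 4, 6,
                2, 0, 8,
                5, 5, 0), nrow = 3, byrow = TRUE)
sim <- matrix(c(0, 5, 5,
                3, 0, 7,
                4, 6, 0), nrow = 3, byrow = TRUE)

# Toy distance matrix, in the same unit as bin_size (hypothetical values)
dist_mat <- matrix(c(0.0, 1.5, 3.2,
                     1.5, 0.0, 2.0,
                     3.2, 2.0, 0.0), nrow = 3, byrow = TRUE)
bin_size <- 2

N <- sum(obs)      # sum of observed flows
N_sim <- sum(sim)  # sum of simulated flows

# Common Part of Commuters
cpc <- 2 * sum(pmin(obs, sim)) / (N + N_sim)

# Normalized Root Mean Square Error
nrmse <- sqrt(sum((obs - sim)^2) / N)

# Kullback-Leibler divergence between the normalized flows
# (assumption: cells with a zero observed flow are skipped)
p <- obs / N
q <- sim / N_sim
keep <- p > 0
kl <- sum(p[keep] * log(p[keep] / q[keep]))

# Common Part of Links: shared non-zero links
cpl <- 2 * sum(obs > 0 & sim > 0) / (sum(obs > 0) + sum(sim > 0))

# CPC_d: aggregate flows by distance bin [bin_size*k - bin_size, bin_size*k[
k <- floor(dist_mat / bin_size) + 1
bins <- seq_len(max(k))
n_k_obs <- vapply(bins, function(b) sum(obs[k == b]), numeric(1))
n_k_sim <- vapply(bins, function(b) sum(sim[k == b]), numeric(1))
cpc_d <- 2 * sum(pmin(n_k_obs, n_k_sim)) / (N + N_sim)

round(c(CPC = cpc, NRMSE = nrmse, KL = kl, CPL = cpl, CPC_d = cpc_d), 4)
```

Note that in this sketch KL becomes infinite whenever a simulated flow is zero where the observed flow is positive; how gof() treats that case is not covered here.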
Value
A data.frame providing one or several goodness-of-fit measure(s) between the simulated OD matrices and an observed OD matrix. Each row corresponds to a matrix, sorted according to the list (or list of lists) elements (names are used if provided).
Note
By default, if sim is an output of run_law_model(),
the measure(s) are computed only for the simulated OD matrices and
not for the proba matrix (included in the output when
write_proba = TRUE). The argument use_proba
can be used to compute the measure(s) based on the proba
matrix instead of the simulated OD matrix. In this case, the
argument obs should also be a proba matrix.
All the inputs should be based on the same number of
locations, sorted in the same order. It is recommended to use the location IDs
as matrix rownames and colnames and to set check_names = TRUE
to verify that everything is in order before running
this function (check_names = FALSE by default). Note that the function
check_format_names() can be used to check the validity of all the inputs
before running the package's main functions.
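The ordering requirement can be illustrated with a short base R sketch. It does not reproduce what check_names = TRUE or check_format_names() do internally; the helper names_match() and the toy matrices are hypothetical and only show why identical location IDs in the same order matter.

```r
# Toy matrices with location IDs as rownames and colnames (hypothetical)
ids <- c("A", "B", "C")
obs <- matrix(1:9, nrow = 3, dimnames = list(ids, ids))

# Same flows, but with locations stored in a different order
shuffled <- c("C", "A", "B")
sim <- obs[shuffled, shuffled]

# Hypothetical helper: TRUE if both matrices list the same IDs in the same order
names_match <- function(x, ref) {
  identical(rownames(x), rownames(ref)) &&
    identical(colnames(x), colnames(ref))
}

names_match(sim, obs)  # the orders differ, so a cell-wise comparison is wrong

# Realign the simulated matrix on the observed IDs before comparing
sim_aligned <- sim[rownames(obs), colnames(obs)]
names_match(sim_aligned, obs)
```

Without the realignment step, a cell-wise measure such as CPC would compare flows belonging to different pairs of locations.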
Author(s)
Maxime Lenormand (maxime.lenormand@inrae.fr)
References
Lenormand M, Bassolas A, Ramasco JJ (2016). “Systematic comparison of trip distribution laws and models.” Journal of Transport Geography, 51, 158-169.
Gargiulo F, Lenormand M, Huet S, Baqueiro Espinosa O (2012). “Commuting network model: getting to the essentials.” Journal of Artificial Societies and Social Simulation, 15(2), 13.
Lenormand M, Huet S, Gargiulo F, Deffuant G (2012). “A Universal Model of Commuting Networks.” PLoS ONE, 7, e45985.
Kullback S, Leibler RA (1951). “On Information and Sufficiency.” The Annals of Mathematical Statistics, 22(1), 79–86.
Massey FJ (1951). “The Kolmogorov-Smirnov test for goodness of fit.” Journal of the American Statistical Association, 46(253), 68–78.
See Also
run_law_model()
run_law()
run_model()
check_format_names()
Examples
library(TDLM)

# Load the example datasets
data(mass)
data(distance)
data(od)

# Masses and trip margins taken from the columns of mass
mi <- as.numeric(mass[, 1])
mj <- mi
Oi <- as.numeric(mass[, 2])
Dj <- as.numeric(mass[, 3])

# Simulate one OD matrix (nbrep = 1) with law = "GravExp" and model = "DCM"
res <- run_law_model(
  law = "GravExp", mass_origin = mi, mass_destination = mj,
  distance = distance, opportunity = NULL, param = 0.01,
  model = "DCM", nb_trips = NULL, out_trips = Oi, in_trips = Dj,
  average = FALSE, nbrep = 1, maxiter = 50, mindiff = 0.01,
  write_proba = FALSE,
  check_names = FALSE
)

# Compare the simulated matrix with the observed one using CPC
gof(
  sim = res, obs = od, measures = "CPC", distance = NULL, bin_size = 2,
  use_proba = FALSE,
  check_names = FALSE
)