find_pbm_diff {diffman}    R Documentation

Perform the whole process of detecting risky observations


Starting from a table of observations classified in two different nomenclatures (z1 and z2), determine which observations are at risk of disclosure through the differentiation technique.


  save_file = NULL,
  simplify = TRUE,
  verbose = TRUE



The table of observations (data.frame or data.table). Each row corresponds to one observation, and for each observation we must know which category of the z1 nomenclature and which category of the z2 nomenclature it belongs to.


Strictly positive integer giving the confidentiality threshold. Observations are considered at risk if information can be deduced on an aggregate of n observations where n < threshold.


Integer giving the maximal size of the aggregates that are tested exhaustively. If this number is too large (greater than 30), the computation may never finish because the number of combinations can become very large, and the RAM may be exhausted.


Character string giving the suffix of the name under which the results are saved. If NULL (the default), results are not written to disk. The root of the path is the working directory (getwd()).


Boolean. If TRUE (default), the graph is first simplified (merging + splitting). Otherwise the exhaustive search is applied directly to the original graph.


Boolean. If TRUE (default), each step of the process is reported and progress bars give an estimate of the remaining time.
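To give a feel for why large values of max_agregate_size are costly, the base R sketch below counts every subset of 1 to max_size categories drawn from n_cat categories of z1. The helper n_agregates_bound() is hypothetical and not part of diffman; the count is only an upper bound, assuming each candidate aggregate is a subset of z1 categories, and the package may well test far fewer (for example after simplification).

```r
# Hypothetical helper (not in diffman): upper bound on the number of
# candidate aggregates, i.e. all subsets of 1..max_size categories
# chosen among n_cat categories of z1.
n_agregates_bound <- function(n_cat, max_size) {
  sum(choose(n_cat, seq_len(max_size)))
}

n_agregates_bound(40, 5)    # 760098 subsets
n_agregates_bound(40, 15)   # already tens of billions
n_agregates_bound(40, 30)
```

The count grows combinatorially with the maximal size, which is consistent with the warning that values above 30 may never finish.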
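One plausible reading of the splitting step enabled by simplify is that groups of z1 categories that never share a z2 category can be searched independently. The sketch below is an assumption, not diffman's implementation: the helper z1_components() and the toy table obs (columns id, z1, z2) are hypothetical. It links two z1 categories when some z2 category contains observations of both, then labels the connected components with a breadth-first search.

```r
# Toy table (hypothetical): one row per observation
obs <- data.frame(
  id = 1:12,
  z1 = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
  z2 = c("x", "x", "x", "y", "y", "y", "y", "y", "z", "z", "z", "z"),
  stringsAsFactors = FALSE
)

# Label connected components of z1 categories; two z1 categories are
# linked if some z2 category contains observations of both.
z1_components <- function(obs) {
  z1_cats <- sort(unique(obs$z1))
  comp <- setNames(rep(NA_integer_, length(z1_cats)), z1_cats)
  n_comp <- 0L
  for (start in z1_cats) {
    if (!is.na(comp[[start]])) next  # already reached from an earlier start
    n_comp <- n_comp + 1L
    comp[[start]] <- n_comp
    queue <- start
    while (length(queue) > 0) {
      cur <- queue[1]
      queue <- queue[-1]
      # z2 categories touched by the current z1 category
      z2_here <- unique(obs$z2[obs$z1 == cur])
      # z1 categories sharing at least one of those z2 categories
      neigh <- unique(obs$z1[obs$z2 %in% z2_here])
      new <- neigh[is.na(comp[neigh])]
      comp[new] <- n_comp
      queue <- c(queue, new)
    }
  }
  comp
}

z1_components(obs)
```

Here A and B share the z2 category "y" and fall in one component, while C stands alone, so under this reading the two groups could be searched separately.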


Observations at risk from differentiation are those for which information can be deduced on aggregates smaller than the confidentiality threshold. For example, if the confidentiality threshold is 10 and the value of a variable can be deduced for 5 observations by taking the difference between some categories of z1 and some categories of z2, then those 5 observations are considered "risky".
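The mechanism can be illustrated on a toy table. The base R sketch below is not diffman's algorithm (it ignores the "internal"/"external" distinction and the graph machinery), and the helper revealed_by_diff() and the table obs are hypothetical. Assuming totals are published for both nomenclatures, it takes one aggregate of z1 categories, subtracts the z2 categories lying entirely inside it, and flags the leftover observations when there are fewer of them than the threshold.

```r
# Toy table (hypothetical): one row per observation
obs <- data.frame(
  id = 1:12,
  z1 = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
  z2 = c("x", "x", "x", "y", "y", "y", "y", "y", "z", "z", "z", "z"),
  stringsAsFactors = FALSE
)

# Ids isolated by subtracting, from an aggregate of z1 categories, the
# z2 categories nested inside it. Returns integer(0) when nothing is
# nested or when the remainder is not smaller than the threshold.
revealed_by_diff <- function(obs, agregat, threshold) {
  in_agg <- obs$z1 %in% agregat
  # z2 categories whose observations all lie inside the aggregate
  all_inside <- tapply(in_agg, obs$z2, all)
  nested_z2 <- names(all_inside)[all_inside]
  # observations of the aggregate left over after the subtraction
  revealed <- obs$id[in_agg & !(obs$z2 %in% nested_z2)]
  n <- length(revealed)
  if (length(nested_z2) > 0 && n > 0 && n < threshold) revealed else integer(0)
}

revealed_by_diff(obs, agregat = "A", threshold = 5)
```

With threshold = 5, the z1 category "A" (4 observations) minus the z2 category "x" (3 observations) isolates observation 4 on its own, so it is flagged.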


The function returns a data.table or data.frame with five columns:

  1. $id_obs for the observation at risk

  2. $agregat for the aggregate of categories from the z1 nomenclature on which the differentiation is performed

  3. $agregat_size indicating the number of categories composing the aggregate

  4. $nb_obs the number of observations on which information is deduced when the differentiation is computed (nb_obs must be strictly less than threshold)

  5. $type_diff the type of differentiation, either "internal" or "external".
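Since an observation may plausibly be at risk through several aggregates, it can appear in more than one row. The base R sketch below shows a few ways to summarise a result with the five documented columns; the rows are made up for illustration (not real output of find_pbm_diff), and the string encoding of $agregat used here is an assumption.

```r
# Made-up rows with the five documented columns (illustrative only;
# the real encoding of $agregat may differ)
res_diff <- data.frame(
  id_obs = c("o1", "o2", "o2", "o7"),
  agregat = c("A", "A", "A+B", "C"),
  agregat_size = c(1L, 1L, 2L, 1L),
  nb_obs = c(2L, 2L, 3L, 1L),
  type_diff = c("internal", "internal", "external", "external"),
  stringsAsFactors = FALSE
)

# Distinct observations at risk
risky_ids <- unique(res_diff$id_obs)

# Number of rows per type of differentiation
table(res_diff$type_diff)

# For each observation, keep the row with the smallest nb_obs
worst <- res_diff[order(res_diff$id_obs, res_diff$nb_obs), ]
worst <- worst[!duplicated(worst$id_obs), ]
```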


library(diffman)
res_diff <- find_pbm_diff(t_ex, threshold = 5, max_agregate_size = 15)

[Package diffman version 0.1.1 Index]