predefined_tests {diyar} | R Documentation |

## Predefined logical tests in `diyar`


### Description

A collection of predefined logical tests used with **sub_criteria** objects.

### Usage

```
exact_match(x, y)
range_match(x, y, range = 10)
prob_link(
  x,
  y,
  cmp_func,
  attr_threshold,
  score_threshold,
  probabilistic,
  return_weights = FALSE
)
true(x, y)
false(x, y)
```

### Arguments

`x` |
Attribute(s) to be compared against. |

`y` |
Attribute(s) to be compared by. |

`range` |
Difference between `y` and `x`. |

`cmp_func` |
Logical tests such as string comparators. See |

`attr_threshold` |
Matching set of weight thresholds for each result of `cmp_func`. |

`score_threshold` |
Score threshold determining matched or linked records. See |

`probabilistic` |
If |

`return_weights` |
If |

### Details

**exact_match()** - tests that `x == y`.

**range_match()** - tests that `x \le y \le (x + range)`.
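The two conditions above can be sketched in base R. This is a minimal illustration of the documented logic, not the package's own implementation; the names `exact_match_sketch` and `range_match_sketch` are hypothetical stand-ins.

```r
# Hypothetical base-R stand-ins for the documented conditions.
# They are NOT diyar's implementation.
exact_match_sketch <- function(x, y) {
  # TRUE where the two attributes are equal
  x == y
}

range_match_sketch <- function(x, y, range = 10) {
  # TRUE where y lies in the closed interval [x, x + range]
  x <= y & y <= (x + range)
}

exact_match_sketch(x = 1, y = 1)
range_match_sketch(x = 10, y = 16, range = 6)
range_match_sketch(x = 16, y = 10, range = 6)
```

Note that the range test is one-directional: swapping `x` and `y` changes the result, because `y` must not be smaller than `x`.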

**prob_link()** - tests whether a record-pair relates to the same entity, using the Fellegi and Sunter (1969) model.

In summary, record-pairs are created and categorised as matches and non-matches (`attr_threshold`) with user-defined functions (`cmp_func`).
If `probabilistic` is `TRUE`, two probabilities (`m` and `u`) are used to calculate weights for matches and non-matches.
The `m`-probability is the probability that matched records are actually from the same entity, i.e. a true match,
while the `u`-probability is the probability that matched records are not from the same entity, i.e. a false match.
Record-pairs whose total score is above a certain threshold (`score_threshold`) are assumed to belong to the same entity.

Agreement (match) and disagreement (non-match) scores are calculated as described by Asher et al. (2020).

For each record pair, an agreement score for attribute `i` is calculated as:

`\log_{2}(m_{i}/u_{i})`

For each record pair, a disagreement score for attribute `i` is calculated as:

`\log_{2}((1-m_{i})/(1-u_{i}))`

where `m_{i}` and `u_{i}` are the `m`- and `u`-probabilities for each value of attribute `i`.
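As a rough illustration of the two formulas above, the following base-R sketch scores a single record pair across two attributes and compares the total to a threshold. The attribute names, the `m` and `u` values, the threshold of `1`, and the helper names `agreement_weight` and `disagreement_weight` are all assumptions made for the example; none of them come from `diyar`.

```r
# Hypothetical helpers that mirror the documented formulas.
agreement_weight <- function(m, u) log2(m / u)
disagreement_weight <- function(m, u) log2((1 - m) / (1 - u))

# Assumed m- and u-probabilities for two attributes (illustrative values only).
m <- c(forename = 0.95, surname = 0.80)
u <- c(forename = 0.10, surname = 0.20)

# Assumed comparison results for one record pair:
# the forenames agree and the surnames disagree.
agrees <- c(forename = TRUE, surname = FALSE)

# Use the agreement weight where the attribute agrees,
# and the disagreement weight where it does not.
weights <- ifelse(agrees, agreement_weight(m, u), disagreement_weight(m, u))

# The pair's total score is the sum of its attribute weights.
total_score <- sum(weights)

# Assumed threshold; pairs scoring above it are treated as the same entity.
score_threshold <- 1
is_linked <- total_score > score_threshold

weights
total_score
is_linked
```

Agreement on an attribute where `m` exceeds `u` contributes a positive weight, while disagreement on it contributes a negative one, so the total score rises with the number and strength of agreeing attributes.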

Note that each probability is calculated as a combined probability for the record pair.
For example, if the values of the record-pair have `u`-probabilities of `0.1` and `0.2` respectively,
then the `u`-probability for the pair will be `0.02`.
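The arithmetic in this example is a simple product, which can be checked in base R. The variable names below are hypothetical and the snippet only reproduces the worked example; it does not show how `diyar` derives the per-value probabilities.

```r
# u-probabilities of the two values in the record pair (from the example above)
u_values <- c(0.1, 0.2)

# Combined u-probability for the pair: the product of the individual values
u_pair <- prod(u_values)

# Compare with a tolerance, since 0.1 * 0.2 is not exactly 0.02 in floating point
isTRUE(all.equal(u_pair, 0.02))
```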

Missing data (`NA`) are considered non-matches and assigned a `u`-probability of `0`.

### Examples

```
library(diyar)

# `exact_match`
exact_match(x = 1, y = 1)
exact_match(x = 1, y = 2)

# `range_match`: tests x <= y <= (x + range)
range_match(x = 10, y = 16, range = 6)
range_match(x = 16, y = 10, range = 6)
```

*diyar* version 0.5.1