standardize_address {healthyAddress} | R Documentation
Standard address
Description
Standardize an address from a free-text expression into its components, as used in the PSMA (formerly "Public Sector Mapping Agencies") database.
Usage
standardize_address(
  Address,
  AddressLine2 = NULL,
  return.type = c("data.table", "integer"),
  integer_StreetType = FALSE,
  hash_StreetName = FALSE,
  check = 1L,
  nThread = getOption("healthyAddress.nThread", 1L)
)
standard_address2(Address, nThread = getOption("healthyAddress.nThread", 1L))
standard_address3(Line1, Line2, Postcode = NULL, KeepStreetName = FALSE)
Arguments
Address
A character vector, either a full address or (if AddressLine2 is provided) the first line of the address.
AddressLine2
Either NULL (the default) or a character vector giving the second line of each address.
return.type
Either "data.table" (the default) or "integer".
integer_StreetType
Should the street type be returned as an integer vector?
hash_StreetName
Should the street name be returned as a hash?
check
An integer, whether the inputs should be checked for possibly invalid addresses or addresses that may not be parsed correctly.
nThread
Number of threads to use.
Line1, Line2, Postcode
For addresses split by line.
KeepStreetName
Should an additional character vector of the street name be included in the result?
Details
By convention observed in the PSMA, street names such as 'THE ESPLANADE' have a street name of 'THE ESPLANADE' and an absent street type code.
Non-addresses passed have unspecified behaviour, though usually the numbers of the standard address will be 0 or NA. Postcodes may be negative in some circumstances where a postcode is not detected, though this should not be relied on.
For maximum performance, consider setting integer_StreetType and hash_StreetName to TRUE. Joining two tables has been observed to be faster when using the hash of the standardized street name rather than the street name itself, even when taking into account the hashing process.
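As a rough illustration of these settings, the sketch below calls standardize_address() with both options enabled. The two input addresses are made up for illustration, and the call is wrapped in requireNamespace() so the snippet does nothing where healthyAddress is not installed. No output is shown, since it depends on the installed version.

```r
# Hypothetical example addresses (not taken from any dataset)
addresses <- c("1 George Street Sydney NSW 2000",
               "2 The Esplanade Perth WA 6000")

if (requireNamespace("healthyAddress", quietly = TRUE)) {
  # Request integer street type codes and a hashed street name,
  # the two settings suggested for faster joins
  std <- healthyAddress::standardize_address(addresses,
                                             integer_StreetType = TRUE,
                                             hash_StreetName = TRUE)
  print(std)
}
```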
For performance reasons, addresses with more than 32 words are not supported.
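One way to find inputs that exceed this limit before standardizing is to count words up front. The base R sketch below does that; count_words() and has_too_many_words() are hypothetical helpers, not part of healthyAddress, and they assume a "word" is a run of non-whitespace characters, which may not match how the package itself counts words.

```r
# Count whitespace-delimited words in each element of a character vector.
# Assumption: a "word" is a maximal run of non-whitespace characters.
count_words <- function(x) {
  x <- trimws(x)
  n <- lengths(strsplit(x, "[[:space:]]+"))
  n[is.na(x)] <- NA_integer_  # missing addresses have no word count
  n
}

# TRUE for addresses with more words than the limit (32 by default)
has_too_many_words <- function(x, max_words = 32L) {
  count_words(x) > max_words
}
```

For example, has_too_many_words(Address) gives a logical vector that can be used to set aside over-long inputs for separate handling.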
If a postcode-like number exists at the end of an Address, but is not in fact a postcode, then NA will be in each field except the postcode, which will have the value -1.
Value
A data.table containing columns indicating the components of the standard address:
FLAT_NUMBER
The flat or unit number. This includes things like SHOP number.
NUMBER_FIRST
As used in the PSMA, this identifies the first (or only) number in the address range.
NUMBER_LAST
As used in the PSMA, if an address is marked as having a range of street numbers, the last of the range.
NUMBER_SUFFIX
A raw vector. The suffix observed after the numbers. The PSMA technically has multiple suffixes for each number component.
H0
If hash_StreetName = TRUE, the DJB2 hash (as used in HashStreetName) of the street name. Observed to have performance benefits.
STREET_NAME
The (uppercase) street name. Streets such as 'THE ESPLANADE' or 'THE AVENUE' are treated as entirely made up of a street name and have a STREET_TYPE_CODE of zero.
STREET_TYPE_CODE
An integer, the street type code marking the type of street such as ROAD, STREET, AVENUE, etc. The codes correspond approximately to the rank of their frequency in addresses.
STREET_TYPE
If integer_StreetType = FALSE, then the (uppercase) standard name of the street type.
POSTCODE
An integer vector, the postcode observed.
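For readers unfamiliar with the DJB2 hash named under H0, the sketch below shows the textbook form of the algorithm in base R: start from 5381 and, for each byte, multiply by 33 and add the byte. It is an illustration only. The function djb2() is hypothetical, it assumes a 32-bit unsigned result, and the values it returns are not guaranteed to match those produced by HashStreetName or stored in H0.

```r
# Textbook DJB2 over the bytes of a single string.
# Assumption: results are reduced modulo 2^32 (unsigned 32-bit);
# the package's own hash may use a different width or signedness.
djb2 <- function(s) {
  bytes <- as.integer(charToRaw(s))
  h <- 5381  # conventional DJB2 starting value
  for (b in bytes) {
    # Doubles are used because h * 33 can exceed R's integer range
    h <- (h * 33 + b) %% 4294967296
  }
  h
}

djb2("a")  # 5381 * 33 + 97 = 177670
```

Hashing turns each street name into a single number, so a join compares numbers rather than strings; this is consistent with the performance benefit noted for H0.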