| standardize_address {healthyAddress} | R Documentation |
Standard address
Description
Standardize an address from a free text expression into its components as used in the PSMA (formerly, "Public Sector Mapping Agencies") database.
Usage
standardize_address(
Address,
AddressLine2 = NULL,
return.type = c("data.table", "integer"),
integer_StreetType = FALSE,
hash_StreetName = FALSE,
check = 1L,
nThread = getOption("healthyAddress.nThread", 1L)
)
standard_address2(Address, nThread = getOption("healthyAddres.nThread", 1L))
standard_address3(Line1, Line2, Postcode = NULL, KeepStreetName = FALSE)
Arguments
Address |
A character vector, either a full address or (if |
AddressLine2 |
Either |
return.type |
Either |
integer_StreetType |
Should the street type be returned as an integer vector? |
hash_StreetName |
Should |
check |
An integer, whether the inputs should be checked for possibly invalid addresses or addresses that may not be parsed correctly. |
nThread |
Number of threads to use. |
Line1, Line2, Postcode |
For addresses split by line. |
KeepStreetName |
Should an additional character vector be included in the result of the street name? |
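A minimal sketch of how the three entry points might be called. The sample addresses are made up for illustration. Treating AddressLine2 and Line2 as the locality line is an assumption, since their descriptions above are incomplete. The calls are skipped if the package is not installed.

```r
# Illustrative inputs only; these addresses are made up for the example.
full_address <- c("1 George Street Sydney NSW 2000",
                  "Unit 2 10 High Road Melbourne VIC 3000")
line1 <- c("1 George Street", "Unit 2 10 High Road")
line2 <- c("Sydney NSW 2000", "Melbourne VIC 3000")

if (requireNamespace("healthyAddress", quietly = TRUE)) {
  # Whole address in a single string.
  res_full <- healthyAddress::standardize_address(full_address)

  # Assumption: Address holds the first line and AddressLine2 the second.
  res_split <- healthyAddress::standardize_address(line1, AddressLine2 = line2)

  # Split-line variant; KeepStreetName = TRUE asks for the street name as text.
  res_lines <- healthyAddress::standard_address3(line1, line2,
                                                 KeepStreetName = TRUE)

  # Inspect the structure of the default result.
  str(res_full)
}
```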
Details
By convention observed in the PSMA, street names such as 'THE ESPLANADE' have a street name of 'THE ESPLANADE' and an absent street type code.
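One way to check this convention on a single input is sketched below. The address is made up, and the column names are those listed under Value; the sketch only prints whichever of them are present.

```r
# Made-up address whose street has no separate street type.
esplanade <- "5 The Esplanade Cairns QLD 4870"

if (requireNamespace("healthyAddress", quietly = TRUE)) {
  res <- as.data.frame(healthyAddress::standardize_address(esplanade))

  # Per the convention above, STREET_NAME is expected to hold the whole
  # 'THE ESPLANADE' and STREET_TYPE_CODE to mark an absent street type.
  cols <- intersect(c("STREET_NAME", "STREET_TYPE_CODE"), names(res))
  print(res[, cols, drop = FALSE])
}
```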
Non-addresses passed have unspecified behaviour, though usually the numbers of the standard address will be 0 or NA. Postcodes may be negative in some circumstances where a postcode is not detected, though this should not be relied on.
For maximum performance, consider setting integer_StreetType and
hash_StreetName to TRUE. Joining two tables has been observed to be
faster when using the hash of the standardized street name rather than
the street name itself, even after accounting for the time spent
hashing.
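The sketch below shows what such a join might look like. The sample addresses and the choice of join keys are assumptions, and base merge() stands in for whichever join method is used in practice. Only keys present in both results are used.

```r
# Made-up inputs: two address lists to be matched against each other.
left  <- c("1 George Street Sydney NSW 2000",
           "10 High Road Melbourne VIC 3000")
right <- c("1 GEORGE ST SYDNEY NSW 2000")

if (requireNamespace("healthyAddress", quietly = TRUE)) {
  # Standardize with integer street types and hashed street names.
  std <- function(x) {
    out <- healthyAddress::standardize_address(x,
                                               integer_StreetType = TRUE,
                                               hash_StreetName = TRUE)
    as.data.frame(out)
  }
  a <- std(left)
  b <- std(right)

  # Assumed join keys; keep only those present in both results.
  wanted <- c("POSTCODE", "H0", "STREET_TYPE_CODE", "NUMBER_FIRST")
  keys <- Reduce(intersect, list(wanted, names(a), names(b)))

  if (length(keys) > 0L) {
    # Join on the integer keys rather than on the street name text.
    matched <- merge(a[keys], b[keys], by = keys)
    print(nrow(matched))
  }
}
```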
For performance reasons, addresses with more than 32 words are not supported.
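Inputs can be screened against this limit before standardizing. The helpers below are hypothetical and not part of the package. They assume a "word" is a whitespace-delimited token, which may differ from how the package counts.

```r
# Count whitespace-delimited words in each address
# (an assumed definition of "word").
count_words <- function(x) {
  lengths(strsplit(trimws(x), "[[:space:]]+"))
}

# Flag addresses over the documented 32-word limit.
too_long <- function(x, max_words = 32L) {
  count_words(x) > max_words
}

addresses <- c("1 George Street Sydney NSW 2000",
               paste(rep("WORD", 33L), collapse = " "))
too_long(addresses)  # FALSE TRUE
```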
If a postcode-like number exists at the end of an Address, but is not
in fact a postcode, then NA will be in each field, except postcode,
which will have the value -1.
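A defensive way to separate such rows is sketched below on a hypothetical result. The data frame is hand-built to mimic the shape described and is not real package output. Missing postcodes are also treated as unparsed, since negative values should not be relied on.

```r
# Hypothetical result rows mimicking the documented failure shape:
# every field NA except POSTCODE, which is -1.
res <- data.frame(NUMBER_FIRST = c(1L, NA),
                  STREET_NAME  = c("GEORGE", NA),
                  POSTCODE     = c(2000L, -1L))

# Treat missing or non-positive postcodes as unparsed.
unparsed <- is.na(res$POSTCODE) | res$POSTCODE <= 0L

# Keep only the rows that appear to have been parsed.
res_ok <- res[!unparsed, , drop = FALSE]
```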
Value
A data.table containing columns indicating the components of the standard address:
FLAT_NUMBER |
The flat or unit number. This includes things like SHOP number. |
NUMBER_FIRST |
As used in the PSMA, this identifies the first (or only) number in the address range. |
NUMBER_LAST |
As used in the PSMA, if an address is marked as having a range of street numbers, the last of the range. |
NUMBER_SUFFIX |
A raw vector. The suffix observed after the numbers. The PSMA technically has multiple suffixes for each number component. |
H0 |
If hash_StreetName = TRUE, the DJB2 hash (as used in HashStreetName) of the street name. Observed to have performance benefits. |
STREET_NAME |
The street name, in uppercase. Streets such as 'THE ESPLANADE' or 'THE AVENUE' are treated as entirely made up of a street name and have a STREET_TYPE_CODE of zero. |
STREET_TYPE_CODE |
An integer, the street type code marking the type of street such as ROAD, STREET, AVENUE, etc. The codes correspond approximately to the rank of their frequency in addresses. |
STREET_TYPE |
If integer_StreetType = FALSE, then the (uppercase) standard name of the street type. |
POSTCODE |
An integer vector, the postcode observed. |
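For readers unfamiliar with DJB2, the general scheme is sketched below. This is the textbook form (start at 5381, multiply by 33, add each byte) kept within unsigned 32 bits. The djb2 helper is hypothetical, and the width, signedness, or variant used by HashStreetName may differ, so it should not be expected to reproduce H0.

```r
# Textbook DJB2 over the bytes of a string, kept within unsigned 32 bits.
# Doubles are used because R integers are signed 32-bit and would overflow.
djb2 <- function(s) {
  h <- 5381
  for (b in as.integer(charToRaw(s))) {
    h <- (h * 33 + b) %% 2^32
  }
  h
}

djb2("")   # 5381, the initial value
djb2("a")  # 5381 * 33 + 97 = 177670
```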