standardize_address {healthyAddress}    R Documentation

Standard address

Description

Standardize an address from a free text expression into its components as used in the PSMA (formerly "Public Sector Mapping Agencies") database.

Usage

standardize_address(
  Address,
  AddressLine2 = NULL,
  return.type = c("data.table", "integer"),
  integer_StreetType = FALSE,
  hash_StreetName = FALSE,
  check = 1L,
  nThread = getOption("healthyAddress.nThread", 1L)
)

standard_address2(Address, nThread = getOption("healthyAddress.nThread", 1L))

standard_address3(Line1, Line2, Postcode = NULL, KeepStreetName = FALSE)

Arguments

Address

A character vector, either a full address or (if AddressLine2 is not NULL) the first line of an Australian address.

AddressLine2

Either NULL (the default) or a character vector the same length as Address, giving the second line of each address.

return.type

Either "data.table" or "integer". "data.table" implies a table of columns separating the address components. "integer" means an integer vector creating a bijection between the address and the PSMA internal id.

integer_StreetType

Should the street type be returned as an integer vector?

hash_StreetName

Should STREET_NAME be returned as an integer hash, as in HashStreetName?

check

An integer, whether the inputs should be checked for possibly invalid addresses or addresses that may not be parsed correctly.

nThread

Number of threads to use.

Line1, Line2, Postcode

For addresses split by line. Line1 is assumed to end with the street type. Line2 is only used to determine the postcode, and then only if Postcode is NULL, the default.

KeepStreetName

Should the street name be included in the result as an additional character vector?
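
As an illustration of the arguments above, the following sketch shows the two ways of supplying an address: as a single string, or split across lines. It assumes the healthyAddress package is installed; the address itself and the variable names are made up for the example, and no particular output is implied.

```r
# Sketch only: the address and variable names are illustrative assumptions.
if (requireNamespace("healthyAddress", quietly = TRUE)) {
  library(healthyAddress)

  # Whole address in one string
  one_line <- standardize_address("1 George Street Sydney NSW 2000")

  # Same address with the second line passed separately
  two_line <- standardize_address("1 George Street",
                                  AddressLine2 = "Sydney NSW 2000")

  # Line-based variant: Line1 ends with the street type, and Line2 is
  # only consulted for the postcode because Postcode is left as NULL
  by_line <- standard_address3("1 George Street", "Sydney NSW 2000",
                               KeepStreetName = TRUE)

  print(one_line)
}
```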

Details

By convention observed in the PSMA, street names such as 'THE ESPLANADE' have a street name of 'THE ESPLANADE' and an absent street type code.

The behaviour for non-addresses is unspecified, though usually the numeric components of the standard address will be 0 or NA. Postcodes may be negative in some circumstances where a postcode is not detected, though this should not be relied on.

For maximum performance, consider setting integer_StreetType and hash_StreetName to TRUE. It has been observed that joining two tables is faster when using the hash of the standardized street name rather than the street name itself, even after accounting for the time spent hashing.
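
A minimal sketch of such a join follows. It assumes the healthyAddress package is installed and that the result contains the columns listed under Value (POSTCODE, H0, STREET_TYPE_CODE and NUMBER_FIRST); the addresses, the helper std and the choice of join keys are illustrative assumptions, not a recommended matching rule.

```r
# Sketch only: addresses, helper name and join keys are assumptions.
if (requireNamespace("healthyAddress", quietly = TRUE)) {
  library(healthyAddress)

  left  <- c("1 George Street Sydney NSW 2000",
             "10 Collins Street Melbourne VIC 3000")
  right <- c("10 Collins Street Melbourne VIC 3000")

  # Standardize with integer street types and hashed street names
  std <- function(x) {
    standardize_address(x, integer_StreetType = TRUE, hash_StreetName = TRUE)
  }
  std_left  <- std(left)
  std_right <- std(right)

  # Join on integer columns only, using the hash in place of STREET_NAME
  keys <- c("POSTCODE", "H0", "STREET_TYPE_CODE", "NUMBER_FIRST")
  matched <- merge(std_left, std_right, by = keys)
  print(matched)
}
```

Joining on integer keys avoids string comparisons during the merge, which is the likely source of the speed-up described above.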

For performance reasons, addresses with more than 32 words are not supported.

If a postcode-like number exists at the end of an Address, but is not in fact a postcode, then NA will be in each field, except postcode, which will have the value -1.

Value

A data.table containing columns indicating the components of the standard address:

FLAT_NUMBER

The flat or unit number. This includes things like SHOP number.

NUMBER_FIRST

As used in the PSMA, this identifies the first (or only) number in the address range.

NUMBER_LAST

As used in the PSMA, if an address is marked as having a range of street numbers, the last of the range.

NUMBER_SUFFIX

A raw vector. The suffix observed after the numbers. The PSMA technically has multiple suffixes for each number component.

H0

If hash_StreetName = TRUE, the DJB2 hash (as used in HashStreetName) of the street name. Observed to have performance benefits.

STREET_NAME

The street name, in uppercase. Streets such as 'THE ESPLANADE' or 'THE AVENUE' are treated as entirely made up of a street name and have a STREET_TYPE_CODE of zero.

STREET_TYPE_CODE

An integer, the street type code marking the type of street such as ROAD, STREET, AVENUE, etc. The codes correspond approximately to the rank of each street type's frequency in addresses.

STREET_TYPE

If integer_StreetType = FALSE, then the (uppercase) standard name of the street type.

POSTCODE

An integer vector, the postcode observed.
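
For readers unfamiliar with the DJB2 hash mentioned for the H0 column, the following base R sketch shows the classic form of the algorithm: start at 5381, then multiply by 33 and add each byte. The function name djb2 is hypothetical and this is not the package's implementation; HashStreetName may differ in details such as integer width or sign, so the values here should not be expected to match H0.

```r
# Classic DJB2 over the bytes of a single string (illustrative only).
# Doubles with an explicit modulo 2^32 are used because R integers
# overflow to NA rather than wrapping around.
djb2 <- function(x) {
  h <- 5381
  for (b in as.integer(charToRaw(x))) {
    h <- (h * 33 + b) %% 4294967296
  }
  h
}

# The same street name always maps to the same number, so the hash can
# stand in for the character string as a join key.
djb2("THE ESPLANADE") == djb2("THE ESPLANADE")
```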


[Package healthyAddress version 0.4.3 Index]