IP-package {IP} R Documentation

Classes and methods for IP addresses

Description

Classes and methods for IP addresses

Details

The IP package provides vector-like classes and methods for Internet Protocol (IP) addresses. It is based on the ip4r PostgreSQL extension available at https://github.com/RhodiumToad/ip4r.

An IP address is a numerical label assigned to each device connected to a computer network that uses the Internet Protocol for communication. The Internet Protocol uses those labels to identify nodes, such as hosts or network interfaces, and to relay datagrams between them across network boundaries.

Internet Protocol version 4 (IPv4) defines an IP address as an unsigned 32-bit number. However, because of the growth of the Internet and the depletion of available IPv4 addresses, a new version of IP (IPv6) using 128 bits for the IP address was developed from 1995 on. IPv6 deployment has been ongoing since the mid-2000s. Note that there is no IPv5 address. In addition, the IPv4 and IPv6 protocols differ in many respects besides the representation of IP addresses.

IP addresses are usually written and displayed in human-readable notations, such as "192.168.0.1" in IPv4 and "fe80::3b13:cff7:1013:d2e7" in IPv6. Ranges can be represented using two IP addresses separated by a dash or using the Classless Inter-Domain Routing (CIDR) notation. CIDR suffixes an address with the size of its routing prefix, that is, the number of significant leading bits. For instance, "192.168.0.0/16" is a private network with subnet mask "255.255.0.0" and is equivalent to "192.168.0.0-192.168.255.255".

Currently, the IP package supports the following object types, implemented as S4 classes: IPv4 and IPv6 for addresses, IPv4r and IPv6r for address ranges, and the IP and IPr containers, which can hold both protocols.
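As an illustration of the CIDR notation described above, here is a minimal base R sketch that expands an IPv4 CIDR block into its first address, last address and subnet mask. The helper names (ipv4_to_num, num_to_ipv4, cidr_range) are hypothetical and are not part of the IP package; the sketch uses doubles because R integers are signed 32-bit values.

```r
# Convert a dotted-quad IPv4 string to a number. Doubles are used because
# R's signed 32-bit integers cannot hold values above 2^31 - 1.
ipv4_to_num <- function(x) {
  octets <- as.numeric(strsplit(x, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# Convert a number back to dotted-quad notation.
num_to_ipv4 <- function(n) {
  paste((n %/% 256^(3:0)) %% 256, collapse = ".")
}

# Expand "address/prefix" into first address, last address and subnet mask.
cidr_range <- function(cidr) {
  parts  <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  prefix <- as.integer(parts[2])
  size   <- 2^(32 - prefix)                          # addresses in the block
  first  <- (ipv4_to_num(parts[1]) %/% size) * size  # clear the host bits
  c(first = num_to_ipv4(first),
    last  = num_to_ipv4(first + size - 1),
    mask  = num_to_ipv4(2^32 - size))
}

cidr_range("192.168.0.0/16")
```

For "192.168.0.0/16" the result should match the range and subnet mask quoted in the text.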

The IP package also provides methods for arithmetic, comparison, and bitwise unary and binary operations, in addition to sorting, lookup, and querying information about IP addresses and domain names. Not all operators are available for all classes: most omissions are by design, but a few operators are missing simply because they have not been implemented yet. IP objects can also be subsetted, stored in a data.frame, and serialized.

The IP and IPr classes are only convenience containers for cases where addresses must be created from vectors mixing both protocols. The IPv4 and IPv6 protocols and their corresponding address representations differ in many respects, so only a subset of methods is available for these containers. In addition, methods for those containers tend to run slower because, at the moment, they need to make two passes (one for IPv4* and one for IPv6* objects). Use the ipv4(IP) (resp. ipv4r(IPr)) and ipv6(IP) (resp. ipv6r(IPr)) getters to work with v4 and v6 objects separately.

Design considerations

IP objects were designed to behave as much as possible like base R atomic vectors. Therefore, many base R functions, such as table() or factor(), work with IP objects, as does merging two data.frames using IP objects as keys.

There are, however, a few caveats when using functions or methods not provided by the IP package. In such cases, you may have to convert the addresses to their character representation first.

IP objects are S4 objects that all inherit from the integer class. Because of this, some function calls operate only on the inherited integer .Data part of the object. At the time of writing, this is for example the case for nchar(), which returns the number of characters of the .Data vector only. grep(), on the other hand, works because its x argument is explicitly coerced to character before further processing.

The .Data slot does not hold the addresses but an index to the addresses. When a function that has no IP method is called, R first looks for a method for the object's own class. If none is found, it tries to find one for the class the object inherits from. Hence, the call operates on the index, and not on the object as a whole. This is why some operations, such as multiplication, are explicitly blacklisted. Since there is no `*` operation for IP objects, multiplying an IP object by a number would otherwise fall back to multiplying the index by this number, thus badly damaging the object.
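The fallback described above can be reproduced with a toy S4 class. This is only a sketch: the class ToyAddr and its addr slot are hypothetical stand-ins, not the IP package's actual classes or internals. It shows nchar() and `*` acting on the inherited integer part, and then one possible way of blacklisting an operator by defining a method that refuses it.

```r
library(methods)

# Toy class (NOT the IP package's own): an integer index plus a side table.
setClass("ToyAddr", contains = "integer", slots = c(addr = "character"))
x <- new("ToyAddr", 1:2, addr = c("192.168.0.1", "10.0.0.1"))

# No nchar() method exists for ToyAddr, so R works on the integer index:
# the result describes the index values, not the address strings.
index_chars <- nchar(x)
index_chars

# Likewise, `*` rescales the index and leaves the `addr` slot untouched,
# so the index no longer matches the stored addresses.
y <- x * 2L
as.integer(y)

# Blacklisting: define a method that refuses the operation outright.
setMethod("*", signature("ToyAddr", "ANY"), function(e1, e2) {
  stop("multiplication is not defined for ToyAddr objects")
})
res <- tryCatch(x * 2L, error = function(e) conditionMessage(e))
res
```

Once the method is defined, the multiplication stops with an error instead of silently modifying the index.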

The reasons for using an index are twofold. First, each IP address space uses the entire 32-bit (resp. 128-bit) integer range. Thus, no value can be reserved for NA. For instance, R stores NA_integer_ as the 32-bit pattern 0x80000000 (-2^31 as a signed integer), which, read as an unsigned number, is 2^31: a perfectly valid IPv4 address ("128.0.0.0"). The second reason is the word size of IP addresses. An IPv4 address uses 32 bits and thus can be stored in an integer vector (and an IPv4 address range uses 64 bits and could be stored in a numeric vector). But an IPv6 address uses 128 bits and an IPv6 address range uses 256 bits, and currently no built-in R atomic vector is wide enough to hold them. IP addresses other than IPv4 therefore have to be stored in a separate matrix, and the index is used to retrieve their value.
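The first point can be checked in base R. The short sketch below (variable names are illustrative only) shows that the value 2^31 corresponds to the address "128.0.0.0" and that it cannot be represented as a regular, non-missing R integer.

```r
# R integers are signed 32-bit values; the largest one is 2^31 - 1.
max_is_2_31_minus_1 <- .Machine$integer.max == 2^31 - 1
max_is_2_31_minus_1

# 2^31, written as four octets, is a valid IPv4 address.
addr   <- 2^31
dotted <- paste((addr %/% 256^(3:0)) %% 256, collapse = ".")
dotted

# Coercing it to integer overflows: R can only return NA (with a warning).
as_int <- suppressWarnings(as.integer(addr))
is.na(as_int)
```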

Therefore, each IP* object has an index which either points to the location of the address in a table or marks the value as NA. This way, R believes it is dealing with a regular vector, but at the cost of increased memory consumption. The memory footprint depends on the number of NA values.

On the other hand, this design makes it easy to know whether there are any NA values and, if there are none, to skip NA checking, which makes things faster.
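The index-plus-table layout can be modelled in a few lines of base R. This is a simplified sketch under stated assumptions: the table is a character vector rather than a matrix, the index is 1-based, and the names input, tbl, idx and lookup are hypothetical; the package's actual internal layout may differ.

```r
# Toy model: the table only has entries for non-missing addresses.
input   <- c("2001:db8::1", NA, "fe80::1", NA)
present <- !is.na(input)

tbl <- input[present]                    # stand-in for the address table
idx <- rep(NA_integer_, length(input))   # one index entry per element
idx[present] <- seq_len(sum(present))    # position in tbl, NA otherwise

# Retrieving values goes through the index; an NA index yields an NA value.
lookup <- function(idx, tbl) tbl[idx]
lookup(idx, tbl)

# Missing values are detected on the index alone, without reading the table.
anyNA(idx)
```

In this model the table shrinks as the number of NA values grows, while the index always has one entry per element, which is consistent with the memory trade-off described above.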

SIMD support

The IP package provides experimental support for AVX2 vectorized operations for IP comparison and arithmetic. To enable AVX2 support, pass "--enable-avx2" through the configure.args argument of the install.packages() function.
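A call along the following lines should do this. It is a sketch only: type = "source" is an assumption added here, on the grounds that configure arguments only take effect when the package is built from source.

```r
# Build the IP package from source with the AVX2 code paths enabled.
# type = "source" is an assumption: configure.args is ignored for
# pre-built binary packages.
install.packages("IP", configure.args = "--enable-avx2", type = "source")
```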

Data protection

One last caveat. In certain jurisdictions, such as EU member states, IP addresses are considered personal data (see Article 29 Working Party Opinion 4/2007 and the ECJ ruling dated 19 October 2016, ref. C-582/14). IP processing must therefore be done in accordance with the applicable laws and regulations.


[Package IP version 0.1.3 Index]