Function for measuring algorithmic performance of high-level and low-level integer64 functions

### Description

`benchmark64` compares high-level integer64 functions against the integer functions from Base R.

`optimizer64` compares, for each high-level integer64 function, the Base R integer function with several low-level integer64 functions, with and without caching.

### Usage

```
benchmark64(nsmall = 2^16, nbig = 2^25, timefun = repeat.time
)
optimizer64(nsmall = 2^16, nbig = 2^25, timefun = repeat.time
, what = c("match", "%in%", "duplicated", "unique", "unipos", "table", "rank", "quantile")
, uniorder = c("original", "values", "any")
, taborder = c("values", "counts")
, plot = TRUE
)
```

### Arguments

`nsmall` | size of the smaller vector |

`nbig` | size of the larger vector |

`timefun` | a function for timing, such as `repeat.time` (the default) or `system.time` |

`what` | a vector of names of high-level functions |

`uniorder` | one of the order parameters allowed for `unique` and `unipos` (see the values listed under Usage) |

`taborder` | one of the order parameters allowed for `table` (see the values listed under Usage) |

`plot` | set to FALSE to suppress plotting |
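A custom `timefun` can be a thin wrapper around `system.time`, as the Examples section does. The following is a minimal sketch in base R; the helper name `quick_time` is hypothetical, and it assumes that a `system.time`-style result is an acceptable return value for `timefun`.

```r
# Hypothetical helper name; mirrors the wrapper used in the Examples section.
quick_time <- function(expr) {
  # expr is a lazily evaluated promise, so it is first evaluated
  # inside system.time() and its run time is what gets measured
  system.time(expr, gcFirst = FALSE)
}

x <- runif(1e5)
tm <- quick_time(sort(x))
tm[["elapsed"]]   # elapsed seconds for the timed expression
```

Skipping the garbage collection with `gcFirst = FALSE` keeps the wrapper fast, which suits quick regression runs more than serious measurement.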

### Details

`benchmark64` compares the following scenarios for the following use cases:

scenario name | explanation |

32-bit | applying Base R function to 32-bit integer data |

64-bit | applying bit64 function to 64-bit integer data (with no cache) |

hashcache | same as 64-bit, but the cache contains a `hashmap`, see `hashcache` |

sortordercache | same as 64-bit, but the cache contains sorting and ordering, see `sortordercache` |

ordercache | same as 64-bit, but the cache contains ordering only, see `ordercache` |

allcache | same as 64-bit, but the cache contains sorting, ordering and hashing |

use case name | explanation |

cache | filling the cache according to scenario |

match(s,b) | match small in big vector |

s %in% b | small %in% big vector |

match(b,s) | match big in small vector |

b %in% s | big %in% small vector |

match(b,b) | match big in (different) big vector |

b %in% b | big %in% (different) big vector |

duplicated(b) | duplicated of big vector |

unique(b) | unique of big vector |

table(b) | table of big vector |

sort(b) | sorting of big vector |

order(b) | ordering of big vector |

rank(b) | ranking of big vector |

quantile(b) | quantiles of big vector |

summary(b) | summary of big vector |

SESSION | example session involving multiple calls (including cache filling costs) |

Note that the timings for the cached variants do *not* include the cost of building the cache. The only exception is the timing of the example user session, where the cache costs are included in order to evaluate amortization.
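Amortization here means that the one-time cost of building a cache pays off only after enough cached calls. The sketch below works this out in base R with purely hypothetical timings (placeholders, not measurements); the variable names are illustrative and not part of the package.

```r
# Hypothetical timings in seconds; placeholders, not measurements.
build_cost   <- 2.0    # one-time cost of filling the cache
uncached_use <- 0.5    # cost per call without a cache
cached_use   <- 0.25   # cost per call with a cache

n_calls <- 1:12
total_uncached <- n_calls * uncached_use
total_cached   <- build_cost + n_calls * cached_use

# Smallest number of calls at which the cached session
# is no more expensive overall than the uncached one
break_even <- n_calls[which(total_cached <= total_uncached)[1]]
break_even
```

With these placeholder numbers the cache breaks even at 8 calls; a session with fewer calls would be cheaper without the cache.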

### Value

`benchmark64` returns a matrix of elapsed seconds, with the high-level tasks in rows and the scenarios used to solve each task in columns. The last row, named 'SESSION', contains the elapsed seconds of the example session.

`optimizer64` returns a dimensioned list with one row for each high-level function timed and two columns named after the values of the `nsmall` and `nbig` sample sizes. Each list cell contains a matrix of timings, with low-level methods in rows and three measurements, `c("prep","both","use")`, in columns. If it can be measured separately, `prep` contains the timing of preparatory work such as sorting and hashing, and `use` contains the timing of using the prepared work. If the timed function does both preparation and use, the timing is in `both`.
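To make the shape of the `optimizer64` result concrete, the following base R sketch builds a mock object with the documented layout and indexes one cell. The method names, the exact column labels, and the `NA` placeholders are assumptions for illustration only; a real result contains the package's own method names and measured timings.

```r
# Mock of the documented return shape; no real timings are used.
nsmall <- 2^7
nbig   <- 2^13
methods <- c("method_a", "method_b")   # hypothetical low-level method names

# One cell: low-level methods in rows, the three measurements in columns
cell <- matrix(NA_real_, nrow = length(methods), ncol = 3,
               dimnames = list(methods, c("prep", "both", "use")))

# Dimensioned list: one row per high-level function, one column per sample size
# (column labels assumed to be the sizes converted to character)
res <- matrix(list(cell, cell, cell, cell), nrow = 2,
              dimnames = list(c("match", "unique"),
                              as.character(c(nsmall, nbig))))

# Double brackets extract the timing matrix stored in a single cell
tim <- res[["match", "128"]]
colnames(tim)
```

Indexing with `[[i, j]]` returns the matrix itself, whereas `[i, j]` would return a list of length one wrapping it.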

### Author(s)

Jens Oehlschlägel <Jens.Oehlschlaegel@truecluster.com>

### Examples

```
message("this small example using system.time does not give serious timings\nwe do this only to run regression tests")
benchmark64(nsmall=2^7, nbig=2^13, timefun=function(expr)system.time(expr, gcFirst=FALSE))
optimizer64(nsmall=2^7, nbig=2^13, timefun=function(expr)system.time(expr, gcFirst=FALSE)
, plot=FALSE
)
## Not run:
message("for real measurement of sufficiently large datasets run this on your machine")
benchmark64()
optimizer64()
## End(Not run)
message("let's look at the performance results on Core i7 Lenovo T410 with 8 GB RAM")
data(benchmark64.data)
print(benchmark64.data)
matplot(log2(benchmark64.data[-1,1]/benchmark64.data[-1,])
, pch=c("3", "6", "h", "s", "o", "a")
, xlab="tasks [last=session]"
, ylab="log2(relative speed) [bigger is better]"
)
matplot(t(log2(benchmark64.data[-1,1]/benchmark64.data[-1,]))
, type="b", axes=FALSE
, lwd=c(rep(1, 14), 3)
, xlab="context"
, ylab="log2(relative speed) [bigger is better]"
)
axis(1
, labels=c("32-bit", "64-bit", "hash", "sortorder", "order", "hash+sortorder")
, at=1:6
)
axis(2)
data(optimizer64.data)
print(optimizer64.data)
oldpar <- par(no.readonly = TRUE)
par(mfrow=c(2,1))
par(cex=0.7)
# one barplot per high-level function (rows) and sample size (columns)
for (i in seq_len(nrow(optimizer64.data))){
  for (j in 1:2){
    tim <- optimizer64.data[[i,j]]
    barplot(t(tim))
    if (rownames(optimizer64.data)[i]=="match")
      title(paste("match", colnames(optimizer64.data)[j], "in", colnames(optimizer64.data)[3-j]))
    else if (rownames(optimizer64.data)[i]=="%in%")
      title(paste(colnames(optimizer64.data)[j], "%in%", colnames(optimizer64.data)[3-j]))
    else
      title(paste(rownames(optimizer64.data)[i], colnames(optimizer64.data)[j]))
  }
}
# restore the graphics parameters saved in oldpar
par(oldpar)
```

