R: ACSS complexity

acss {acss}

R Documentation

ACSS complexity

Description

Functions to obtain the algorithmic complexity for short strings, an approximation of the Kolmogorov Complexity of a short string using the coding theorem method.

Usage

acss(string, alphabet = 9)

local_complexity(string, alphabet = 9, span = 5)

likelihood_d(string, alphabet = 9)

likelihood_ratio(string, alphabet = 9)

prob_random(string, alphabet = 9, prior= 0.5)

Arguments

`string`	`character` vector containing the to be analyzed strings (can contain multiple strings).
`alphabet`	`numeric`, the number of possible symbols (not necessarily actually appearing in str). Must be one of `c(2, 4, 5, 6, 9)` (can also be `NULL` or contain multiple values for `acss()`). Default is 9.
`prior`	`numeric`, the prior probability that the underlying process is random.
`span`	size of substrings to be created from `string`.

Details

The algorithmic complexity is computed using the coding theorem method: For a given alphabet size (number of different symbols in a string), all possible or a large number of random samples of Turing machines (TM) with a given number of states (e.g., 5) and number of symbols corresponding to the alphabet size were simulated until they reached a halting state or failed to end. The outputs of the TMs at the halting states produces a distribution of strings known as the algorithmic probability of the strings. The algorithmic coding theorem (Levin, 1974) establishes the connection between the complexity of a string K(s) and its algorithmic probability D(s) as:

K(s)\approx -\log_{2}(D(s))

This package accesses a database containing data on 4.5 million strings from length 1 to 12 simulated on TMs with 2, 4, 5, 6, and 9 symbols.

For a more detailed discussion see Gauvrit, Singmann, Soler-Toscano, and Zenil (2014), http://complexitycalculator.com/methodology.html, or references below.

Value

"acss": A matrix in which the rows correspond to the strings entered and the columns to the algorithmic complexity K and the algorithmic probability D of the string (see http://complexitycalculator.com/methodology.html).
"local_complexity": A list with elements corresponding to the strings. Each list containes a named vector of algorithmic complexities (K) of all substrings in each string with length span.
"likelihood_d": A named vector with the likelihoods for string given a detreministic process.
"likelihood_ratio": A named vector with the likelihood ratios (or Bayes factors) for string given a random rather than detreministic process.
"prob_random": A named vector with the posterior probabilities that for a random process given the strings and the provided prior for being produced by a random process (default is 0.5, which correspond to a prior of 1 - 0.5 = 0.5 for a detereministic process).

Note

The first time per session one of the functions described here is used, a relatively large dataset is loaded into memory which can take a considerable amount of time (> 10 seconds).

References

Delahaye, J.-P., & Zenil, H. (2012). Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness. Applied Mathematics and Computation, 219(1), 63-77. doi:10.1016/j.amc.2011.10.006

Gauvrit, N., Singmann, H., Soler-Toscano, F., & Zenil, H. (2014). Algorithmic complexity for psychology: A user-friendly implementation of the coding theorem method. arXiv:1409.4080 [cs, stat]. http://arxiv.org/abs/1409.4080.

Gauvrit, N., Zenil, H., Delahaye, J.-P., & Soler-Toscano, F. (2014). Algorithmic complexity for short binary strings applied to psychology: a primer. Behavior Research Methods, 46(3), 732-744. doi:10.3758/s13428-013-0416-0

Levin, L. A. (1974). Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Problemy Peredachi Informatsii, 10(3), 30-35.

Soler-Toscano, F., Zenil, H., Delahaye, J.-P., & Gauvrit, N. (2012). Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines. PLoS ONE, 9(5): e96223.

Examples


# WARNING: The first call to one of the functions
# discussed on this page loads a large data set 
# and usually takes > 10 seconds. Stay patient.

acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"))
##                 K.9          D.9
## HEHHEE     23.38852 9.106564e-08
## GHHGGHGHH  33.50168 8.222205e-11
## HSHSHHSHSS 35.15241 2.618613e-11

acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"))[,"K.9"]
## [1] 23.38852 33.50168 35.15241

acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"), alphabet = 2)
##                 K.2          D.2
## HEHHEE     14.96921 3.117581e-05
## GHHGGHGHH  25.60208 1.963387e-08
## HSHSHHSHSS 26.90906 7.935321e-09

acss(c("HEHHEE", "GHHGGHGHUE", "HSHSHHSHSS"), NULL)
##                 K.2      K.4      K.5      K.6      K.9
## HEHHEE     14.96921 18.55227 19.70361 20.75762 23.38852
## GHHGGHGHUE       NA 31.75832 33.00795 34.27457 37.78935
## HSHSHHSHSS 26.90906 29.37852 30.52566 31.76229 35.15241
##                     D.2          D.4          D.5          D.6
## HEHHEE     3.117581e-05 2.601421e-06 1.171176e-06 5.640722e-07
## GHHGGHGHUE           NA 2.752909e-10 1.157755e-10 4.812021e-11
## HSHSHHSHSS 7.935321e-09 1.432793e-09 6.469341e-10 2.745360e-10
##                     D.9
## HEHHEE     9.106564e-08
## GHHGGHGHUE 4.209915e-12
## HSHSHHSHSS 2.618613e-11

## Not run: 
likelihood_d(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
##    HTHTHTHT    HTHHTHTT 
## 0.010366951 0.003102718 

likelihood_ratio(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
##  HTHTHTHT  HTHHTHTT 
## 0.3767983 1.2589769 

prob_random(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
##  HTHTHTHT  HTHHTHTT 
## 0.2736772 0.5573217

## End(Not run)

local_complexity(c("01011010111" ,"GHHGGHGHUE"), alphabet = 5, span=5)
## $`01011010111`
##    01011    10110    01101    11010    10101    01011    10111 
## 16.22469 16.24766 16.24766 16.22469 16.24322 16.22469 15.93927 
## 
## $GHHGGHGHUE
##    GHHGG    HHGGH    HGGHG    GGHGH    GHGHU    HGHUE 
## 16.44639 16.44639 16.24766 16.22469 16.58986 16.86449 


local_complexity(c("01011010111" ,"GHHGGHGHUE"), span=7)
## $`01011010111`
##  0101101  1011010  0110101  1101011  1010111 
## 26.52068 26.52068 26.47782 26.62371 26.29186 
## 
## $GHHGGHGHUE
##  GHHGGHG  HHGGHGH  HGGHGHU  GGHGHUE 
## 27.04623 26.86992 27.30871 27.84322

[Package acss version 0.2-5 Index]