acss {acss}R Documentation

ACSS complexity

Description

Functions to obtain the algorithmic complexity for short strings, an approximation of the Kolmogorov Complexity of a short string using the coding theorem method.

Usage

acss(string, alphabet = 9)

local_complexity(string, alphabet = 9, span = 5)

likelihood_d(string, alphabet = 9)

likelihood_ratio(string, alphabet = 9)

prob_random(string, alphabet = 9, prior= 0.5)

Arguments

string

character vector containing the to be analyzed strings (can contain multiple strings).

alphabet

numeric, the number of possible symbols (not necessarily actually appearing in str). Must be one of c(2, 4, 5, 6, 9) (can also be NULL or contain multiple values for acss()). Default is 9.

prior

numeric, the prior probability that the underlying process is random.

span

size of substrings to be created from string.

Details

The algorithmic complexity is computed using the coding theorem method: For a given alphabet size (number of different symbols in a string), all possible or a large number of random samples of Turing machines (TM) with a given number of states (e.g., 5) and number of symbols corresponding to the alphabet size were simulated until they reached a halting state or failed to end. The outputs of the TMs at the halting states produces a distribution of strings known as the algorithmic probability of the strings. The algorithmic coding theorem (Levin, 1974) establishes the connection between the complexity of a string K(s) and its algorithmic probability D(s) as:

K(s)\approx -\log_{2}(D(s))

This package accesses a database containing data on 4.5 million strings from length 1 to 12 simulated on TMs with 2, 4, 5, 6, and 9 symbols.

For a more detailed discussion see Gauvrit, Singmann, Soler-Toscano, and Zenil (2014), http://complexitycalculator.com/methodology.html, or references below.

Value

"acss"

A matrix in which the rows correspond to the strings entered and the columns to the algorithmic complexity K and the algorithmic probability D of the string (see http://complexitycalculator.com/methodology.html).

"local_complexity"

A list with elements corresponding to the strings. Each list containes a named vector of algorithmic complexities (K) of all substrings in each string with length span.

"likelihood_d"

A named vector with the likelihoods for string given a detreministic process.

"likelihood_ratio"

A named vector with the likelihood ratios (or Bayes factors) for string given a random rather than detreministic process.

"prob_random"

A named vector with the posterior probabilities that for a random process given the strings and the provided prior for being produced by a random process (default is 0.5, which correspond to a prior of 1 - 0.5 = 0.5 for a detereministic process).

Note

The first time per session one of the functions described here is used, a relatively large dataset is loaded into memory which can take a considerable amount of time (> 10 seconds).

References

Delahaye, J.-P., & Zenil, H. (2012). Numerical evaluation of algorithmic complexity for short strings: A glance into the innermost structure of randomness. Applied Mathematics and Computation, 219(1), 63-77. doi:10.1016/j.amc.2011.10.006

Gauvrit, N., Singmann, H., Soler-Toscano, F., & Zenil, H. (2014). Algorithmic complexity for psychology: A user-friendly implementation of the coding theorem method. arXiv:1409.4080 [cs, stat]. http://arxiv.org/abs/1409.4080.

Gauvrit, N., Zenil, H., Delahaye, J.-P., & Soler-Toscano, F. (2014). Algorithmic complexity for short binary strings applied to psychology: a primer. Behavior Research Methods, 46(3), 732-744. doi:10.3758/s13428-013-0416-0

Levin, L. A. (1974). Laws of information conservation (nongrowth) and aspects of the foundation of probability theory. Problemy Peredachi Informatsii, 10(3), 30-35.

Soler-Toscano, F., Zenil, H., Delahaye, J.-P., & Gauvrit, N. (2012). Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines. PLoS ONE, 9(5): e96223.

Examples


# WARNING: The first call to one of the functions
# discussed on this page loads a large data set 
# and usually takes > 10 seconds. Stay patient.

acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"))
##                 K.9          D.9
## HEHHEE     23.38852 9.106564e-08
## GHHGGHGHH  33.50168 8.222205e-11
## HSHSHHSHSS 35.15241 2.618613e-11

acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"))[,"K.9"]
## [1] 23.38852 33.50168 35.15241

acss(c("HEHHEE", "GHHGGHGHH", "HSHSHHSHSS"), alphabet = 2)
##                 K.2          D.2
## HEHHEE     14.96921 3.117581e-05
## GHHGGHGHH  25.60208 1.963387e-08
## HSHSHHSHSS 26.90906 7.935321e-09

acss(c("HEHHEE", "GHHGGHGHUE", "HSHSHHSHSS"), NULL)
##                 K.2      K.4      K.5      K.6      K.9
## HEHHEE     14.96921 18.55227 19.70361 20.75762 23.38852
## GHHGGHGHUE       NA 31.75832 33.00795 34.27457 37.78935
## HSHSHHSHSS 26.90906 29.37852 30.52566 31.76229 35.15241
##                     D.2          D.4          D.5          D.6
## HEHHEE     3.117581e-05 2.601421e-06 1.171176e-06 5.640722e-07
## GHHGGHGHUE           NA 2.752909e-10 1.157755e-10 4.812021e-11
## HSHSHHSHSS 7.935321e-09 1.432793e-09 6.469341e-10 2.745360e-10
##                     D.9
## HEHHEE     9.106564e-08
## GHHGGHGHUE 4.209915e-12
## HSHSHHSHSS 2.618613e-11

## Not run: 
likelihood_d(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
##    HTHTHTHT    HTHHTHTT 
## 0.010366951 0.003102718 

likelihood_ratio(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
##  HTHTHTHT  HTHHTHTT 
## 0.3767983 1.2589769 

prob_random(c("HTHTHTHT", "HTHHTHTT"), alphabet = 2)
##  HTHTHTHT  HTHHTHTT 
## 0.2736772 0.5573217

## End(Not run)

local_complexity(c("01011010111" ,"GHHGGHGHUE"), alphabet = 5, span=5)
## $`01011010111`
##    01011    10110    01101    11010    10101    01011    10111 
## 16.22469 16.24766 16.24766 16.22469 16.24322 16.22469 15.93927 
## 
## $GHHGGHGHUE
##    GHHGG    HHGGH    HGGHG    GGHGH    GHGHU    HGHUE 
## 16.44639 16.44639 16.24766 16.22469 16.58986 16.86449 


local_complexity(c("01011010111" ,"GHHGGHGHUE"), span=7)
## $`01011010111`
##  0101101  1011010  0110101  1101011  1010111 
## 26.52068 26.52068 26.47782 26.62371 26.29186 
## 
## $GHHGGHGHUE
##  GHHGGHG  HHGGHGH  HGGHGHU  GGHGHUE 
## 27.04623 26.86992 27.30871 27.84322 

[Package acss version 0.2-5 Index]