gen {genscore}R Documentation

Random data generator from general a-b distributions with general domain types, assuming a and b are rational numbers.

Description

Random data generator from general a-b graphs with general domain types using adaptive rejection metropolis sampling (ARMS). x^(0/0) treated as log(x) and x^(n/0) as exp(x) for n non-zero. Density only guaranteed to be a proper density when 2*a > b >= 0 or when a = b = 0.

Usage

gen(
  n,
  setting,
  abs,
  eta,
  K,
  domain,
  finite_infinity = NULL,
  xinit = NULL,
  seed = NULL,
  burn_in = 1000,
  thinning = 100,
  verbose = TRUE,
  remove_outofbound = TRUE
)

Arguments

n

An integer, number of observations.

setting

A string that indicates the distribution type, must be one of "exp", "gamma", "gaussian", "log_log", "log_log_sum0", or of the form "ab_NUM1_NUM2", where NUM1 is the a value and NUM2 is the b value, and NUM1 and NUM2 must be integers or two integers separated by "/", e.g. "ab_2_2", "ab_2_5/4" or "ab_2/3_1/2".

abs

A boolean. If TRUE, density is rewritten as f(|x|), i.e. with |x|^(a_numer/a_denom) and |x|^(b_numer/b_denom)

eta

A vector, the linear part in the distribution.

K

A square matrix, the interaction matrix. There should exist some C > 0 such that

{\boldsymbol{x}^a}^{\top}\mathbf{K}{\boldsymbol{x}}^a/({\boldsymbol{x}^a}^{\top}{\boldsymbol{x}}^a) >= C

for all x in the domain (i.e. K is positive definite if domain$type == "R" and K is co-positive if domain$type == "R+".). If a_numer == a_denom == b_numer == b_denom == 0 && domain$type == "simplex", K can also have all row and column sums equal to 0 but have all but one eigenvalues (0) positive.

domain

A list returned from make_domain() that represents the domain.

finite_infinity

A finite positive number. Inf in actual generation will be truncated to finite_infinity if applicable. Although the code will adaptively increase finite_infinity, the user should set it to a large number initially so that abs(x) > finite_infinity with very small probability.

xinit

Optional. A p-vector, an initial point in the domain. If the domain is defined by more than one ineq or by one ineq containing negative coefficients, this must be provided. In the unlikely case where the function fails to automatically generate an initial point this should also be provided.

seed

Optional. A number, the seed for the random generator.

burn_in

Optional. A positive integer, the number of burn-in samples in ARMS to be discarded, meaning that samples from the first burn_in x thinning iterations will be discarded.

thinning

Optional. A positive integer, thinning factor in ARMS. Samples are taken at iteration steps (\mathrm{burn\_in}+1)\times\mathrm{thinning}, ..., (\mathrm{burn\_in}+n)\times\mathrm{thinning}. Default to 100.

verbose

Optional. A boolean. If TRUE, prints a progress bar showing the progress. Defaults to TRUE.

remove_outofbound

Optional. A logical, defaults to TRUE. If TRUE, a test whether each sample lies inside the domain will be done, which may take a while for larger sample sizes, and rows that do not lie in the domain will be removed (may happen for domain$type == "polynomial" with more than 1 ineq and an OR ("|") in domain$rule.).

Details

NOTE: For polynomial domains with many ineqs and a rule containing "OR" ("|"), not all samples generated are guaranteed to be inside the domain. It is thus recommended to set remove_outofbound to TRUE and rerun the function with new initial points until the desired number of in-bound samples have been generated.

Randomly generates n samples from the p-variate a-b distributions with parameters \boldsymbol{\eta} and \mathbf{K}, where p is the length of \boldsymbol{\eta} or the dimension of the square matrix \mathbf{K}.

Letting a=a_numer/a_denom and b=b_numer/b_denom, the a-b distribution is proportional to

\exp\left(-\frac{1}{2a}{\boldsymbol{x}^a}^{\top}\mathbf{K}{\boldsymbol{x}}^a+\boldsymbol{\eta}^{\top}\frac{\boldsymbol{x}^b-\mathbf{1}_p}{b}\right)

. Note that x^(0/0) is understood as log(x), and x^(n/0) with nonzero n is exp(n*x), and in both cases the a and b in the denominators in the density are treated as 1.

Value

An n\times p matrix of samples, where p is the length of eta.

Examples

n <- 20
p <- 10
eta <- rep(0, p)
K <- diag(p)
dm <- 1 + (1-1/(1+4*exp(1)*max(6*log(p)/n, sqrt(6*log(p)/n))))

# Gaussian on sum(x^2) > 10 && sum(x^(1/3)) > 10 with x allowed to be negative
domain <- make_domain("polynomial", p=p, rule="1 && 2",
       ineqs=list(list("expression"="sum(x^2)>10", abs=FALSE, nonnegative=FALSE),
                      list("expression"="sum(x^(1/3))>10", abs=FALSE, nonnegative=FALSE)))
xinit <- rep(sqrt(20/p), p)
x <- gen(n, setting="gaussian", abs=FALSE, eta=eta,  K=K, domain=domain, finite_infinity=100, 
       xinit=xinit, seed=2, burn_in=500, thinning=100, verbose=FALSE)

# exp on ([0, 1] v [2,3])^p
domain <- make_domain("uniform", p=p, lefts=c(0,2), rights=c(1,3))
x <- gen(n, setting="exp", abs=FALSE, eta=eta, K=K, domain=domain, xinit=NULL, 
       seed=2, burn_in=500, thinning=100, verbose=TRUE)

# gamma on {x1 > 1 && log(1.3) < x2 < 1 && x3 > log(1.3) && ... && xp > log(1.3)}
domain <- make_domain("polynomial", p=p, rule="1 && 2 && 3",
       ineqs=list(list("expression"="x1>1", abs=FALSE, nonnegative=TRUE),
                      list("expression"="x2<1", abs=FALSE, nonnegative=TRUE),
                      list("expression"="exp(x)>1.3", abs=FALSE, nonnegative=FALSE)))
set.seed(1)
xinit <- c(1.5, 0.5, abs(stats::rnorm(p-2))+log(1.3))
x <- gen(n, setting="gamma", abs=FALSE, eta=eta, K=K, domain=domain, finite_infinity=100, 
       xinit=xinit, seed=2, burn_in=500, thinning=100, verbose=FALSE)

# a0.6_b0.7 on {x in R_+^p: sum(log(x))<2 || (x1^(2/3)-1.3x2^(-3)<1 && exp(x1)+2.3*x2>2)}
domain <- make_domain("polynomial", p=p, rule="1 || (2 && 3)",
       ineqs=list(list("expression"="sum(log(x))<2", abs=FALSE, nonnegative=TRUE),
                      list("expression"="x1^(2/3)-1.3x2^(-3)<1", abs=FALSE, nonnegative=TRUE),
                      list("expression"="exp(x1)+2.3*x2^2>2", abs=FALSE, nonnegative=TRUE)))
xinit <- rep(1, p)
x <- gen(n, setting="ab_3/5_7/10", abs=FALSE, eta=eta, K=K, domain=domain, finite_infinity=1e4, 
       xinit=xinit, seed=2, burn_in=500, thinning=100, verbose=FALSE)

# log_log model exp(-log(x) %*% K %*% log(x)/2 + eta %*% log(x)) on {x in R_+^p: sum_j j * xj <= 1}
domain <- make_domain("polynomial", p=p,
       ineqs=list(list("expression"=paste(paste(sapply(1:p,
                           function(j){paste(j, "x", j, sep="")}), collapse="+"), "<1"),
                     abs=FALSE, nonnegative=TRUE)))
x <- gen(n, setting="log_log", abs=FALSE, eta=eta, K=K, domain=domain, finite_infinity=100, 
       xinit=NULL, seed=2, burn_in=500, thinning=100, verbose=FALSE)

# log_log model on the simplex with K having row and column sums 0 (Aitchison model)
domain <- make_domain("simplex", p=p)
K <- -cov_cons("band", p=p, spars=3, eig=1)
diag(K) <- diag(K) - rowSums(K) # So that rowSums(K) == colSums(K) == 0
eigen(K)$val[(p-1):p] # Make sure K has one 0 and p-1 positive eigenvalues
x <- gen(n, setting="log_log_sum0", abs=FALSE, eta=eta, K=K, domain=domain, xinit=NULL, 
       seed=2, burn_in=500, thinning=100, verbose=FALSE)

# Gumbel_Gumbel model exp(-exp(2x) %*% K %*% exp(2x)/2 + eta %*% exp(-3x)) on {sum(|x|) < 1}
domain <- make_domain("polynomial", p=p,
       ineqs=list(list("expression"="sum(x)<1", abs=TRUE, nonnegative=FALSE)))
K <- diag(p)
x <- gen(n, setting="ab_2/0_-3/0", abs=FALSE, eta=eta, K=K, domain=domain, finite_infinity=100, 
       xinit=NULL, seed=2, burn_in=500, thinning=100, verbose=FALSE)

[Package genscore version 1.0.2.2 Index]