rgeode {RGeode} | R Documentation |
GEOmetric Density Estimation.
Description
Selects the principal directions of the data and performs Bayesian inference through an adaptive Gibbs sampler. GEODE can also handle missing data.
Usage
rgeode(Y, d = 6, burn = 1000, its = 2000, tol = 0.01, atau = 1/20,
asigma = 1/2, bsigma = 1/2, starttime = NULL, stoptime = NULL,
fast = TRUE, c0 = -1, c1 = -0.005)
Arguments
Y |
array_like |
d |
int, optional |
burn |
int, optional |
its |
int, optional |
tol |
double, optional |
atau |
double, optional |
asigma |
double, optional |
bsigma |
double, optional |
starttime |
int, optional |
stoptime |
int, optional |
fast |
bool, optional |
c0 |
double, optional |
c1 |
double, optional |
Details
GEOmetric Density Estimation (rgeode) is a fast algorithm that performs inference on normally distributed data. It consists of two main steps:
1. Selection of the principal axes of the data.
2. An adaptive Gibbs sampler that draws a set of samples from the full conditional posteriors of the parameters of interest; these samples are then used for inference.
The function takes several inputs: a rectangular (N, D) matrix Y, on which a fast rank-d SVD is run; a conservative upper bound d on the true dimension of the data; and a set of tuning parameters. The upper bound d must be chosen such that d > p, where p is the true dimension, and d << D.
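The first step can be illustrated with base R alone. The sketch below is not the package's internal implementation: it uses the exact `svd()` from base R in place of the fast rank-d SVD, centring the columns is a choice made here, and the names `N`, `D`, `p`, `d`, `basis`, `scores` and `W_hat` are assumptions introduced for illustration. It simulates data lying close to a p-dimensional subspace and keeps the d leading right singular vectors, with p < d << D.

```r
# Illustrative sketch only: base R svd() stands in for the fast rank-d SVD.
set.seed(1)
N <- 100   # number of observations
D <- 50    # ambient dimension
p <- 3     # true (usually unknown) dimension
d <- 6     # conservative upper bound, chosen so that p < d << D

# Simulate data close to a p-dimensional subspace of R^D.
basis  <- qr.Q(qr(matrix(rnorm(D * p), D, p)))   # D x p orthonormal basis
scores <- matrix(rnorm(N * p, sd = 5), N, p)     # N x p coordinates
Y <- scores %*% t(basis) + matrix(rnorm(N * D, sd = 0.1), N, D)

# Centre the columns (an assumption of this sketch) and keep the
# d leading right singular vectors as candidate principal axes.
Yc <- scale(Y, center = TRUE, scale = FALSE)
s  <- svd(Yc, nu = 0, nv = d)
W_hat <- s$v                                     # D x d matrix

# Share of the total variance captured along each retained axis.
var_share <- s$d[1:d]^2 / sum(s$d^2)
print(round(var_share, 3))
```

In this simulation almost all of the variance falls on the first p axes, so the remaining d - p retained axes carry mostly noise; this is the sense in which d acts as a conservative upper bound rather than an estimate of p.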
Value
rgeode returns a list containing the following components:
InD |
array_like |
u |
matrix |
tau |
matrix |
sigmaS |
array_like |
W |
matrix |
Miss |
list |
Note
The part of the output related to missing data is filled only when the input contains missing data.
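Before calling the function it can be useful to know whether the input contains missing entries at all. The helper below is a minimal base R sketch; `count_missing` is a hypothetical name introduced here and is not part of RGeode.

```r
# Hypothetical helper, not part of RGeode: summarise missing entries in a matrix.
count_missing <- function(Y) {
  miss <- is.na(Y)                       # TRUE for both NA and NaN
  list(n_missing = sum(miss),            # total number of missing cells
       rows = which(rowSums(miss) > 0))  # rows with at least one missing cell
}

# Small toy matrix with one NaN and one NA.
Y <- matrix(1:12 / 2, 4, 3)
Y[2, 3] <- NaN
Y[4, 1] <- NA
info <- count_missing(Y)
print(info)
```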
Author(s)
L. Rimella, lorenzo.rimella@hotmail.it
References
[1] Y. Wang, A. Canale, D. Dunson. "Scalable Geometric Density Estimation" (2016).
Examples
library(MASS)
library(RGeode)
####################################################################
# WITHOUT MISSING DATA
####################################################################
# Define the dataset
D= 200
n= 500
d= 10
d_true= 3
set.seed(321)
mu_true= runif(d_true, -3, 10)
Sigma_true= matrix(0,d_true,d_true)
diag(Sigma_true)= c(runif(d_true, 10, 100))
W_true = svd(matrix(rnorm(D*d_true, 0, 1), d_true, D))$v
sigma_true = abs(runif(1,0,1))
mu= W_true%*%mu_true
C= W_true %*% Sigma_true %*% t(W_true)+ sigma_true* diag(D)
y= mvrnorm(n, mu, C)
################################
# GEODE: Without missing data
################################
start.time <- Sys.time()
GEODE= rgeode(Y= y, d)
Sys.time()- start.time
# SIGMAS
#plot(seq(110,3000,by=1),GEODE$sigmaS[110:3000],ty='l',col=2,
# xlab= 'Iteration', ylab= 'sigma^2', main= 'Simulation of sigma^2')
#abline(v=800,lwd= 2, col= 'blue')
#legend('bottomright',c('Posterior of sigma^2', 'Stopping time'),
# lwd=c(1,2),col=c(2,4),cex=0.55, border='black', box.lwd=3)
####################################################################
# WITH MISSING DATA
####################################################################
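###########################
# Insert NaN
n_m = 5 # number of data vectors selected to contain missing features
d_m = 1 # number of missing features per selected vector
data_miss= sample(seq(1,n),n_m) # rows of y that will receive NaN
# Build an n_m x d_m matrix: row i holds the feature indices for data_miss[i]
features= sample(seq(1,D), d_m)
for(i in 2:n_m)
{
  features= rbind(features, sample(seq(1,D), d_m))
}
for(i in 1:length(data_miss))
{
  if(i==length(data_miss))
  {
    # The last selected row skips its first sampled feature;
    # with d_m = 1 it therefore receives no NaN.
    y[data_miss[i],features[i,][-1]]= NaN
  }
  else
  {
    y[data_miss[i],features[i,]]= NaN
  }
}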
################################
# GEODE: With missing data
################################
set.seed(321)
start.time <- Sys.time()
GEODE= rgeode(Y= y, d)
Sys.time()- start.time
# SIGMAS
#plot(seq(110,3000,by=1),GEODE$sigmaS[110:3000],ty='l',col=2,
# xlab= 'Iteration', ylab= 'sigma^2', main= 'Simulation of sigma^2')
#abline(v=800,lwd= 2, col= 'blue')
#legend('bottomright',c('Posterior of sigma^2', 'Stopping time'),
# lwd=c(1,2),col=c(2,4),cex=0.55, border='black', box.lwd=3)
####################################################################
####################################################################