DBrank {DBNMFrank}    R Documentation

Rank Selection for Non-Negative Matrix Factorization

Description

This function estimates the rank parameter for non-negative matrix factorization (NMF) given the non-negative data and its distribution. The method is based on hypothesis testing and uses a deconvolved bootstrap distribution to assess the significance level accurately despite the large amount of optimization error. The non-negative data can be either Normal or Poisson distributed.

Usage

DBrank(data,k,alpha,distn,sz,inisz)

Arguments

data

Matrix. The non-negative data, with observations in rows and variables in columns.

k

Optional. The rank at which the sequential hypothesis testing starts.

alpha

Optional. The significance level. Default is 0.1.

distn

Character. The distribution of the non-negative data. It should be either "Normal" or "Poisson".

sz

Optional. The bootstrap size.

inisz

Optional. The number of initial values used to obtain the true maximum likelihood for NMF.

Details

Our rank selection for NMF is based on sequentially performing the following hypothesis test:

$H_0$: the rank of the feature matrix is $k$.

$H_a$: the rank of the feature matrix is at least $k+1$.

After applying the goodness-of-fit test, if $H_0$ is rejected at significance level 'alpha', set $k=k+1$ and repeat the test until the p-value is greater than 'alpha'. The test statistic is the likelihood ratio. 'inisz' different initial values are used to obtain the maximum likelihood for the rank 'k' NMF and the rank 'k+1' NMF. A deconvolved parametric bootstrap is used to obtain the null distribution of the test statistic. The bootstrap size is 'sz'.
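The sequential procedure described above can be sketched in a few lines of R. This is only an illustration of the stopping rule, not the package's implementation: `toy_pvalue`, `select_rank`, and `k_max` are hypothetical names introduced here, and the p-values are fixed made-up numbers standing in for the likelihood ratio test with the deconvolved bootstrap.

```r
# Hypothetical stand-in for the test at rank k: returns a p-value.
# In practice this would come from the likelihood ratio test with a
# bootstrap null distribution; here it is a fixed lookup for illustration.
toy_pvalue <- function(k) {
  c(0.01, 0.03, 0.42, 0.60)[k]
}

# Sequential rank selection: increase k while H_0 is rejected at level alpha.
# k_max is an assumed safety bound so the loop always terminates.
select_rank <- function(pvalue_fn, k = 1, alpha = 0.1, k_max = 4) {
  p <- pvalue_fn(k)
  while (p <= alpha && k < k_max) {
    k <- k + 1
    p <- pvalue_fn(k)
  }
  list(rank = k, pvalue = p)
}

res <- select_rank(toy_pvalue)
print(res$rank)
print(res$pvalue)
```

With these made-up p-values, the test is rejected at ranks 1 and 2 and not rejected at rank 3, so the loop stops there and reports the p-value of the last test performed.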

Value

rank

The NMF rank selected by the function.

pvalue

The p-value for the estimated rank.

Examples


library(NMF)
set.seed(45217)
## Generate rank 2 Poisson NMF data
x=syntheticNMF(50,2,30)
est.rank=DBrank(t(x),k=2,sz=50,inisz=6)


[Package DBNMFrank version 0.1.0 Index]