R: Fitting Generalized Linear Models using GPUmatrix objects

GPUglm {GPUmatrix}

R Documentation

Fitting Generalized Linear Models using GPUmatrix objects

Description

These functions mimic speedglm and speedglm.wfit from the 'speedglm' package so that they can operate on gpu.matrix-class objects. They likewise mimic glm and glm.fit from the 'stats' package, for fitting on large data sets.

Usage

glm.fit.GPU(x, y, intercept = TRUE, weights = NULL, family =
                   gaussian(), start = NULL, etastart = NULL, mustart =
                   NULL, offset = NULL, acc = 1e-08, maxit = 25, k = 2,
                   sparse = NULL, trace = FALSE, dtype = "float64", device =
                   NULL, type = NULL, ...)

GPUglm(...)

Arguments

As mentioned in the description, these functions mimic speedglm, and so do almost all of their parameters. Only three parameters are new; they are explained below.

The common parameters with speedglm:

`x`	the same as `speedglm`: the design matrix of dimension `n*p` where `n` is the number of observations and `p` is the number of features. `x` can be either a 'matrix', 'Matrix' or 'gpu.matrix-class' object.
`y`	the same as `speedglm`: a vector of `n` observations. `y` can be either a 'matrix', 'Matrix' or 'gpu.matrix-class' object.
`intercept`	the same as `speedglm`: logical. Whether the first column of `x` should be considered the intercept (default TRUE). Note that setting this parameter to TRUE or FALSE does not change the design matrix used to fit the model.
`weights`	the same as `speedglm`: an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL (default) or a numeric vector.
`family`	the same as `speedglm`: a description of the error distribution and link function to be used in the model. For `glm.fit.GPU` this can be a character string naming a family function, a family function or the result of a call to a family function. (See `family` for details of family functions.)
`start`	the same as `speedglm`: starting values for the parameters in the linear predictor.
`etastart`	the same as `speedglm`: starting values for the linear predictor.
`mustart`	the same as `speedglm`: starting values for the vector of means.
`offset`	the same as `speedglm`: this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more `offset` terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See `model.offset`.
`acc`	the same as `speedglm`: tolerance to be used for the estimation (by default equal to: 1e-08).
`maxit`	the same as `speedglm`: maximum number of iterations.
`k`	the same as `speedglm`: numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC.
`sparse`	logical. Whether the matrix `x` should be treated as sparse. Not yet implemented.
`trace`	logical. Whether to print the progress of the iterations. Default is FALSE.
`...`	For `GPUglm`: arguments to be used to form the default control argument if it is not supplied directly.
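
The note on `intercept` can be illustrated with stats::glm.fit, whose `intercept` argument likewise leaves the design matrix untouched and only affects the null model. The sketch below uses base R only and assumes, without verifying it, that glm.fit.GPU follows the same convention; the `cars` data set is chosen purely for illustration.

```r
# Design matrix built by model.matrix already carries an "(Intercept)" column.
x <- model.matrix(~ speed, data = cars)
y <- cars$dist

# Same design matrix, different 'intercept' flag (stats::glm.fit, not glm.fit.GPU).
fit_with    <- stats::glm.fit(x = x, y = y, intercept = TRUE)
fit_without <- stats::glm.fit(x = x, y = y, intercept = FALSE)

# The coefficients are identical because x itself did not change ...
print(fit_with$coefficients)
print(fit_without$coefficients)

# ... only the null model (null deviance and its degrees of freedom) differs.
print(c(fit_with$null.deviance, fit_without$null.deviance))
print(c(fit_with$df.null, fit_without$df.null))
```

To fit a model without an intercept, the column has to be left out of `x` itself, for example with `model.matrix(~ speed - 1, data = cars)`.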

The glm.fit.GPU function internally initialises matrices of the 'GPUmatrix' class by calling the gpu.matrix function. The following parameters correspond to this function:

`dtype`	parameter of the function `gpu.matrix`: the data type. One of "float64", "float32" or "int" (for "int64"). Default is "float64".
`device`	parameter of the function `gpu.matrix`: the device on which the data are loaded. If not indicated, `device` is set to "cuda" if it is available.
`type`	parameter of the function `gpu.matrix`: the backend of the gpu.matrix, either "torch" (the default when `type` is NULL) or "tensorflow".
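
As a rough sketch of how `dtype` might be used, the block below fits the same linear model in double and single precision and prints the coefficients next to a stats::glm.fit reference. It uses only the arguments listed in Usage. The GPU calls are guarded because they assume that GPUmatrix and a working backend are installed; the `cars` data set and the object names are illustrative choices, not part of the package.

```r
x <- model.matrix(~ speed, data = cars)
y <- cars$dist

# CPU reference from base R.
reference <- stats::glm.fit(x = x, y = y)$coefficients

# Assumption: GPUmatrix and its backend are available; otherwise gpu_fits stays NULL.
gpu_fits <- NULL
if (requireNamespace("GPUmatrix", quietly = TRUE)) {
  gpu_fits <- tryCatch(
    list(
      float64 = GPUmatrix::glm.fit.GPU(x = x, y = y, dtype = "float64"),
      float32 = GPUmatrix::glm.fit.GPU(x = x, y = y, dtype = "float32")
    ),
    error = function(e) NULL  # e.g. no backend installed
  )
}

print(reference)
if (!is.null(gpu_fits)) {
  print(gpu_fits$float64$coefficients)
  print(gpu_fits$float32$coefficients)
}
```

Single precision roughly halves memory use, so small differences from the double-precision coefficients are to be expected.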

Details

The GPUglm function internally calls the glm function by selecting glm.fit.GPU as the method. The input parameters of the GPUglm function are equivalent to those of the glm function.

If the gpu.matrix-class object(s) are stored on the GPU, then the operations will be performed on the GPU. See gpu.matrix.
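
The mechanism that lets glm use a different fitter is its `method` argument, which accepts any function taking the same arguments as glm.fit. The sketch below shows that mechanism with base R only; `logging_fit` is a hypothetical stand-in for glm.fit.GPU, not part of GPUmatrix.

```r
# Hypothetical fitter: reports the problem size, then delegates to stats::glm.fit.
logging_fit <- function(x, y, ...) {
  message("fitting ", nrow(x), " observations on ", ncol(x), " columns")
  stats::glm.fit(x = x, y = y, ...)
}

counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)

# Default fitter versus user-supplied fitter passed through 'method'.
fit_default <- glm(counts ~ outcome + treatment, family = poisson())
fit_custom  <- glm(counts ~ outcome + treatment, family = poisson(),
                   method = logging_fit)

print(coef(fit_default))
print(coef(fit_custom))
```

Because glm still builds the model frame and design matrix itself, formula handling, factors and offsets behave as usual; only the numerical fitting step is swapped.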

Value

Both GPUglm and glm.fit.GPU return an object of class "GPUglm". This object can be treated as a list and mimics the output of the function speedglm:

`coefficients`	the estimated coefficients.
`logLik`	the log likelihood of the fitted model.
`iter`	the number of iterations of IWLS used.
`tol`	the maximal value of tolerance reached.
`family`	the family object used.
`link`	the link function used.
`df`	the degrees of freedom of the model.
`XTX`	the product X'X (weighted, if the case).
`dispersion`	the estimated dispersion parameter of the model.
`ok`	the set of column indices of the model matrix where the model has been fitted.
`rank`	the rank of the model matrix.
`RSS`	the estimated residual sum of squares of the fitted model.
`method`	TODO
`aic`	the estimated Akaike Information Criterion.
`offset`	the model offset.
`sparse`	a logical value which indicates if the model matrix is sparse.
`deviance`	the estimated deviance of the fitted model.
`nulldf`	the degrees of freedom of the null model.
`nulldev`	the estimated deviance of the null model.
`ngoodobs`	the number of non-zero weighted observations.
`n`	the number of observations.
`intercept`	a logical value which indicates if an intercept has been used.
`convergence`	a logical value which indicates if convergence was reached.
`terms`	the terms object used.
`call`	the matched call.
`xlevels`	(where relevant) a record of the levels of the factors used in fitting.
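
Several of these components (`iter`, `convergence`, `XTX`, `deviance`) are easier to read against the IWLS loop that produces them. The following is a minimal base-R sketch for a Poisson model with log link. The function `iwls_poisson` is hypothetical, and its stopping rule (relative change in deviance below `acc`) is an assumption modelled on stats::glm.fit; it is not taken from the GPUmatrix source.

```r
# Hypothetical IWLS sketch; not the implementation used by glm.fit.GPU.
iwls_poisson <- function(x, y, acc = 1e-08, maxit = 25) {
  fam <- poisson()
  mu  <- y + 0.1                      # starting means
  eta <- fam$linkfun(mu)              # starting linear predictor
  dev_old   <- Inf
  converged <- FALSE
  for (iter in seq_len(maxit)) {
    gprime <- fam$mu.eta(eta)         # d mu / d eta
    z <- eta + (y - mu) / gprime      # working response
    w <- gprime^2 / fam$variance(mu)  # working weights
    XtWX <- crossprod(x, w * x)       # weighted X'X
    XtWz <- crossprod(x, w * z)
    beta <- solve(XtWX, XtWz)         # weighted least-squares step
    eta  <- drop(x %*% beta)
    mu   <- fam$linkinv(eta)
    dev  <- sum(fam$dev.resids(y, mu, rep(1, length(y))))
    if (abs(dev - dev_old) / (abs(dev) + 0.1) < acc) {
      converged <- TRUE
      break
    }
    dev_old <- dev
  }
  list(coefficients = drop(beta), iter = iter, convergence = converged,
       XTX = XtWX, deviance = dev)
}

counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
x <- model.matrix(~ outcome + treatment)

fit <- iwls_poisson(x, counts)
ref <- stats::glm.fit(x = x, y = counts, family = poisson())
print(fit$coefficients)
print(ref$coefficients)
```

In this sketch `acc` and `maxit` bound the loop, `iter` records how many passes were needed, and `XTX` is the weighted cross-product from the final pass.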

Examples



## Not run: 
require(MASS,quietly = TRUE)
require(stats,quietly = TRUE)

# linear model (example taken from 'glm'):

utils::data(anorexia, package = "MASS")
anorex_glm <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                  family = gaussian(), data = anorexia)
summary(anorex_glm)

#Using GPUglm:
anorex_GPUglm <- GPUglm(Postwt ~ Prewt + Treat + offset(Prewt),
                        family = gaussian, data = anorexia)
summary(anorex_GPUglm)

# linear model using glm.fit.GPU:
x <- model.matrix(~Treat+Prewt,data=anorexia)
y <- as.matrix(anorexia$Postwt)
s1_glm <- glm.fit(x=x,y=y)
s1_gpu <- glm.fit.GPU(x=x,y=y)

s1_glm$coefficients
s1_gpu$coefficients


# poisson (example taken from 'glm'):
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
summary(glm.D93)

gpu.glm.D93 <- GPUglm(counts ~ outcome + treatment, family = poisson())
summary(gpu.glm.D93)

#logistic:
data(menarche)
glm.out <- glm(cbind(Menarche, Total-Menarche) ~ Age, family=binomial(), data=menarche)
summary(glm.out)

glm.out_gpu <- GPUglm(cbind(Menarche, Total-Menarche) ~ Age, family=binomial(), data=menarche)
summary(glm.out_gpu)

# the same model can also be fitted with glm.fit.GPU,
# after expanding the grouped counts to one 0/1 observation per subject:
new_menarche <- data.frame(Age=rep(menarche$Age,menarche$Total))
observations <- c()
for(i in 1:nrow(menarche)){
  observations <- c(observations,rep(c(0,1),c(menarche$Total[i]-menarche$Menarche[i],
                                              menarche$Menarche[i])))
}
new_menarche$observations <- observations
x <- model.matrix(~Age,data=new_menarche)
head(new_menarche)
glm.fit_gpu <- glm.fit.GPU(x=x,y=new_menarche$observations, family=binomial())
summary(glm.fit_gpu)

# the GPUmatrix package also includes the function 'LR_GradientConjugate_gpumatrix':
lr_gran_sol <- LR_GradientConjugate_gpumatrix(X = x,y = observations)

#check results
glm.out$coefficients
glm.out_gpu$coefficients
glm.fit_gpu$coefficients
lr_gran_sol

## End(Not run)

[Package GPUmatrix version 1.0.2 Index]