R: Fast generalized linear model in a database

dbglm {dbglm}

R Documentation

Fast generalized linear model in a database

Description

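Fits a generalized linear model to data held in a database table. Coefficients are first estimated from a subsample and then updated using the full data to give the final estimate.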

Usage

dbglm(formula, family = binomial(), tbl, sd = FALSE,
      weights = .NotYetImplemented(), subset = .NotYetImplemented(), ...)

Arguments

`...`	This argument is required for S3 method extension.
`formula`	A model formula. It can include interactions but no transformations other than `factor`.
`family`	Model family. Defaults to `binomial()`.
`tbl`	An object inheriting from `tbl`. This will typically be a database-backed lazy `tbl` from the `dbplyr` package.
`sd`	Experimental: compute the standard deviation of the score as well as its mean in the update, and use it to improve the information matrix estimate.
`weights`	Not yet implemented. Weights are not supported.
`subset`	Not yet implemented. To analyze a subset, use `filter()` on the data before passing it as `tbl`.

Details

For a dataset of size N, the subsample has size N^(5/9). The approximation will not be very good unless N is large. With small N it is also quite likely that the subsample will miss some values entirely, e.g., some factor levels.
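
To make the subsample size and the update step concrete, the following is a minimal in-memory sketch in base R. It is not the package's code: the simulated data, the logistic model, and the particular one-step update (subsample variance rescaled by n/N and multiplied by the full-data score) are assumptions chosen for illustration, and nothing here runs in a database.

```r
# Illustrative sketch only; all names and the simulated data are assumptions.
set.seed(1)
N <- 100000
x <- rnorm(N)
y <- rbinom(N, 1, plogis(-1 + 0.5 * x))
dat <- data.frame(x = x, y = y)

# Subsample size from the Details section: N^(5/9)
n <- ceiling(N^(5/9))

# Step 1: fit the model on a random subsample of n rows
sub <- dat[sample.int(N, n), ]
fit_sub <- glm(y ~ x, family = binomial(), data = sub)
tildebeta <- coef(fit_sub)
tildeV <- vcov(fit_sub)

# Step 2: score of the full data evaluated at the subsample estimate
X <- model.matrix(~ x, data = dat)
mu <- plogis(drop(X %*% tildebeta))
score <- crossprod(X, dat$y - mu)

# Step 3: one-step update. Rescaling tildeV by n / N is used here as an
# approximation to the inverse information of the full data.
hatV <- tildeV * (n / N)
hatbeta <- tildebeta + drop(hatV %*% score)
```

In this sketch only the score in step 2 touches all N rows, and it is a sum over rows, which is the kind of aggregate a database can compute. With small N the subsample in step 1 may be too small for a stable fit, as noted above.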

Value

A list with elements

`tildebeta`	coefficient estimates from the subsample
`hatbeta`	final coefficient estimates
`tildeV`	variance matrix from the subsample
`hatV`	final variance matrix estimate
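
Because the result is a plain list, standard errors and Wald tests have to be assembled by hand. The sketch below shows one way to do that. The `fit` object is a hypothetical stand-in with made-up numbers, not output from a real call, and it assumes `hatbeta` can be treated as a named vector and `hatV` as a square matrix.

```r
# Stand-in for a dbglm() result; the values are invented for illustration.
fit <- list(
  hatbeta = c("(Intercept)" = -1.02, x = 0.48),
  hatV = matrix(c(4e-4, 1e-5, 1e-5, 9e-4), 2, 2,
                dimnames = list(c("(Intercept)", "x"),
                                c("(Intercept)", "x")))
)

est <- drop(fit$hatbeta)        # final coefficient estimates
se <- sqrt(diag(fit$hatV))      # standard errors from the final variance matrix
z <- est / se                   # Wald z statistics

coef_table <- cbind(
  Estimate = est,
  `Std. Error` = se,
  `z value` = z,
  `Pr(>|z|)` = 2 * pnorm(-abs(z))
)
coef_table
```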

References

http://notstatschat.tumblr.com/post/171570186286/faster-generalised-linear-models-in-largeish-data

[Package dbglm version 1.0.0 Index]