
lmExact {reverseR}

R Documentation

Create random values that deliver linear regressions with exact parameters

Description

Takes self-supplied x/y values or x/random values and transforms them so that they deliver linear regressions y = \beta_0 + \beta_1x + \varepsilon (with potential replicates) with either

1) exact slope \beta_1 and intercept \beta_0,
2) exact p-value and intercept \beta_0, or
3) exact R^2 and intercept \beta_0.

Intended for testing and education, not for cheating! ;-)

Usage

lmExact(x = 1:20, y = NULL, ny = 1, intercept = 0, slope = 0.1, error = 0.1, 
        seed = 123, pval = NULL, rsq = NULL, plot = TRUE, verbose = FALSE, ...)

Arguments

`x`	the predictor values.
`y`	an optional vector of `y` values of length `length(x)`. Default is `NULL`.
`ny`	the number of replicate response values per predictor value.
`intercept`	the desired intercept `\beta_0`.
`slope`	the desired slope `\beta_1`.
`error`	if a single value, the standard deviation `\sigma` for sampling from a normal distribution; otherwise a user-supplied vector of random deviates with the same length as `x`.
`seed`	the random generator seed for reproducibility.
`pval`	the desired p-value of the slope.
`rsq`	the desired `R^2`.
`plot`	logical. If `TRUE`, the linear regression is plotted.
`verbose`	logical. If `TRUE`, a summary is printed to the console.
`...`	other arguments to `lm` or `plot`.

Details

For case 1), the error values are added to the exact (x_i, \beta_0 + \beta_1 x_i) values, the linear model y_i = \beta_0 + \beta_1 x_i + \varepsilon is fit, and the residuals y_i - \hat{y_i} are re-added to (x_i, \beta_0 + \beta_1 x_i).
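
The steps for case 1) can be sketched in base R without the package. This is only an illustration of the procedure described above, not the source of `lmExact`; the variable names (`beta0`, `beta1`, `y_exact`, `y_new`) are chosen here for the example.

```r
## Illustrative sketch of case 1); not the package source.
set.seed(123)
x     <- 1:20
beta0 <- 3      # desired intercept
beta1 <- 2      # desired slope
sigma <- 2      # standard deviation of the sampled errors

y_exact <- beta0 + beta1 * x                      # points exactly on the line
y_noisy <- y_exact + rnorm(length(x), 0, sigma)   # add the error values
fit0    <- lm(y_noisy ~ x)                        # first fit: estimates are off
y_new   <- y_exact + residuals(fit0)              # re-add residuals to the exact line
fit1    <- lm(y_new ~ x)                          # refit

coef(fit1)   # equals beta0 and beta1 up to floating-point error
```

This works because least-squares residuals are orthogonal to both the intercept column and x, so adding them to the exact line leaves the fitted coefficients unchanged.
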
For case 2), the same procedure as in 1) is applied, except that the slope delivering the desired p-value is found by an optimizing algorithm.
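
A search for the slope in case 2) might look as follows. The use of `uniroot()` is an assumption made for this sketch; the optimizer actually used by `lmExact` is not stated here. The helper `slope_pval` and the search interval are likewise hypothetical.

```r
## Illustrative sketch of case 2); uniroot() is an assumption,
## not necessarily the optimizer used by lmExact().
set.seed(123)
x      <- 1:50
beta0  <- 2       # desired intercept
target <- 0.025   # desired p-value of the slope
sigma  <- 3

## Residuals of a fit to pure noise are orthogonal to the intercept and x.
res <- residuals(lm(rnorm(length(x), 0, sigma) ~ x))

## p-value of the slope when these residuals are added to a line with slope b1.
slope_pval <- function(b1) {
  y <- beta0 + b1 * x + res
  summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
}

## The p-value falls monotonically as b1 moves away from 0,
## so a root search on a positive interval is sufficient.
root <- uniroot(function(b1) slope_pval(b1) - target,
                interval = c(1e-8, 10), tol = 1e-12)
b1 <- root$root

y   <- beta0 + b1 * x + res
fit <- lm(y ~ x)
summary(fit)$coefficients   # slope p-value matches target up to solver tolerance
```

Because the residuals are fixed, the standard error of the slope does not depend on `b1`, so the p-value is a function of the slope alone.
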
Finally, for case 3), a QR reconstruction, rescaling and refitting is conducted, using the code found under 'References'.
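
The idea behind case 3) can be illustrated with a simplified projection-based variant. This is not the code from the reference and not the source of `lmExact`; it only shows how a response with an exact correlation to x, and hence an exact R^2, can be constructed. All variable names are chosen for this example.

```r
## Illustrative sketch of case 3); a simplified variant,
## not the code from the reference or from lmExact().
set.seed(123)
x   <- 1:10
rsq <- 0.85          # desired R^2
r   <- sqrt(rsq)     # corresponding correlation (positive slope)

noise <- rnorm(length(x), 0, 2)

## Part of the noise orthogonal to the intercept and x, via a QR decomposition.
e <- qr.resid(qr(cbind(1, x)), noise)

## Centered, unit-length versions of x and of the orthogonal noise.
xc <- x - mean(x)
xu <- xc / sqrt(sum(xc^2))
eu <- e / sqrt(sum(e^2))

## Mixing the two with weights r and sqrt(1 - r^2) fixes cor(x, y) at r.
y <- r * xu + sqrt(1 - r^2) * eu

summary(lm(y ~ x))$r.squared   # equals rsq up to floating-point error
```

Multiplying `y` by a positive constant or adding a constant leaves R^2 unchanged, which is what makes a subsequent rescaling step possible.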

If y is supplied, changes in slope, intercept and p-value will deliver the same residuals as the linear regression through x and y. A different R^2 will change the response value structure, however.
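
The claim about preserved residuals can be checked in base R. This sketch mimics the described behaviour for a changed slope and intercept and is not the package source; `res0` and `Y_new` are names chosen for the example.

```r
## Illustrative sketch; not the package source.
X <- c(1.05, 3, 5.2, 7.5, 10.2, 11.7)
set.seed(123)
Y <- 0.5 + 2 * X + rnorm(6, 0, 2)

res0  <- residuals(lm(Y ~ X))   # residuals of the original regression
Y_new <- 1 + 0.2 * X + res0     # new line: intercept = 1, slope = 0.2
fit   <- lm(Y_new ~ X)

coef(fit)                         # 1 and 0.2 up to floating-point error
all.equal(res0, residuals(fit))   # TRUE
```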

Value

A list with the following items:

`lm`	the linear model of class `lm`.
`x`	the predictor values.
`y`	the (random) response values.
`summary`	the model summary for quick checking of obtained parameters.

Using both x and y will give a linear regression with the desired parameter values when refitted.

Author(s)

Andrej-Nikolai Spiess

References

For method 3):
http://stats.stackexchange.com/questions/15011/generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable.

Examples

## No replicates, intercept = 3, slope = 2, sigma = 2, n = 20.
res1 <- lmExact(x = 1:20, ny = 1, intercept = 3, slope = 2, error = 2)

## Same as above, but with 3 replicates, sigma = 1,  n = 20.
res2 <- lmExact(x = 1:20, ny = 3, intercept = 3, slope = 2, error = 1)

## No replicates, intercept = 2 and p-value = 0.025, sigma = 3, n = 50.
## => slope = 0.063
res3 <- lmExact(x = 1:50, ny = 1, intercept = 2, pval = 0.025, error = 3)

## 5 replicates, intercept = 1, R-square = 0.85, sigma = 2, n = 10.
## => slope = 0.117
res4 <- lmExact(x = 1:10, ny = 5, intercept = 1, rsq = 0.85, error = 2)

## Heteroscedastic (magnitude-dependent) noise.
error <- sapply(1:20, function(x) rnorm(3, 0, x/10))
res5 <- lmExact(x = 1:20, ny = 3, intercept = 1, slope = 0.2,
                error = error)
                
## Supply own x/y values; the residuals match those of the
## initial linear regression.
X <- c(1.05, 3, 5.2, 7.5, 10.2, 11.7)
set.seed(123)
Y <- 0.5 + 2 * X + rnorm(6, 0, 2)
res6 <- lmExact(x = X, y = Y, intercept = 1, slope = 0.2)
all.equal(residuals(lm(Y ~ X)), residuals(res6$lm))

[Package reverseR version 0.1 Index]