
bfsl {bfsl}

R Documentation

Calculates the Best-fit Straight Line

Description

bfsl calculates the best-fit straight line to independent points with (possibly correlated) normally distributed errors in both coordinates.

Usage

bfsl(...)

## Default S3 method:
bfsl(x, y = NULL, sd_x = 0, sd_y = 1, r = 0, control = bfsl_control(), ...)

## S3 method for class 'formula'
bfsl(
  formula,
  data = parent.frame(),
  sd_x,
  sd_y,
  r = 0,
  control = bfsl_control(),
  ...
)

Arguments

`...`	Further arguments passed to or from other methods.
`x`	A vector of x observations, or a data frame (or an object coercible to one by `as.data.frame`) containing the named vectors x, y and, optionally, sd_x, sd_y and r. If weights w_x and w_y are given, sd_x and sd_y are calculated as sd_x = 1/sqrt(w_x) and sd_y = 1/sqrt(w_y). Specifying `y`, `sd_x`, `sd_y` or `r` directly as function arguments overrides the corresponding variables in the data structure.
`y`	A vector of y observations.
`sd_x`	A vector of x measurement error standard deviations. If it is of length one, all data points are assumed to have the same x standard deviation.
`sd_y`	A vector of y measurement error standard deviations. If it is of length one, all data points are assumed to have the same y standard deviation.
`r`	A vector of correlation coefficients between errors in x and y. If it is of length one, all data points are assumed to have the same correlation coefficient.
`control`	A list of control settings. See `bfsl_control` for the names of the settable control values and their effect.
`formula`	A formula specifying the bivariate model (as in `lm`, but here only `y ~ x` makes sense).
`data`	A data.frame containing the variables of the model.

Details

bfsl provides the general least-squares solution (York, 1968) to the problem of fitting a straight line to independent data with (possibly correlated) normally distributed errors in both x and y.
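The iteration behind this kind of fit can be sketched in base R. The function `york_fit()` below is a hypothetical helper written for illustration; it is not part of bfsl and is not its source code. It assumes York's fixed-point update for the slope, starts from the ordinary least squares slope, and forms the weights from standard deviations so that sd_x = 0 is allowed. The standard-error expressions are one common form of York's solution; bfsl may compute them differently.

```r
# Hypothetical helper (not the bfsl implementation): York-type iterative fit of
# y = a + b * x with errors in both coordinates. sd_x, sd_y and r may be
# scalars or vectors; sd_x = 0 gives weighted least squares.
york_fit <- function(x, y, sd_x, sd_y, r = 0, tol = 1e-12, max_iter = 1000) {
  n <- length(x)
  sd_x <- rep_len(sd_x, n)
  sd_y <- rep_len(sd_y, n)
  r <- rep_len(r, n)

  # Per-point weights W_i for a given slope b
  york_weights <- function(b) 1 / (sd_y^2 + b^2 * sd_x^2 - 2 * b * r * sd_x * sd_y)

  b <- unname(coef(lm(y ~ x))[2])  # ordinary least squares starting value
  for (iter in seq_len(max_iter)) {
    W <- york_weights(b)
    U <- x - sum(W * x) / sum(W)
    V <- y - sum(W * y) / sum(W)
    beta <- W * (U * sd_y^2 + b * V * sd_x^2 - (b * U + V) * r * sd_x * sd_y)
    b_new <- sum(W * beta * V) / sum(W * beta * U)
    done <- abs(b_new - b) <= tol * max(1, abs(b))
    b <- b_new
    if (done) break
  }

  # Recompute all quantities at the final slope
  W <- york_weights(b)
  x_bar <- sum(W * x) / sum(W)
  y_bar <- sum(W * y) / sum(W)
  a <- y_bar - b * x_bar
  U <- x - x_bar
  V <- y - y_bar
  beta <- W * (U * sd_y^2 + b * V * sd_x^2 - (b * U + V) * r * sd_x * sd_y)
  x_adj <- x_bar + beta                      # least-squares adjusted x values
  x_adj_bar <- sum(W * x_adj) / sum(W)
  se_b <- sqrt(1 / sum(W * (x_adj - x_adj_bar)^2))
  se_a <- sqrt(1 / sum(W) + x_adj_bar^2 * se_b^2)
  list(intercept = a, slope = b, se_intercept = se_a, se_slope = se_b,
       cov_ab = -x_adj_bar * se_b^2,
       chisq = sum(W * (y - a - b * x)^2) / (n - 2),
       iterations = iter)
}

# Usage on simulated data (arbitrary true line y = 2 + 0.5 x)
set.seed(1)
n <- 25
x_true <- seq(0, 10, length.out = n)
x <- x_true + rnorm(n, sd = 0.3)
y <- 2 + 0.5 * x_true + rnorm(n, sd = 0.5)
fit <- york_fit(x, y, sd_x = 0.3, sd_y = 0.5)
str(fit)
```

Because the minimised objective is symmetric in x and y, swapping the coordinates (and their standard deviations) should return the reciprocal slope, which is a quick sanity check on the iteration.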

With sd_x = 0, the (weighted) ordinary least squares solution is obtained. The calculated standard errors of the slope and intercept, multiplied by sqrt(chisq), then equal the ordinary least squares standard errors.
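A small numerical check of this statement, written in base R without bfsl: with sd_x = 0 the weights reduce to W_i = 1/sd_y_i^2, the slope needs no iteration, and the unscaled standard errors multiplied by sqrt(chisq) should match those reported by `lm()` with the same weights. The simulated data are arbitrary, and the unscaled standard-error formulas are assumed to be the York-style ones.

```r
# Weighted least squares written in York form with sd_x = 0 (base R only)
set.seed(42)
n <- 15
x <- seq(1, 15)
sd_y <- runif(n, 0.2, 1)
y <- 3 - 0.7 * x + rnorm(n, sd = sd_y)

W <- 1 / sd_y^2
x_bar <- sum(W * x) / sum(W)
y_bar <- sum(W * y) / sum(W)
U <- x - x_bar
V <- y - y_bar
b <- sum(W * U * V) / sum(W * U^2)
a <- y_bar - b * x_bar

# Unscaled standard errors (no residual variance factor)
se_b <- sqrt(1 / sum(W * U^2))
se_a <- sqrt(1 / sum(W) + x_bar^2 * se_b^2)

# Weighted reduced chi-squared with n - 2 degrees of freedom
chisq <- sum(W * (y - a - b * x)^2) / (n - 2)

# lm() scales by the residual variance, i.e. by sqrt(chisq)
ols <- summary(lm(y ~ x, weights = W))$coefficients
cbind(scaled = c(se_a, se_b) * sqrt(chisq), lm = ols[, "Std. Error"])
```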

With constant sd_x = c and sd_y = d, where c and d are positive numbers, and r = 0, the Deming regression solution is obtained. If, in addition, c = d, the orthogonal distance regression solution, also known as major axis regression, is obtained.
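The Deming case has a closed form. The sketch below assumes the standard Deming formula with delta = d^2 / c^2, the ratio of the y to the x error variance; `deming_slope()` and `york_update()` are hypothetical helpers, not bfsl functions. With c = d the slope should coincide with the leading eigenvector of the sample covariance matrix (the major axis), and with constant weights the York slope update has the Deming slope as a fixed point.

```r
# Hypothetical helper (not part of bfsl): closed-form Deming slope for constant
# error standard deviations sd_x and sd_y with r = 0.
deming_slope <- function(x, y, sd_x, sd_y) {
  delta <- sd_y^2 / sd_x^2            # ratio of y to x error variance
  s_xx <- var(x)
  s_yy <- var(y)
  s_xy <- cov(x, y)
  (s_yy - delta * s_xx + sqrt((s_yy - delta * s_xx)^2 + 4 * delta * s_xy^2)) /
    (2 * s_xy)
}

# Hypothetical helper: one York slope update. With constant sd_x, sd_y and
# r = 0 all weights are equal, and the update simplifies to
# b <- (sd_y^2 s_xy + b sd_x^2 s_yy) / (sd_y^2 s_xx + b sd_x^2 s_xy)
york_update <- function(b, x, y, sd_x, sd_y) {
  (sd_y^2 * cov(x, y) + b * sd_x^2 * var(y)) /
    (sd_y^2 * var(x) + b * sd_x^2 * cov(x, y))
}

set.seed(7)
n <- 40
x_true <- runif(n, 0, 20)
x <- x_true + rnorm(n, sd = 1)
y <- -1 + 1.5 * x_true + rnorm(n, sd = 2)

b_deming <- deming_slope(x, y, sd_x = 1, sd_y = 2)  # c = 1, d = 2
b_major <- deming_slope(x, y, sd_x = 1, sd_y = 1)   # c = d: major axis

# The major axis is the leading eigenvector of the sample covariance matrix
v <- eigen(cov(cbind(x, y)), symmetric = TRUE)$vectors[, 1]
c(deming = b_deming, major_axis = b_major, eigen_check = v[2] / v[1])

# The Deming slope is a fixed point of the York update (difference ~ 0)
york_update(b_deming, x, y, sd_x = 1, sd_y = 2) - b_deming
```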

Setting sd_x = sd(x), sd_y = sd(y) and r = 0 leads to the geometric mean regression solution, also known as reduced major axis regression or standardised major axis regression.
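Why the name fits can be checked directly: with delta = var(y)/var(x) the Deming closed form collapses to sign(s_xy) * sd(y)/sd(x), which is also the geometric mean of the y-on-x slope and the reciprocal of the x-on-y slope. A base-R sketch on arbitrary simulated data (it does not call bfsl):

```r
# Geometric mean regression slope on arbitrary simulated data (base R only)
set.seed(3)
x <- rnorm(30, mean = 10, sd = 2)
y <- 4 - 0.8 * x + rnorm(30, sd = 1)

s_xx <- var(x)
s_yy <- var(y)
s_xy <- cov(x, y)

# Direct form: sign(s_xy) * sd(y) / sd(x)
b_gmr <- sign(s_xy) * sd(y) / sd(x)

# Geometric mean of the y-on-x slope and the reciprocal of the x-on-y slope
b_yx <- s_xy / s_xx
b_xy <- s_xy / s_yy
b_geo <- sign(s_xy) * sqrt(b_yx / b_xy)

# Deming closed form with sd_x = sd(x), sd_y = sd(y): delta = var(y) / var(x)
delta <- s_yy / s_xx
b_deming <- (s_yy - delta * s_xx + sqrt((s_yy - delta * s_xx)^2 + 4 * delta * s_xy^2)) /
  (2 * s_xy)

c(gmr = b_gmr, geometric_mean = b_geo, deming = b_deming)
```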

The goodness-of-fit metric chisq is a weighted reduced chi-squared statistic: the weighted sum of squared deviations of the points from the fit line, divided by the residual degrees of freedom (n - 2). It compares these deviations to the assigned measurement error standard deviations. If x and y are indeed related by a straight line, and if the assigned measurement errors are correct (and normally distributed), then chisq is expected to be close to 1. A chisq > 1 indicates underfitting: the fit does not fully capture the data, or the measurement errors have been underestimated. A chisq < 1 indicates overfitting: either the model is fitting noise, or the measurement errors have been overestimated.
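The statistic can be sketched as follows, assuming York-type weights W_i = 1/(sd_y^2 + b^2 sd_x^2 - 2 b r sd_x sd_y) and n - 2 residual degrees of freedom; `reduced_chisq()` is a hypothetical helper, not a bfsl function. For brevity it is evaluated at the known true line of simulated data rather than at a fitted line, so the values only illustrate the interpretation above: correct errors give a value near 1, halving the assigned standard deviations multiplies chisq by 4, and doubling them divides it by 4.

```r
# Hypothetical helper: weighted reduced chi-squared of the line y = a + b * x
reduced_chisq <- function(x, y, sd_x, sd_y, r, a, b) {
  W <- 1 / (sd_y^2 + b^2 * sd_x^2 - 2 * b * r * sd_x * sd_y)
  sum(W * (y - a - b * x)^2) / (length(x) - 2)
}

set.seed(11)
n <- 2000
x_true <- runif(n, 0, 10)
x <- x_true + rnorm(n, sd = 0.2)
y <- 1 + 2 * x_true + rnorm(n, sd = 0.4)

# Evaluated at the true line (a = 1, b = 2), not at a fitted line
chisq_ok    <- reduced_chisq(x, y, 0.2, 0.4, 0, a = 1, b = 2)  # errors correct
chisq_under <- reduced_chisq(x, y, 0.1, 0.2, 0, a = 1, b = 2)  # underestimated
chisq_over  <- reduced_chisq(x, y, 0.4, 0.8, 0, a = 1, b = 2)  # overestimated
c(ok = chisq_ok, under = chisq_under, over = chisq_over)
```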

Value

An object of class "bfsl", which is a list containing the following components:

`coefficients`	A `2x2` matrix whose rows are the intercept and slope and whose columns are their fitted estimates and standard errors.
`chisq`	The goodness of fit (see Details).
`fitted.values`	The fitted mean values.
`residuals`	The residuals, that is `y` observations minus fitted values.
`df.residual`	The residual degrees of freedom.
`cov.ab`	The covariance of the slope and intercept.
`control`	The control `list` used, see the `control` argument.
`convInfo`	A `list` with convergence information.
`call`	The matched call.
`data`	A `list` containing `x`, `y`, `sd_x`, `sd_y` and `r`.

References

York, D. (1968). Least squares fitting of a straight line with correlated errors. Earth and Planetary Science Letters, 5, 320–324, https://doi.org/10.1016/S0012-821X(68)80059-7

Examples

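library(bfsl)

# Convert the weights of the Pearson/York data into standard deviations
x = pearson_york_data$x
y = pearson_york_data$y
sd_x = 1/sqrt(pearson_york_data$w_x)
sd_y = 1/sqrt(pearson_york_data$w_y)

# The default and the formula method fit the same model
bfsl(x, y, sd_x, sd_y)
bfsl(y ~ x, pearson_york_data, sd_x, sd_y)

# Pass the data frame directly; its weight columns w_x and w_y are converted
fit = bfsl(pearson_york_data)
plot(fit)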

Package bfsl version 0.2.0