R: twophase

twophase {forestinventory}

R Documentation

twophase

Description

twophase is used to calculate estimations based on double sampling under the model-assisted Monte Carlo approach. A first phase of auxiliary information (e.g. taken from remote sensing data) is used to generate model predictions based on multiple linear regression using the method of ordinary least squares. A subsample of the first phase comprises the second phase which contains terrestrial observations (i.e. the local densities of the ground truth) that is used to correct for bias in the design-based sense. The estimation method is available for simple and cluster sampling and includes the special case where the first phase is based on an exhaustive sample (i.e. a census). Small-area applications are supported for synthetic estimation as well as two varieties of bias-corrected estimators: the traditional small-area estimator and an asymptotically equivalent version derived under Mandallaz' extended model approach.

Usage

twophase(
  formula,
  data,
  phase_id,
  cluster = NA,
  small_area = list(sa.col = NA, areas = NA, unbiased = TRUE),
  boundary_weights = NA,
  exhaustive = NA,
  progressbar = FALSE,
  psmall = FALSE
)

Arguments

`formula`	an object of class "`formula`" as would be used in the function `lm`
`data`	a data frame containing all variables contained in `formula` and a column indexing phase membership. Additional columns designating small-area membership, cluster ID and boundary weights should also be contained in the data frame if they are requested in the function.
`phase_id`	an object of class "`list`" containing two elements: `phase.col`: the column name in `data` that specifies the phase membership of each observation `terrgrid.id`: the indicator identifying the terrestrial (a.k.a. "ground truth") phase for that column (must be of type "`numeric`")
`cluster`	(Optional) Specifies the column name in `data` containing the cluster ID. Only used in case of cluster sampling.
`small_area`	(Optional) a list that if containing three elements: `sa.col`: the column name in `data` containing domain identification `areas`: vector of desired small-area domain identifiers `unbiased`: an object of type "`logical`" that when FALSE designates that the estimator is allowed to be biased (i.e. the synthetic estimator) and when TRUE forces it to be design-unbiased. See 'Details'. Note: If `small_area` is left unchanged then `twophase` defaults to global estimation.
`boundary_weights`	(Optional) Specifies the column name in `data` containing the weights for boundary adjustment. See 'Details'
`exhaustive`	(Optional) For global estimation, a vector of true auxiliary means corresponding to an exhaustive first phase. The vector must be input in the same order that `lm` processes a `formula` object and include the intercept term. For small area estimation, `exhaustive` is a `data.frame` containing column names (`colnames`) for every variable appearing in the parameter `formula` including the variable "Intercept".Rownames (`row.names`) have to be used and must correspond to the names of the small areas. See 'Details'.
`progressbar`	(Optional) an object a type "`logical`" that when TRUE prints the progress of the calculation in the console (recommended for large amount of small areas). Defaults to FALSE.
`psmall`	(Optional) an object a type "`logical`" used for small area estimations that only works when `unbiased` in the parameter `small_area` is set to TRUE. See 'Details'.

Details

If estimations for multiple small-area domains should be computed, the domains have to be defined within a character vector using c(). Using small_area(..., unbiased=FALSE) calculates design-based estimates with the synthetic estimator and may be design-biased if the model is biased in that small area. The default, small_area(..., unbiased=TRUE), allows for a residual correction by one of two asymptotically equivalent methods to create design-unbiased estimates:

Mandallaz' extended model approach calculates the residual correction by extending the model formula with an indicator variable in the small area. It is the default method psmall=FALSE.
the traditional small area estimator calculates the residual correction by taking the synthetic estimator and adding the mean residual observed in the small area. It is activated when psmall=TRUE.

Missing values (NA) in the auxiliary variables (i.e. at least one auxiliary variable cannot be observed at an inventory location) are automatically removed from the dataset before the estimations are computed. Note that missingness in the auxiliary variables is only allowed if we assume that they are missing at random, since the unbiasedness of the estimates is based on the sampling design.

The boundary weight adjustment is pertinent for auxiliary information derived from remote sensing and is equal to the percentage of forested area (e.g. as defined by a forest mask) in the interpretation area.

Exhaustive estimation refers to when the true means of certain auxiliary variables are known and an exhaustive first phase (i.e. a census). For global estimation, the vector must be input in the same order that lm processes a formula object including the intercept term whose true mean will always be one. For small area estimation, exhaustive is a data.frame containing column names for every variable appearing in the parameter formula including the variable "Intercept". The observations of the data.frame must represent the true auxiliary means in the same order as was presented in areas from the parameter small_area. See 'Examples'.

Value

twophase returns an object of class "twophase".

An object of class "twophase" returns a list of the following components:

`input`	a `list` containing the function's inputs
`estimation`	a data frame containing the following components: `area:` the domain (only present if argument `areas` has been used) `estimate:` the point estimate `ext_variance:` the external variance of the point estimate that doesn't account for fitting the model from the current inventory `g_variance:` the internal (g-weight) variance that accounts for fitting the model from the current inventory `n1` the first phase sample size of plots `n2` the second phase (i.e. terrestrial) sample size of plots `n1G` the first phase sample size in the small area `n2G` the second phase (i.e. terrestrial) sample size in the small area `r.squared` the R squared of the linear model
`samplesizes`	a `data.frame` summarizing all samplesizes: in case of cluster sampling both, the number of individual plots and the number of clusters is reported.
`coefficients`	the linear model coefficients
`cov_coef`	the design-based covariance matrix of the model coefficients
`Z_bar_1G`	the estimated auxiliary means of `formula` based on the first phase. If the first phase is exhaustive, these are the true auxiliary means specified in the input-argument `exhaustive`.
`cov_Z_bar_1G`	the covariance matrix of `Z_bar_1G`
`Rc_x_hat_G`	the small-area residuals at either the plot level or cluster level depending on the call
`Rc_x_hat`	the residuals at either the plot level or cluster level depending on the call
`Yx_s2G`	the local densities in the small area
`Mx_s2G`	the cluster weights in the small area
`mean_Rc_x_hat_G`	the mean residual (weighted mean in the case of cluster sampling) in the small area
`mean_Rc_x_hat`	the mean residual (weighted mean in the case of cluster sampling)
`warn.messages`	logical indicating if warning messages were issued

Note

In the special case of cluster sampling, the reported sample sizes in estimation are the number of clusters. The samplesize-object also provides the respective number of single plot units for cluster sampling. The reported r.squared describe the model fit of the applied linear regression model (i.e. on plot-level, not on cluster level).

References

Hill, A., Massey, A. F. (2021). The R Package forestinventory: Design-Based Global and Small Area Estimations for Multiphase Forest Inventories. Journal of Statistical Software, 97(4), 1-40.

Mandallaz, D. (2007). Sampling techniques for forest inventories. Chapter 4. CRC Press.

Mandallaz, D. (2013). Design-based properties of some small-area estimators in forest inventory with two-phase sampling. Can. J. For. Res. 43: 441-449

Mandallaz, D. and Hill, A. and Massey, A. (2016). Design-based properties of some small-area estimators in forest inventory with two-phase sampling. ETH Zurich, Department of Environmental Systems Science,Tech. rep. Available from http://e-collection.library.ethz.ch.

Examples


## load datasets:
data(grisons)
data(zberg)

# ------------------------------------------------#
# ----------- GLOBAL ESTIMATION ------------------#

#----
## 1) -- Design-based estimation with non-exhaustive auxiliary information
#----

# 1.1) non-cluster-sampling:
summary(twophase(formula = tvol ~mean + stddev + max + q75,
                 data = grisons,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2)))

# 1.2) cluster-sampling (see eqns. [57] and [58] in Mandallaz, Hill, Massey 2016):
summary(twophase(formula = basal ~ stade + couver + melange,
                data = zberg,
                phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                cluster = "cluster"))

# 1.3) example for boundary weight adjustment (non-cluster example):
summary(twophase(formula=tvol ~ mean + stddev + max + q75,
                 data=grisons,
                 phase_id=list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 boundary_weights = "boundary_weights"))

#----
## 2) -- Design-based estimation with exhaustive auxiliary information
#----

# establish order for vector of true auxiliary means:
colnames(lm(formula = tvol ~ mean + stddev + max + q75, data = grisons, x = TRUE)$x)
true.means <- c(1, 11.39, 8.84, 32.68, 18.03)

# 2.1) non-cluster-sampling:
summary(twophase(formula = tvol ~ mean + stddev + max + q75,
                 data = grisons,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 exhaustive = true.means))

# 2.2) cluster-sampling:
summary(twophase(formula = stem ~ stade + couver + melange,
                 data = zberg,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 cluster = "cluster",
                 exhaustive = c(1, 0.10, 0.7, 0.10, 0.6, 0.8)))


# ----------------------------------------------------#
# ----------- SMALL AREA ESTIMATION ------------------#

#----
## 1) -- Design-based estimation with non-exhaustive auxiliary information
#----

# 1.1) Mandallaz's extended pseudo small area estimator (see eqns. [35] and [36] in Mandallaz 2013):
summary(twophase(formula = tvol ~ mean + stddev + max + q75, data = grisons,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 small_area = list(sa.col = "smallarea", areas = c("A", "B","C", "D"),
                                   unbiased = TRUE)))

summary(twophase(formula = basal ~ stade + couver + melange, data=zberg,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 cluster = "cluster",
                 small_area = list(sa.col = "ismallg23", areas = c("2", "3"),
                                   unbiased = TRUE)))


# 1.2) pseudo small area estimator (see eqns. [25] and [26] in Mandallaz 2013):
summary(twophase(formula = tvol ~ mean + stddev + max + q75, data = grisons,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 small_area = list(sa.col = "smallarea", areas = c("A", "B"),
                                   unbiased = TRUE),
                 psmall = TRUE))

summary(twophase(formula = basal ~ stade + couver + melange, data=zberg,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 cluster = "cluster",
                 small_area = list(sa.col = "ismallg23", areas = c("2", "3"),
                                   unbiased = TRUE),
                 psmall = TRUE))


# 1.3) pseudosynthetic small area estimator (see eqns. [35] and [36] in Mandallaz 2013):
summary(twophase(formula = tvol ~ mean + stddev + max + q75, data=grisons,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 small_area = list(sa.col = "smallarea", areas = c("B", "A"),
                                   unbiased = FALSE)))

summary(twophase(formula = basal ~ stade + couver + melange, data=zberg,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 cluster = "cluster",
                 small_area = list(sa.col = "ismallg23", areas = c("2", "3"),
                                   unbiased = FALSE)))


#----
## 2) -- Design-based estimation with exhaustive auxiliary information
#----

# establish order for vector of true auxiliary means:
colnames(lm(formula = tvol ~ mean + stddev + max + q75, data = grisons, x = TRUE)$x)
colnames(lm(formula = basal ~ stade + couver + melange, data = zberg, x = TRUE)$x)

# true auxiliary means taken from Mandallaz et al. (2013):
truemeans.G <- data.frame(Intercept = rep(1, 4),
                         mean = c(12.85, 12.21, 9.33, 10.45),
                         stddev = c(9.31, 9.47, 7.90, 8.36),
                         max = c(34.92, 35.36, 28.81, 30.22),
                         q75 = c(19.77, 19.16, 15.40, 16.91))
rownames(truemeans.G) <- c("A", "B", "C", "D")

# true auxiliary means taken from Mandallaz (1991):
truemeans.G.clust <- data.frame(Intercept = 1,
                               stade400 = 0.175,
                               stade500 = 0.429,
                               stade600 = 0.321,
                               couver2 = 0.791,
                               melange2 = 0.809)
rownames(truemeans.G.clust) <- c("1")


# 2.1) Mandallaz's extended small area estimator (see eqns. [31] and [33] in Mandallaz 2013):
summary(twophase(formula = tvol ~ mean + stddev + max + q75, data = grisons,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 small_area = list(sa.col ="smallarea", areas = c("A", "B"),
                                   unbiased = TRUE),
                 exhaustive = truemeans.G))

summary(twophase(formula = basal ~ stade + couver + melange, data=zberg,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 cluster = "cluster",
                 small_area = list(sa.col = "ismallold", areas = c("1"),
                                   unbiased = TRUE),
                 exhaustive = truemeans.G.clust))


# 2.2) small area estimator (see eqns. [20] and [21] in Mandallaz 2013):
summary(twophase(formula = tvol ~ mean + stddev + max + q75, data = grisons,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 small_area = list(sa.col = "smallarea", areas = c("A"),
                                   unbiased = TRUE),
                 exhaustive = truemeans.G, psmall = TRUE))

summary(twophase(formula = basal ~ stade + couver + melange, data = zberg,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 cluster = "cluster",
                 small_area = list(sa.col ="ismallold", areas = c("1"),
                                   unbiased = TRUE),
                 psmall = TRUE,
                 exhaustive = truemeans.G.clust))


# 2.3) synthetic small area estimator (see eqns. [18] and [19] in Mandallaz 2013):
summary(twophase(formula=tvol ~ mean + stddev + max + q75, data=grisons,
                 phase_id=list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 small_area=list(sa.col = "smallarea", areas = c("A", "B"),
                                 unbiased = FALSE),
                 exhaustive = truemeans.G))

summary(twophase(formula = basal ~ stade + couver + melange, data = zberg,
                 phase_id = list(phase.col = "phase_id_2p", terrgrid.id = 2),
                 cluster = "cluster",
                 small_area = list(sa.col = "ismallold", areas = c("1"),
                                   unbiased = FALSE),
                 exhaustive = truemeans.G.clust))

[Package forestinventory version 1.0.0 Index]