path_coeff {metan} | R Documentation |
Path coefficients with minimal multicollinearity
Description
- path_coeff() computes a path analysis using a data frame as input data.
- path_coeff_seq() computes a sequential path analysis using primary and secondary traits.
- path_coeff_mat() computes a path analysis using correlation matrices as input data.
Usage
path_coeff(
  .data,
  resp,
  pred = everything(),
  by = NULL,
  exclude = FALSE,
  correction = NULL,
  knumber = 50,
  brutstep = FALSE,
  maxvif = 10,
  missingval = "pairwise.complete.obs",
  plot_res = FALSE,
  verbose = TRUE,
  ...
)
path_coeff_mat(cor_mat, resp, correction = NULL, knumber = 50, verbose = TRUE)
path_coeff_seq(.data, resp, chain_1, chain_2, by = NULL, verbose = TRUE, ...)
Arguments
.data |
The data. Must be a data frame or a grouped data passed from
|
resp |
< |
pred |
< |
by |
One variable (factor) to compute the function by. It is a shortcut
to |
exclude |
Logical argument, set to false. If |
correction |
Set to |
knumber |
When |
brutstep |
Logical argument, set to |
maxvif |
The maximum value for the Variance Inflation Factor (cut point) that will be accepted. See the Details section for more information. |
missingval |
How to deal with missing values. For more information,
please see |
plot_res |
If |
verbose |
If |
... |
Depends on the function used:
|
cor_mat |
Matrix of correlations containing both dependent and independent traits. |
chain_1 , chain_2 |
< |
Details
In path_coeff(), when brutstep = TRUE, an algorithm that selects a set of
predictors with minimal multicollinearity and high explanatory power is
implemented. First, the algorithm selects a set of predictors with minimal
multicollinearity. The selection is based on the variance inflation factor
(VIF). An iterative process is performed until the maximum VIF observed is
less than maxvif. The variables selected in this iterative process are then
used in a series of stepwise-based regressions. The first model is fitted and
p-1 predictor variables are retained (p is the number of variables selected
in the iterative process). The second model adjusts a regression considering
p-2 selected variables, and so on until the last model, which considers only
two variables. Three objects are created: Summary, with the process summary;
Models, containing the aforementioned values for all the adjusted models; and
Selectedpred, a vector with the names of the variables selected in the
iterative process.
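The VIF-based screening step described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by path_coeff(): the helper names vif_from_cor() and select_by_vif() are hypothetical, the rule of dropping the predictor with the largest VIF at each pass is an assumption, and the built-in mtcars data stand in for real trial data. The later stepwise regressions are not covered.

```r
# Minimal sketch (not metan's internal code): drop predictors one at a time
# until the largest VIF falls below a cut point.
# vif_from_cor() and select_by_vif() are hypothetical helper names.

vif_from_cor <- function(cor_x) {
  # The VIF of each predictor is the matching diagonal element of the
  # inverse of the predictor correlation matrix.
  vif <- diag(solve(cor_x))
  names(vif) <- colnames(cor_x)
  vif
}

select_by_vif <- function(data, maxvif = 10) {
  keep <- colnames(data)
  repeat {
    vif <- vif_from_cor(cor(data[, keep, drop = FALSE]))
    # Stop when the cut point is met or only two predictors remain
    if (max(vif) < maxvif || length(keep) <= 2) break
    # Assumption: the predictor with the largest VIF is the one removed
    keep <- setdiff(keep, names(which.max(vif)))
  }
  list(selected = keep, vif = vif)
}

# Built-in data set, used only for illustration
predictors <- mtcars[, c("cyl", "disp", "hp", "drat", "wt", "qsec")]
res <- select_by_vif(predictors, maxvif = 5)
res$selected  # predictors that survive the screening
res$vif       # their VIFs in the final pass
```

Lowering maxvif in such a loop removes more predictors, which is the trade-off the cut point controls.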
Value
Depends on the function used:
- path_coeff() returns a list with the following items:
  - Corr.x: A correlation matrix between the predictor variables.
  - Corr.y: A vector of correlations between each predictor variable with the dependent variable.
  - Coefficients: The path coefficients. Direct effects are the diagonal elements, and the indirect effects those in the off-diagonal elements (lines).
  - Eigen: Eigenvectors and eigenvalues of the Corr.x.
  - VIF: The Variance Inflation Factors.
  - plot: A ggplot2-based graphic showing the direct effects in 21 different k values.
  - Predictors: The predictor variables used in the model.
  - CN: The Condition Number, i.e., the ratio between the highest and lowest eigenvalue.
  - Det: The matrix determinant of the Corr.x.
  - R2: The coefficient of determination of the model.
  - Residual: The residual effect of the model.
  - Response: The response variable.
  - weightvar: The order of the predictor variables with the highest weight (highest eigenvector) in the lowest eigenvalue.
- path_coeff_seq() returns a list with the following objects:
  - resp_fc: an object of class path_coeff with the results for the analysis with dependent trait and first chain predictors.
  - resp_sc: an object of class path_coeff with the results for the analysis with dependent trait and second chain predictors.
  - resp_sc2: The path coefficients of second chain predictors and the dependent trait through the first chain predictors.
  - fc_sc_list: A list of objects with the path analysis using each trait in the first chain as dependent and second chain as predictors.
  - fc_sc_coef: The coefficients between first- and second-chain traits.
  - cor_mat: A correlation matrix between the analyzed traits.

If .data is a grouped data passed from dplyr::group_by() then the results will be returned into a list-column of data frames.
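To make the relation between several of the returned items concrete, the following base R sketch derives analogous quantities from a correlation matrix using the textbook path-analysis formulation. It is an illustration only: it assumes no multicollinearity correction is applied, uses the built-in mtcars data with mpg as a stand-in response, and does not reproduce the package's internal code or its exact output layout.

```r
# Minimal sketch in base R (not metan's internal code).
traits <- mtcars[, c("mpg", "wt", "qsec", "drat")]
cor_all <- cor(traits)

cor_x <- cor_all[-1, -1]  # correlations among predictors (cf. Corr.x)
cor_y <- cor_all[-1, 1]   # predictor-response correlations (cf. Corr.y)

# Direct effects are the standardized regression coefficients
direct <- solve(cor_x, cor_y)

# Entry (i, j) is r_ij * direct_j: the diagonal holds the direct effects,
# and the off-diagonal entries of row i are the indirect effects of
# predictor i through each other predictor j.
coefficients <- sweep(cor_x, 2, direct, `*`)

# Diagnostics computed from the predictor correlation matrix
eig <- eigen(cor_x)
cn <- max(eig$values) / min(eig$values)  # condition number
det_x <- det(cor_x)                      # determinant
r2 <- sum(direct * cor_y)                # coefficient of determination

round(coefficients, 3)
c(CN = cn, Det = det_x, R2 = r2)
```

Each row of the coefficient matrix sums to the correlation between that predictor and the response, which is a convenient check when reading a path coefficient table.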
Author(s)
Tiago Olivoto tiagoolivoto@gmail.com
References
Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi:10.2134/agronj2016.04.0196
Olivoto, T., M. Nardino, I.R. Carvalho, D.N. Follmann, M. Ferrari, et al. 2017. REML/BLUP and sequential path analysis in estimating genotypic values and interrelationships among simple maize grain yield-related traits. Genet. Mol. Res. 16(1): gmr16019525. doi:10.4238/gmr16019525
Examples
library(metan)
# Using KW as the response variable and all other ones as predictors
pcoeff <- path_coeff(data_ge2, resp = KW)
# The same as above, but using the correlation matrix
cor_mat <- cor(data_ge2 %>% select_numeric_cols())
pcoeff2 <- path_coeff_mat(cor_mat, resp = KW)
# Declaring the predictors
# Create a residual plot with 'plot_res = TRUE'
pcoeff3 <- path_coeff(data_ge2,
                      resp = KW,
                      pred = c(PH, EH, NKE, TKW),
                      plot_res = TRUE)
# Selecting a set of predictors with minimal multicollinearity
# Maximum variance inflation factor of 5
pcoeff4 <- path_coeff(data_ge2,
                      resp = KW,
                      brutstep = TRUE,
                      maxvif = 5)
# When one analysis should be carried out for each environment
# Using the 'by' argument to run the analysis within each level of ENV
pcoeff5 <- path_coeff(data_ge2, resp = KW, by = ENV)
# Sequential path analysis
# KW as dependent trait
# NKE and TKW as primary predictors
# PH, EH, EP, and EL as secondary traits
pcoeff6 <-
  path_coeff_seq(data_ge2,
                 resp = KW,
                 chain_1 = c(NKE, TKW),
                 chain_2 = c(PH, EH, EP, EL))
pcoeff6$resp_sc$Coefficients
pcoeff6$resp_sc2