path_coeff {metan}    R Documentation

Path coefficients with minimal multicollinearity




path_coeff(
  .data,
  resp,
  pred = everything(),
  by = NULL,
  exclude = FALSE,
  correction = NULL,
  knumber = 50,
  brutstep = FALSE,
  maxvif = 10,
  missingval = "pairwise.complete.obs",
  plot_res = FALSE,
  verbose = TRUE,
  ...
)

path_coeff_mat(cor_mat, resp, correction = NULL, knumber = 50, verbose = TRUE)

path_coeff_seq(.data, resp, chain_1, chain_2, by = NULL, verbose = TRUE, ...)



.data

The data. Must be a data frame or a grouped data frame passed from dplyr::group_by().


resp

<tidy-select> The dependent trait.


pred

<tidy-select> The predictor traits. Set to everything(), i.e., the predictor traits are all the numeric traits in the data except the one in resp. To select multiple traits, use a comma-separated vector of names (e.g., pred = c(V1, V2, V3)), an interval of trait names (e.g., pred = c(V1:V3)), or a select helper (e.g., pred = starts_with("V")).


by

One variable (factor) to compute the function by. It is a shortcut to dplyr::group_by(). To compute the statistics by more than one grouping variable, use dplyr::group_by() directly.


exclude

Logical argument, set to FALSE. If exclude = TRUE, the traits in pred are removed from the data, and the analysis uses the remaining traits as predictors, except the one in resp.


correction

Set to NULL. A correction value (k) that is added to the diagonal elements of the X'X matrix to reduce the harmful effects of multicollinearity in path analysis (Olivoto et al., 2017).


knumber

When correction = NULL, a plot showing the values of the direct effects over a set of different k values (0-1) is produced. knumber is the number of k values used in the range from 0 to 1.


brutstep

Logical argument, set to FALSE. If TRUE, an algorithm selects a subset of variables with minimal multicollinearity and fits a set of possible models. See the Details section for more information.


maxvif

The maximum value for the Variance Inflation Factor (cut point) that will be accepted. See the Details section for more information.


missingval

How to deal with missing values. For more information, please see stats::cor().


plot_res

If TRUE, create a scatter plot of residuals against predicted values and a normal Q-Q plot.


verbose

If verbose = TRUE, some results are shown in the console.


...

Depends on the function used:

  • For path_coeff(), additional arguments passed on to stats::plot.lm().

  • For path_coeff_seq(), additional arguments passed on to path_coeff().


cor_mat

Matrix of correlations containing both dependent and independent traits.

chain_1, chain_2

<tidy-select> The traits used in the first (primary) and second (secondary) chain.
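The role of correction and knumber can be illustrated with a minimal base R sketch. It is not metan's internal code: it assumes that the direct effects are the solution of (X'X + kI) b = X'y on the correlation scale, and it uses the built-in mtcars data and hypothetical helper names (direct_effects, k_grid) purely for illustration.

```r
# Illustrative data only: mtcars is not part of metan
vars <- c("mpg", "disp", "hp", "wt", "qsec")
cor_mat <- cor(mtcars[, vars])

resp <- "mpg"
pred <- setdiff(vars, resp)

r_xx <- cor_mat[pred, pred]  # correlations among predictors (X'X on standardized data)
r_xy <- cor_mat[pred, resp]  # correlations between predictors and the response (X'y)

# Direct effects for a given k: solve (X'X + kI) b = X'y
# (hypothetical helper, assumed form of the correction)
direct_effects <- function(k) {
  solve(r_xx + diag(k, length(pred)), r_xy)
}

# Grid of k values between 0 and 1, analogous to 'knumber'
knumber <- 50
k_grid <- seq(0, 1, length.out = knumber)
effects <- t(sapply(k_grid, direct_effects))
colnames(effects) <- pred

# First rows: k = 0 gives the uncorrected direct effects
head(cbind(k = k_grid, effects), 3)
```

Plotting each column of effects against k_grid gives a trace of the direct effects, comparable in spirit to the plot described for correction = NULL.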


In path_coeff(), when brutstep = TRUE, an algorithm to select a set of predictors with minimal multicollinearity and high explanatory power is implemented. First, the algorithm selects a set of predictors with minimal multicollinearity. The selection is based on the variance inflation factor (VIF). An iterative process is performed until the maximum VIF observed is less than maxvif. The variables selected in this iterative process are then used in a series of stepwise-based regressions. The first model is fitted and p-1 predictor variables are retained (p is the number of variables selected in the iterative process). The second model fits a regression considering p-2 selected variables, and so on until the last model, which considers only two variables. Three objects are created: Summary, with the process summary; Models, containing the aforementioned values for all the fitted models; and Selectedpred, a vector with the names of the variables selected in the iterative process.
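The VIF-based selection step can be sketched in base R as follows. This is an assumption-laden illustration, not the package's implementation: the text above does not say which variable is dropped at each iteration, so the sketch assumes the predictor with the largest VIF is removed. The mtcars data and the helper name vif are used only for illustration, and the subsequent stepwise regressions are not covered.

```r
# Illustrative data only: mtcars is not part of metan
resp <- "mpg"
selected <- setdiff(names(mtcars), resp)
maxvif <- 5

# VIFs are the diagonal elements of the inverse correlation matrix of the predictors
# (hypothetical helper)
vif <- function(vars) {
  setNames(diag(solve(cor(mtcars[, vars]))), vars)
}

# Assumed rule: drop the predictor with the largest VIF
# until the maximum VIF is less than maxvif (keeping at least two predictors)
while (length(selected) > 2 && max(vif(selected)) >= maxvif) {
  v <- vif(selected)
  selected <- setdiff(selected, names(v)[which.max(v)])
}

selected
round(vif(selected), 2)
```

Lowering maxvif yields a smaller, less collinear set of predictors, at the possible cost of explanatory power.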


Depends on the function used:


Tiago Olivoto


Olivoto, T., V.Q. Souza, M. Nardino, I.R. Carvalho, M. Ferrari, A.J. Pelegrin, V.J. Szareski, and D. Schmidt. 2017. Multicollinearity in path analysis: a simple method to reduce its effects. Agron. J. 109:131-142. doi:10.2134/agronj2016.04.0196

Olivoto, T., M. Nardino, I.R. Carvalho, D.N. Follmann, M. Ferrari, et al. 2017. REML/BLUP and sequential path analysis in estimating genotypic values and interrelationships among simple maize grain yield-related traits. Genet. Mol. Res. 16(1): gmr16019525. doi:10.4238/gmr16019525



library(metan)

# Using KW as the response variable and all other ones as predictors
pcoeff <- path_coeff(data_ge2, resp = KW)

# The same as above, but using the correlation matrix
cor_mat <- cor(data_ge2 %>% select_numeric_cols())
pcoeff2 <- path_coeff_mat(cor_mat, resp = KW)

# Declaring the predictors
# Create a residual plot with 'plot_res = TRUE'
pcoeff3 <- path_coeff(data_ge2,
                      resp = KW,
                      pred = c(PH, EH, NKE, TKW),
                      plot_res = TRUE)

# Selecting a set of predictors with minimal multicollinearity
# Maximum variance inflation factor of 5
pcoeff4 <- path_coeff(data_ge2,
                     resp = KW,
                     brutstep = TRUE,
                     maxvif = 5)

# When one analysis should be carried out for each environment
# Using the 'by' argument
pcoeff5 <- path_coeff(data_ge2, resp = KW, by = ENV)

# Sequential path analysis
# KW as dependent trait
# NKE and TKW as primary predictors
# PH, EH, EP, and EL as secondary traits
pcoeff6 <-
  path_coeff_seq(data_ge2,
                 resp = KW,
                 chain_1 = c(NKE, TKW),
                 chain_2 = c(PH, EH, EP, EL))

[Package metan version 1.18.0 Index]