phenoRegressor.RFR {GROAN} | R Documentation |
Random Forest Regression using package randomForest
Description
This is a wrapper around randomForest and related functions.
As such, this function will not work if the randomForest package is not installed.
Random forest makes no distinction between regular covariates (genotypes) and extra
covariates (fixed effects). If extra covariates are passed, they are
placed side by side with the genotypes. The same happens with the covariance matrix. This
can lead to the scientifically questionable but technically correct situation of regressing
on one big matrix made of SNP genotypes, covariances and other covariates, all collated side by side.
The function makes no distinction among them, and it is up to the user to understand what is correct in each
specific experiment.
WARNING: this function can be *very* slow, especially when called on thousands of SNPs.
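As a rough illustration of the side-by-side collation described above, the sketch below builds a single predictor matrix from toy inputs using base R only. It is not the package's internal code: the data values are made up, and the use of cbind() is an assumption about how such a collation could be done.

```r
# Toy data: 4 samples, 3 SNP markers, 1 extra covariate (all values are made up)
n <- 4
genotypes <- matrix(c(0, 1, 2,
                      1, 1, 0,
                      2, 0, 1,
                      0, 2, 2), nrow = n, byrow = TRUE)
covariances <- diag(n)  # n x n matrix (identity, for illustration only)
extraCovariates <- matrix(c(10, 12, 9, 11), nrow = n)

# Collate everything column-wise; cbind() skips NULL arguments,
# so an input that was not supplied simply contributes no columns
predictors <- cbind(genotypes, covariances, extraCovariates)
dim(predictors)  # 4 rows, 3 + 4 + 1 = 8 columns

# Same call with the covariance matrix left out
dim(cbind(genotypes, NULL, extraCovariates))  # 4 rows, 3 + 1 = 4 columns
```

In a matrix like this, the forest treats marker columns, covariance columns and covariate columns as interchangeable candidate variables at each split.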
Usage
phenoRegressor.RFR(
phenotypes,
genotypes,
covariances,
extraCovariates,
ntree = ceiling(length(phenotypes)/5),
...
)
Arguments
phenotypes |
phenotypes, a numeric array (n x 1), missing values are predicted |
genotypes |
SNP genotypes, one row per phenotype (n), one column per marker (m), values in 0/1/2 for
diploids or 0/1/2/...ploidy for polyploids. Can be NULL if |
covariances |
square matrix (n x n) of covariances. Can be NULL if |
extraCovariates |
extra covariates set, one row per phenotype (n), one column per covariate (w). If NULL no extra covariates are considered. |
ntree |
number of trees to grow, defaults to a fifth of the number of samples (rounded
up). As per |
... |
any extra parameter is passed to |
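The default for ntree shown in Usage, ceiling(length(phenotypes)/5), can be checked with a few sample sizes. The helper name default_ntree below is hypothetical and only wraps that expression for illustration.

```r
# Hypothetical helper reproducing the default expression from the Usage section
default_ntree <- function(phenotypes) ceiling(length(phenotypes) / 5)

default_ntree(numeric(100))  # 20
default_ntree(numeric(103))  # 21, since 103 / 5 = 20.6 is rounded up
default_ntree(numeric(4))    # 1

# length() counts NA entries too, so samples with missing phenotypes
# still contribute to the default
default_ntree(c(NA, NA, 1.5, 2.0, 3.1, 0.7))  # 2
```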
Value
The function returns a list with the following fields:
- predictions: an array of (k) predicted phenotypes
- hyperparams: named vector with the following keys: ntree (number of grown trees) and mtry (number of variables randomly sampled as candidates at each split)
- extradata: the object returned by randomForest::randomForest(), containing the full trained forest and the parameters used
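A minimal sketch of how the returned fields might be read follows. The results object is a hand-built mock with the field names listed above, not the output of a real call. The numbers are invented, and the sketch assumes (as the Examples section suggests) that predictions holds one entry per input sample.

```r
# Mock inputs and a mock return value (hypothetical numbers, no forest is trained here)
phenos <- c(NA, NA, 5.1, 4.8, 6.0)
results <- list(
  predictions = c(5.0, 5.3, 5.2, 4.9, 5.8),
  hyperparams = c(ntree = 20, mtry = 200),
  extradata   = NULL  # a real call would store the randomForest object here
)

# Predictions for the samples whose phenotype was missing
masked <- is.na(phenos)
results$predictions[masked]

# Hyperparameters are looked up by name
results$hyperparams[["ntree"]]
results$hyperparams[["mtry"]]
```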
See Also
Other phenoRegressors: phenoRegressor.BGLR(), phenoRegressor.SVR(), phenoRegressor.dummy(), phenoRegressor.rrBLUP(), phenoregressor.BGLR.multikinships()
Examples
## Not run:
#using the GROAN.KI dataset, we regress on the dataset and predict the first ten phenotypes
phenos = GROAN.KI$yield
phenos[1:10] = NA
#calling the regressor with random forest
results = phenoRegressor.RFR(
  phenotypes = phenos,
  genotypes = GROAN.KI$SNPs,
  covariances = NULL,
  extraCovariates = NULL,
  ntree = 20,
  mtry = 200 #randomForest-specific parameters
)
#examining the predictions
plot(GROAN.KI$yield, results$predictions,
     main = 'Train set (black) and test set (red) regressions',
     xlab = 'Original phenotypes', ylab = 'Predicted phenotypes')
points(GROAN.KI$yield[1:10], results$predictions[1:10], pch=16, col='red')
#printing correlations
test.set.correlation = cor(GROAN.KI$yield[1:10], results$predictions[1:10])
train.set.correlation = cor(GROAN.KI$yield[-(1:10)], results$predictions[-(1:10)])
writeLines(paste(
  'test-set correlation :', test.set.correlation,
  '\ntrain-set correlation:', train.set.correlation
))
## End(Not run)