Fits an extremal random forest (ERF) with cross-validation.

erf_cv(
  X,
  Y,
  min.node.size = c(5, 40, 100),
  lambda = c(0, 0.001, 0.01),
  intermediate_estimator = c("grf", "neural_nets"),
  intermediate_quantile = 0.8,
  nfolds = 5,
  nreps = 3,
  seed = NULL
)

Arguments

X

Numeric matrix of predictors, where each row corresponds to an observation and each column to a predictor.

Y

Numeric vector of responses.

min.node.size

Vector of candidate values for the minimum number of observations in each tree leaf used to fit the similarity weights (see also grf::quantile_forest()). Nodes smaller than min.node.size can occur, as in the original randomForest package. Default is c(5, 40, 100).

lambda

Vector of candidate penalties for the shape parameter used in the weighted likelihood. Default is c(0, 0.001, 0.01).
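As a rough illustration of how the two tuning vectors relate, the sketch below builds every pairing of the default values with base R. That each pair is evaluated as a full grid is an assumption made here (suggested by the columns of scores, not stated on this page), and the names min_node_size and param_grid are hypothetical.

```r
# Default candidate vectors, copied from the usage section.
min_node_size <- c(5, 40, 100)
lambda <- c(0, 0.001, 0.01)

# Assumption: every (min.node.size, lambda) pair is a candidate,
# i.e. the two vectors are crossed into a full grid.
param_grid <- expand.grid(
  min.node.size = min_node_size,
  lambda = lambda
)

# Number of candidate pairs: 3 values x 3 values.
nrow(param_grid)
```

Under that assumption, longer vectors multiply the number of fits, so short candidate vectors keep the run time manageable.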

intermediate_estimator

A character string specifying the estimator used to fit the intermediate threshold. The options available are "grf" and "neural_nets".

intermediate_quantile

Intermediate quantile level used to predict the intermediate threshold. For further information, see Terefe et al. (2020). Default is 0.8.

nfolds

Number of folds in the cross-validation scheme. Default is 5.

nreps

Number of times the nfolds-fold cross-validation is repeated. Default is 3.

seed

Random seed to reproduce the fold splits. Default is NULL.
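To make the roles of nfolds, nreps and seed concrete, here is a minimal base R sketch of repeated fold assignment. It is not the package's internal code: the helper make_folds is hypothetical, and the splitting scheme (shuffled, roughly equal-sized folds) is an assumption.

```r
# Hypothetical helper: assigns each of n observations to one of nfolds folds,
# and repeats the random assignment nreps times.
make_folds <- function(n, nfolds = 5, nreps = 3, seed = NULL) {
  # Fixing the seed makes the fold splits reproducible.
  if (!is.null(seed)) set.seed(seed)
  lapply(seq_len(nreps), function(r) {
    # Fold labels 1..nfolds recycled to length n, then shuffled.
    sample(rep(seq_len(nfolds), length.out = n))
  })
}

folds <- make_folds(n = 103, nfolds = 5, nreps = 3, seed = 42)

length(folds)      # one vector of fold labels per repetition
table(folds[[1]])  # fold sizes in the first repetition
```

Calling make_folds() twice with the same non-NULL seed returns identical splits, which is the behaviour the seed argument is described as providing.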

Value

An object with S3 class "erf_cv". It is a named list with the following elements:

scores

A tibble with columns: min.node.size, lambda, cvm (mean cross-validated error).
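The sketch below shows one way to read such a table and pick the best pair. The cvm values are placeholders invented for illustration, not output of erf_cv, a plain data.frame stands in for the tibble, and treating the smallest cvm as optimal is an assumption based on cvm being an error measure.

```r
# Placeholder scores: the cvm values are made up for illustration only.
scores <- data.frame(
  min.node.size = c(5, 40, 100, 5, 40, 100),
  lambda = c(0, 0, 0, 0.01, 0.01, 0.01),
  cvm = c(1.32, 1.21, 1.27, 1.30, 1.18, 1.25)
)

# Assumption: the optimal pair is the one with the smallest mean CV error.
best <- scores[which.min(scores$cvm), ]

best$min.node.size
best$lambda
```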

erf

An "erf" object fitted on the full data using the optimal min.node.size and lambda.
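A possible call, sketched under assumptions: the erf package is installed and exports erf_cv() with the signature shown in the usage section. The simulated data, the reduced tuning vectors and the object name fit_cv are illustrative choices made here, not taken from the package.

```r
# Simulated data (illustrative): 3 predictors, heavy-tailed response whose
# scale depends on the first predictor.
set.seed(1)
n <- 500
p <- 3
X <- matrix(runif(n * p, min = -1, max = 1), nrow = n, ncol = p)
Y <- (1 + (X[, 1] > 0)) * rt(n, df = 4)

# Only run the fit if the erf package is available (assumption: it exports
# erf_cv() as documented above).
if (requireNamespace("erf", quietly = TRUE)) {
  fit_cv <- erf::erf_cv(
    X, Y,
    min.node.size = c(5, 40),  # shorter than the defaults to save time
    lambda = c(0, 0.001),
    nfolds = 3,
    nreps = 1,
    seed = 42
  )

  fit_cv$scores      # cross-validation results per candidate pair
  class(fit_cv$erf)  # the model refitted on the full data
}
```

The smaller grid and fewer folds are used only to keep the example quick; the defaults evaluate more candidates.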