
Prediction with new data and a saved forest from Ranger.

Usage

# S3 method for class 'ranger'
predict(
  object,
  data = NULL,
  predict.all = FALSE,
  num.trees = object$num.trees,
  type = "response",
  se.method = "infjack",
  quantiles = c(0.1, 0.5, 0.9),
  what = NULL,
  seed = NULL,
  num.threads = NULL,
  verbose = TRUE,
  ...
)

Arguments

object

A ranger object.

data

New test data of class data.frame or gwaa.data (GenABEL).

predict.all

If TRUE, return individual predictions for each tree instead of predictions aggregated over all trees. Returns a matrix (sample x tree) for classification and regression, and a 3D array for probability estimation (sample x class x tree) and survival (sample x time x tree).
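A minimal sketch of what predict.all = TRUE returns for regression, using the built-in mtcars data (the 26/6 row split and num.trees = 100 are arbitrary choices for illustration). Because a regression forest averages its trees, the row means of the per-tree matrix should reproduce the default aggregated prediction.

```r
library(ranger)

# Train a small regression forest on the first 26 rows of mtcars
rf <- ranger(mpg ~ ., data = mtcars[1:26, ], num.trees = 100)
new.data <- mtcars[27:32, ]

# Per-tree predictions: a 6 x 100 matrix (sample x tree)
pred.all <- predict(rf, data = new.data, predict.all = TRUE)
dim(pred.all$predictions)

# Aggregated predictions (the default)
pred.agg <- predict(rf, data = new.data)

# For regression, the aggregate is the mean over trees
all.equal(rowMeans(pred.all$predictions), pred.agg$predictions)
```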

num.trees

Number of trees used for prediction. The first num.trees trees in the forest are used.
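One way to check that only the first num.trees trees are used is to compare per-tree predictions. This sketch assumes a 100-tree regression forest on mtcars (an arbitrary choice); the 10-tree per-tree matrix should equal the first 10 columns of the full one.

```r
library(ranger)

rf <- ranger(mpg ~ ., data = mtcars[1:26, ], num.trees = 100)
new.data <- mtcars[27:32, ]

# Per-tree predictions from the full forest and from only 10 trees
full <- predict(rf, data = new.data, predict.all = TRUE)$predictions
first10 <- predict(rf, data = new.data, num.trees = 10,
                   predict.all = TRUE)$predictions

# The 10-tree predictions should match the first 10 columns of the full matrix
all.equal(first10, full[, 1:10])
```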

type

Type of prediction. One of 'response', 'se', 'terminalNodes', 'quantiles' with default 'response'. See below for details.

se.method

Method to compute standard errors. One of 'jack', 'infjack' with default 'infjack'. Only applicable if type = 'se'. See below for details.

quantiles

Vector of quantiles for quantile prediction. Set type = 'quantiles' to use.

what

User-specified function used instead of quantile for quantile prediction. Must return a numeric vector; see examples.

seed

Random seed. Default is NULL, which generates the seed from R. Set to 0 to ignore the R seed. The seed is used in case of ties in classification mode.
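As a sketch, fixing seed makes repeated classification predictions reproducible even when the majority vote is tied. The 50-tree iris forest is an arbitrary choice; an even number of trees simply makes ties possible.

```r
library(ranger)

rf <- ranger(Species ~ ., data = iris, num.trees = 50)

# Ties in the vote are broken at random; a fixed seed makes this repeatable
p1 <- predict(rf, data = iris, seed = 123)
p2 <- predict(rf, data = iris, seed = 123)
identical(p1$predictions, p2$predictions)
```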

num.threads

Number of threads. Use 0 for all available cores. Default is 2 if not set by options/environment variables (see below).

verbose

Verbose output on or off.

...

Further arguments passed to or from other methods.

Value

Object of class ranger.prediction with elements

predictions: Predicted classes/values (only for classification and regression).
unique.death.times: Unique death times (only for survival).
chf: Estimated cumulative hazard function for each sample (only for survival).
survival: Estimated survival function for each sample (only for survival).
num.trees: Number of trees.
num.independent.variables: Number of independent variables.
treetype: Type of forest/tree. Classification, regression or survival.
num.samples: Number of samples.
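A sketch of the survival-specific elements, assuming the survival package is installed. Its veteran dataset (137 rows) and the 100/37 train/test split are choices made for illustration. The survival and chf matrices should have one row per sample and one column per unique death time.

```r
library(ranger)
library(survival)

# Survival forest on the veteran data from the survival package
rf <- ranger(Surv(time, status) ~ ., data = veteran[1:100, ])
pred <- predict(rf, data = veteran[101:137, ])

pred$treetype
length(pred$unique.death.times)
dim(pred$survival)  # samples x unique death times
dim(pred$chf)       # same shape as survival
```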

Details

For type = 'response' (the default), the predicted classes (classification), predicted numeric values (regression), predicted probabilities (probability estimation) or survival probabilities (survival) are returned. For type = 'se', the standard errors of the predictions are returned (regression only). They are estimated from out-of-bag predictions using either the jackknife-after-bootstrap or the infinitesimal jackknife for bagging; see Wager et al. (2014) for details. For type = 'terminalNodes', the IDs of the terminal node in each tree for each observation in the given dataset are returned. For type = 'quantiles', the selected quantiles for each observation are estimated; see Meinshausen (2006) for details.
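The terminalNodes output can be sketched as follows, with an arbitrary 50-tree iris forest. The proximity measure at the end (the fraction of trees in which two observations share a leaf) is our own illustration, not a ranger function.

```r
library(ranger)

rf <- ranger(Species ~ ., data = iris, num.trees = 50)

# Matrix of terminal node IDs: one row per observation, one column per tree
nodes <- predict(rf, data = iris, type = "terminalNodes")$predictions
dim(nodes)

# Fraction of trees in which observations 1 and 2 land in the same leaf
prox.12 <- mean(nodes[1, ] == nodes[2, ])
prox.12
```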

If type = 'se' is selected, the method to estimate the variances can be chosen with se.method. Set se.method = 'jack' for jackknife-after-bootstrap and se.method = 'infjack' for the infinitesimal jackknife for bagging.
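A sketch comparing the two se.method options on a regression forest. Two assumptions to flag: the forest is trained with keep.inbag = TRUE so the in-bag counts needed for standard errors are stored, and the standard errors are returned in an se element alongside predictions. The mtcars split and num.trees = 500 are arbitrary.

```r
library(ranger)

set.seed(1)
# keep.inbag = TRUE stores the in-bag counts needed for type = "se"
rf <- ranger(mpg ~ ., data = mtcars[1:26, ], num.trees = 500,
             keep.inbag = TRUE)
new.data <- mtcars[27:32, ]

se.infjack <- predict(rf, data = new.data, type = "se", se.method = "infjack")
se.jack <- predict(rf, data = new.data, type = "se", se.method = "jack")

# Point predictions side by side with both standard error estimates
cbind(prediction = se.infjack$predictions,
      infjack = se.infjack$se,
      jack = se.jack$se)
```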

For classification with predict.all = TRUE, factor levels are returned as numeric codes. To retrieve the corresponding factor levels, use rf$forest$levels, where rf is the ranger object.
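A minimal sketch of mapping the numeric codes back to class labels, assuming code k corresponds to the k-th entry of rf$forest$levels. The 50-tree iris forest and the three sample rows are arbitrary choices.

```r
library(ranger)

rf <- ranger(Species ~ ., data = iris, num.trees = 50)

# With predict.all = TRUE, classification predictions are numeric class codes
codes <- predict(rf, data = iris[c(1, 51, 101), ],
                 predict.all = TRUE)$predictions

# Index the levels by code, then restore the sample x tree layout
lv <- rf$forest$levels
labels <- matrix(lv[codes], nrow = nrow(codes))
labels[, 1:5]
```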

By default, ranger uses 2 threads. The default can be changed with: (1) num.threads in ranger/predict call, (2) environment variable R_RANGER_NUM_THREADS, (3) options(ranger.num.threads = N), (4) options(Ncpus = N), with precedence in that order.
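A sketch of two of these mechanisms: a session-wide options(ranger.num.threads = ...) default, overridden by an explicit num.threads argument. Thread counts 1 and 2 are arbitrary. Regression predictions should not depend on the number of threads, which the last line checks.

```r
library(ranger)

# Option (3): set a session-wide default of 1 thread, remembering the old value
old <- options(ranger.num.threads = 1)
rf <- ranger(mpg ~ ., data = mtcars[1:26, ])

# Uses the option; then (1) an explicit argument that takes precedence
p1 <- predict(rf, data = mtcars[27:32, ])
p2 <- predict(rf, data = mtcars[27:32, ], num.threads = 2)

# Restore the previous option value
options(old)

# Regression predictions are the same regardless of thread count
all.equal(p1$predictions, p2$predictions)
```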

References

See also

Author

Marvin N. Wright

Examples

## Classification forest
library(ranger)
ranger(Species ~ ., data = iris)
#> Ranger result
#> 
#> Call:
#>  ranger(Species ~ ., data = iris) 
#> 
#> Type:                             Classification 
#> Number of trees:                  500 
#> Sample size:                      150 
#> Number of independent variables:  4 
#> Mtry:                             2 
#> Target node size:                 1 
#> Variable importance mode:         none 
#> Splitrule:                        gini 
#> OOB prediction error:             4.00 % 
train.idx <- sample(nrow(iris), 2/3 * nrow(iris))
iris.train <- iris[train.idx, ]
iris.test <- iris[-train.idx, ]
rg.iris <- ranger(Species ~ ., data = iris.train)
pred.iris <- predict(rg.iris, data = iris.test)
table(iris.test$Species, pred.iris$predictions)
#>             
#>              setosa versicolor virginica
#>   setosa         14          0         0
#>   versicolor      0         13         1
#>   virginica       0          3        19

## Quantile regression forest
rf <- ranger(mpg ~ ., mtcars[1:26, ], quantreg = TRUE)
pred <- predict(rf, mtcars[27:32, ], type = "quantiles", quantiles = c(0.1, 0.5, 0.9))
pred$predictions
#>      quantile= 0.1 quantile= 0.5 quantile= 0.9
#> [1,]          21.0          24.4          32.4
#> [2,]          21.0          22.8          32.4
#> [3,]          13.3          17.3          30.4
#> [4,]          15.2          21.0          22.8
#> [5,]          13.3          14.3          19.2
#> [6,]          21.0          22.8          32.4

## Quantile regression forest with user-specified function
rf <- ranger(mpg ~ ., mtcars[1:26, ], quantreg = TRUE)
pred <- predict(rf, mtcars[27:32, ], type = "quantiles", 
                what = function(x) sample(x, 10, replace = TRUE))
pred$predictions
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,] 30.4 30.4 30.4 21.0 27.3 32.4 27.3 30.4 24.4  21.0
#> [2,] 22.8 22.8 33.9 19.2 30.4 21.5 22.8 30.4 21.0  30.4
#> [3,] 14.3 13.3 14.3 33.9 14.7 19.2 17.8 10.4 14.3  27.3
#> [4,] 21.0 21.5 21.0 21.5 17.8 21.0 22.8 22.8 21.0  30.4
#> [5,] 14.7 13.3 13.3 14.3 10.4 15.2 14.7 18.7 19.2  14.3
#> [6,] 24.4 22.8 27.3 27.3 21.0 27.3 22.8 22.8 22.8  32.4