This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Random forest for regression

The Random Forest for regression algorithm creates an ensemble model of regression trees.

The Random Forest for regression algorithm creates an ensemble model of regression trees. Each tree is trained on a randomly selected subset of the training data. The algorithm predicts the value that is the mean prediction of the individual trees.

You can use the following functions to train the Random Forest model, and use the model to make predictions on a set of test data:

For a complete example of how to use the Random Forest for regression algorithm in Vertica, see Building a random forest regression model.

1 - Building a random forest regression model

This example uses the "mtcars" dataset to create a random forest model to predict the value of carb (the number of carburetors).

This example uses the "mtcars" dataset to create a random forest model to predict the value of carb (the number of carburetors).

Before you begin the example, load the Machine Learning sample data.
  1. Use RF_REGRESSOR to create the random forest model myRFRegressorModel using the mtcars training data. View the summary output of the model with GET_MODEL_SUMMARY:

    => SELECT RF_REGRESSOR ('myRFRegressorModel', 'mtcars', 'carb', 'mpg, cyl, hp, drat, wt' USING PARAMETERS
    ntree=100, sampling_size=0.3);
    RF_REGRESSOR
    --------------
    Finished
    (1 row)
    
    
    => SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='myRFRegressorModel');
    --------------------------------------------------------------------------------
    ===========
    call_string
    ===========
    SELECT rf_regressor('public.myRFRegressorModel', 'mtcars', '"carb"', 'mpg, cyl, hp, drat, wt'
    USING PARAMETERS exclude_columns='', ntree=100, mtry=1, sampling_size=0.3, max_depth=5, max_breadth=32,
    min_leaf_size=5, min_info_gain=0, nbins=32);
    
    =======
    details
    =======
    predictor|type
    ---------+-----
    mpg      |float
    cyl      | int
    hp       | int
    drat     |float
    wt       |float
    ===============
    Additional Info
    ===============
    Name              |Value
    ------------------+-----
    tree_count        | 100
    rejected_row_count|  0
    accepted_row_count| 32
    (1 row)
    
  2. Use PREDICT_RF_REGRESSOR to predict the number of carburetors:

    => SELECT PREDICT_RF_REGRESSOR (mpg,cyl,hp,drat,wt
    USING PARAMETERS model_name='myRFRegressorModel') FROM mtcars;
    PREDICT_RF_REGRESSOR
    ----------------------
    2.94774203574204
    2.6954087024087
    2.6954087024087
    2.89906346431346
    2.97688489288489
    2.97688489288489
    2.7086587024087
    2.92078965478965
    2.97688489288489
    2.7086587024087
    2.95621822621823
    2.82255155955156
    2.7086587024087
    2.7086587024087
    2.85650394050394
    2.85650394050394
    2.97688489288489
    2.95621822621823
    2.6954087024087
    2.6954087024087
    2.84493251193251
    2.97688489288489
    2.97688489288489
    2.8856467976468
    2.6954087024087
    2.92078965478965
    2.97688489288489
    2.97688489288489
    2.7934087024087
    2.7934087024087
    2.7086587024087
    2.72469441669442
    (32 rows)