This is the multi-page printable view of this section.
Click here to print.
Return to the regular view of this page.
Random forest for classification
The Random Forest algorithm creates an ensemble model of decision trees.
The Random Forest algorithm creates an ensemble model of decision trees. Each tree is trained on a randomly selected subset of the training data.
You can use the following functions to train the Random Forest model, and use the model to make predictions on a set of test data:
For a complete example of how to use the Random Forest algorithm in Vertica, see Classifying data using random forest.
1 - Classifying data using random forest
This random forest example uses a data set named iris.
This random forest example uses a data set named iris. The example contains four variables that measure various parts of the iris flower to predict its species.
Before you begin the example, make sure that you have followed the steps in Download the machine learning example data.
-
Use
RF_CLASSIFIER
to create the random forest model, named rf_iris
, using the iris
data. View the summary output of the model with
GET_MODEL_SUMMARY
:
=> SELECT RF_CLASSIFIER ('rf_iris', 'iris', 'Species', 'Sepal_Length, Sepal_Width, Petal_Length, Petal_Width'
USING PARAMETERS ntree=100, sampling_size=0.5);
RF_CLASSIFIER
----------------------------
Finished training
(1 row)
=> SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name='rf_iris');
------------------------------------------------------------------------
===========
call_string
===========
SELECT rf_classifier('public.rf_iris', 'iris', '"species"', 'Sepal_Length, Sepal_Width, Petal_Length,
Petal_Width' USING PARAMETERS exclude_columns='', ntree=100, mtry=2, sampling_size=0.5, max_depth=5,
max_breadth=32, min_leaf_size=1, min_info_gain=0, nbins=32);
=======
details
=======
predictor |type
------------+-----
sepal_length|float
sepal_width |float
petal_length|float
petal_width |float
===============
Additional Info
===============
Name |Value
------------------+-----
tree_count | 100
rejected_row_count| 0
accepted_row_count| 150
(1 row)
-
Apply the classifier to the test data with
PREDICT_RF_CLASSIFIER
:
=> SELECT PREDICT_RF_CLASSIFIER (Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
USING PARAMETERS model_name='rf_iris') FROM iris1;
PREDICT_RF_CLASSIFIER
-----------------------
setosa
setosa
setosa
.
.
.
versicolor
versicolor
versicolor
.
.
.
virginica
virginica
virginica
.
.
.
(90 rows)
-
Use
PREDICT_RF_CLASSIFIER_CLASSES
to view the probability of each class:
=> SELECT PREDICT_RF_CLASSIFIER_CLASSES(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width
USING PARAMETERS model_name='rf_iris') OVER () FROM iris1;
predicted | probability
-----------+-------------------
setosa | 1
setosa | 1
setosa | 1
setosa | 1
setosa | 1
setosa | 1
setosa | 1
setosa | 1
setosa | 1
setosa | 1
setosa | 0.99
.
.
.
(90 rows)