You can use the Naive Bayes algorithm to classify your data when features can be assumed independent. The algorithm uses independent features to calculate the probability of a specific class. For example, you might want to predict the probability that an email is spam. In that case, you would use a corpus of words associated with spam to calculate the probability the email's content is spam.
You can use the following functions to build a Naive Bayes model, view the model, and use the model to make predictions on a set of test data:
This Naive Bayes example uses the HouseVotes84 data set to show you how to build a model.
This Naive Bayes example uses the HouseVotes84 data set to show you how to build a model. With this model, you can predict which party the member of the United States Congress is affiliated based on their voting record. To aid in classifying the data it has been cleaned, and any missed votes have been replaced. The cleaned data replaces missed votes with the voter's party majority vote. For example, suppose a member of the Democrats had a missing value for vote1 and majority of the Democrats voted in favor. This example replaces all missing Democrats' votes for vote1 with a vote in favor.
In this example, approximately 75% of the cleaned HouseVotes84 data is randomly selected and copied to a training table. The remaining cleaned HouseVotes84 data is used as a testing table.
Create a new table, named predicted_party_naive. Populate this table with the prediction outputs you obtain from the PREDICT_NAIVE_BAYES function on your test data.
=> CREATE TABLE predicted_party_naive
AS SELECT party,
PREDICT_NAIVE_BAYES (vote1, vote2, vote3, vote4, vote5,
vote6, vote7, vote8, vote9, vote10,
vote11, vote12, vote13, vote14,
vote15, vote16
USING PARAMETERS model_name = 'naive_house84_model',
type = 'response') AS Predicted_Party
FROM house84_test;
CREATE TABLE
Calculate the accuracy of the model's predictions.
=> SELECT (Predictions.Num_Correct_Predictions / Count.Total_Count) AS Percent_Accuracy
FROM ( SELECT COUNT(Predicted_Party) AS Num_Correct_Predictions
FROM predicted_party_naive
WHERE party = Predicted_Party
) AS Predictions,
( SELECT COUNT(party) AS Total_Count
FROM predicted_party_naive
) AS Count;
Percent_Accuracy
----------------------
0.933333333333333333
(1 row)
The model correctly predicted the party of the members of Congress based on their voting patterns with 93% accuracy.
Viewing the probability of each class
You can also view the probability of each class. Use PREDICT_NAIVE_BAYES_CLASSES to see the probability of each class.