Understanding Naive Bayes
A Naïve Bayes classifier is a machine learning model used to classify an object based on its features. The attribute that we want to predict is referred to as the dependent variable, whereas the features used to predict it are known as independent variables (predictors).
The Naïve Bayes classifier is a probabilistic model based on Bayes' Theorem, which states that:
P(y | X) = P(X | y) P(y) / P(X)
This means calculating the probability of event y given that event X has already occurred. Two assumptions are made in order to use it in Naïve Bayes.
The first assumption is that the predictors are independent of each other (strictly, conditionally independent given the class), i.e. the value of one feature does not affect the value of any other feature. This is why the model is called naïve.
The second assumption is that every predictor is given equal weight in making the prediction.
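Before moving to an example, Bayes' Theorem itself can be checked with a few numbers. The snippet below is only a minimal sketch: the function name `posterior` and the three probabilities are hypothetical and chosen for illustration, not taken from any dataset.

```python
def posterior(prior_y: float, likelihood_x_given_y: float, evidence_x: float) -> float:
    """Bayes' Theorem: P(y | X) = P(X | y) * P(y) / P(X)."""
    if evidence_x <= 0:
        raise ValueError("P(X) must be positive")
    return likelihood_x_given_y * prior_y / evidence_x


# Hypothetical numbers, for illustration only.
p_y = 0.5           # P(y): prior probability of the class
p_x_given_y = 0.75  # P(X | y): probability of the features within that class
p_x = 0.5           # P(X): overall probability of the features

print(posterior(p_y, p_x_given_y, p_x))
```

Observing X raises the probability of y above its prior whenever X is more likely inside the class than overall, which is the case with these numbers.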
Let's consider the example below to understand how the theorem is applied.
Here, we are trying to predict whether a day is good for playing golf, given the features of the day. Each row represents one day and each column represents a predictor feature (except 'PLAY GOLF', which is the class). Considering the first row, the day is not suitable for playing golf if the outlook is rainy, the humidity is high, the temperature is hot and it is not windy. The first assumption of Naïve Bayes is that the predictors are independent, i.e. OUTLOOK does not depend on TEMPERATURE, HUMIDITY or WINDY, and likewise for the other features. The second assumption is that equal importance is given to all the predictors.
Using the Bayes' Theorem stated above, X can be treated as the feature vector and written as:
X = (x1, x2, x3, …, xn)
where x1, x2, x3, …, xn are the predictors (here OUTLOOK, TEMPERATURE, HUMIDITY and WINDY).
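Substituting X into the original Bayes equation and applying the independence assumption gives:
P(y | x1, …, xn) = P(x1 | y) P(x2 | y) … P(xn | y) P(y) / (P(x1) P(x2) … P(xn))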
Now all the values from the dataset can be substituted into the above equation. For a given set of feature values the denominator is the same for every class, so the expression can be simplified to:
P(y | x1, …, xn) ∝ P(y) P(x1 | y) P(x2 | y) … P(xn | y)
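The effect of dropping the constant denominator can be sketched numerically. All probabilities below are hypothetical values for a single day's features, not values from the golf table; the point is only that the class ranking is decided by the numerator alone.

```python
from math import prod

# Hypothetical probabilities, for illustration only.
priors = {"yes": 0.6, "no": 0.4}      # P(y)
likelihoods = {                        # P(x_i | y) for each observed feature value
    "yes": [0.3, 0.2, 0.4, 0.6],
    "no": [0.5, 0.4, 0.8, 0.4],
}

# Numerator of the equation: P(y) * P(x1 | y) * ... * P(xn | y)
scores = {y: priors[y] * prod(likelihoods[y]) for y in priors}

# The denominator is shared by all classes, so dividing each score by the sum
# of the scores yields values that sum to 1 without ever computing it.
total = sum(scores.values())
posteriors = {y: score / total for y, score in scores.items()}

print(scores)
print(posteriors)
```

Normalizing rescales every class by the same factor, so the class with the largest raw score is also the class with the largest normalized probability.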
In our case the class y has two outcomes, and we select the class with the maximum probability, which can be written as:
y = argmax_y P(y) P(x1 | y) P(x2 | y) … P(xn | y)
This expression can then be used to predict the class for a given set of features.
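The whole procedure can be sketched in a few lines of Python by estimating each probability from counts. This is an illustrative sketch, not a production implementation: the helper names `fit` and `predict` are hypothetical, and the rows are made-up toy data in the shape of the golf example (only the first row mirrors the day described earlier).

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (outlook, temperature, humidity, windy) -> play golf
rows = [
    (("rainy", "hot", "high", False), "no"),
    (("rainy", "hot", "high", True), "no"),
    (("overcast", "hot", "high", False), "yes"),
    (("sunny", "mild", "high", False), "yes"),
    (("sunny", "cool", "normal", False), "yes"),
    (("sunny", "cool", "normal", True), "no"),
    (("overcast", "cool", "normal", True), "yes"),
    (("rainy", "mild", "normal", False), "yes"),
]


def fit(rows):
    """Count class frequencies and per-feature value frequencies within each class."""
    class_counts = Counter(label for _, label in rows)
    # value_counts[label][i][value] = rows of that class whose feature i equals value
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in rows:
        for i, value in enumerate(features):
            value_counts[label][i][value] += 1
    return class_counts, value_counts


def predict(features, class_counts, value_counts):
    """Return the class maximizing P(y) * product of P(x_i | y), plus all scores."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total                               # P(y)
        for i, value in enumerate(features):
            score *= value_counts[label][i][value] / n  # P(x_i | y)
        scores[label] = score
    return max(scores, key=scores.get), scores


class_counts, value_counts = fit(rows)
label, scores = predict(("rainy", "hot", "high", False), class_counts, value_counts)
print(label, scores)
```

With this toy data, a rainy, hot, humid and calm day is classified as "no". Note that a feature value never seen with a class gives that class a score of zero in this sketch; practical implementations usually smooth the counts to avoid that.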
The algorithm relies on the assumption that the predictors are independent, which rarely holds for real-world features. It nevertheless works well for multiclass prediction, since it provides a probability for each class. For these reasons, it is widely used in text classification, spam filtering and sentiment analysis, where it has often shown good results compared with other algorithms.
In Oracle, a Naïve Bayes machine learning model can be created in Oracle Analytics Cloud (OAC) as well as in Autonomous Data Warehouse (ADW). OAC provides drag-and-drop functionality and is quick to use for business users with little programming knowledge, whereas ADW works directly on the database and is mostly used by IT teams with good knowledge of PL/SQL. In Oracle Database, Data Mining functions are defined for the different algorithms. To learn more about the usability of OAC and ADW, please refer to our video series.
- G. Chauhan, “All About Naive Bayes,” 2018. [Online]. Available: https://towardsdatascience.com/all-about-naive-bayes-8e13cef044cf.
- “Naive Bayes Classifiers,” GeeksforGeeks. [Online]. Available: https://www.geeksforgeeks.org/naive-bayes-classifiers/.