title | comments | mathjax
---|---|---
Idea and Implementation / Multinomial Naive Bayes | true | true
Multinomial naive Bayes is the naive Bayes algorithm for multinomially distributed data. For a brief and intuitive explanation of Bayes' theorem, read this kernel of mine: Gaussian Naive Bayes Classifier from Scratch. Everything is similar to Gaussian NB except the likelihood, which is estimated from smoothed feature counts:

$$\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n} \tag{1}$$

where

- $$\alpha$$ is the smoothing parameter,
- $$N_{yi}$$ is the count of feature $$x_i$$ in class $$y$$,
- $$N_y$$ is the total count of all features in class $$y$$,
- $$n$$ is the total number of features.
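To make equation (1) concrete before moving on, here is a minimal sketch of it as a plain Python function; `smoothed_theta` and its argument names are illustrative placeholders mirroring the symbols above, not part of the original notebook.

def smoothed_theta(N_yi, N_y, n_features, alpha=1.0):
    # eqn (1): smoothed estimate of P(x_i | y) -- illustrative helper
    return (N_yi + alpha) / (N_y + alpha * n_features)

# e.g. a feature counted 3 times out of 12 total counts in a class, with 8 features:
print(smoothed_theta(3, 12, 8))  # (3 + 1) / (12 + 8) = 0.2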
You can (and should) read about the multinomial distribution in detail elsewhere; here I will only give a short description of how a multinomial naive Bayes classifier views data.
$$x_1$$ | $$x_2$$ | $$x_3$$
---|---|---
1 | 0 | 4
4 | 2 | 3

In the table above, containing 2 samples of 3 features, each cell is the number of times a feature occurs in a sample: feature $$x_1$$ occurs once in the first sample and 4 times in the second, feature $$x_2$$ occurs 0 and 2 times, and feature $$x_3$$ occurs 4 and 3 times.
In text classification, each sample is a sentence and each feature $$w_i$$ is a word in the vocabulary, so a sentence becomes a row of word counts:

$$w_1$$ | $$w_2$$ | $$w_3$$ | $$w_4$$ | $$w_5$$ | $$w_6$$ | $$w_7$$ | $$w_8$$
---|---|---|---|---|---|---|---
1 | 3 | 1 | 0 | 0 | 1 | 0 | 1
After adding some other random sentences, the dataset is:

$$w_1$$ | $$w_2$$ | $$w_3$$ | $$w_4$$ | $$w_5$$ | $$w_6$$ | $$w_7$$ | $$w_8$$
---|---|---|---|---|---|---|---
1 | 3 | 1 | 0 | 0 | 1 | 0 | 1
1 | 0 | 0 | 0 | 1 | 1 | 1 | 3
0 | 0 | 0 | 0 | 0 | 2 | 1 | 2
By the way, I haven't assigned the rows to classes yet. Randomly assigning them, say the first sentence to class $$y=0$$ and the other two to class $$y=1$$, the quantities from equation (1) become:

- $$N_{yi}$$ is the count of feature $$w_i$$ in each unique class of $$y$$. For example, for $$y=1$$, $$N_{y,1}=1$$ and $$N_{y,6}=3$$.
- $$N_y$$ is the total count of all features in each unique class of $$y$$. For example, for $$y=1$$, $$N_y=12$$.
- $$n=8$$ is the total number of features.
- $$\alpha$$ is known as the smoothing parameter. It is needed to avoid the zero-probability problem (a word never seen in a class would otherwise force the whole product to zero), which is explained in resource [1].
To calculate the likelihoods for a test sentence, all we need is equation (1): for each class $$y$$, multiply the prior $$P(y)$$ by $$\hat{\theta}_{yi}^{x_i}$$ for every feature count $$x_i$$ in the sentence, and pick the class with the largest result.
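To check those numbers, here is a short sketch (not from the original notebook) that recomputes $$N_{yi}$$, $$N_y$$, and the unnormalised class scores for the toy table, using the class assignment described above; `x_test` is a made-up test sentence.

import numpy as np

# the three toy sentences as count vectors over the 8 words above
X_toy = np.array([[1, 3, 1, 0, 0, 1, 0, 1],
                  [1, 0, 0, 0, 1, 1, 1, 3],
                  [0, 0, 0, 0, 0, 2, 1, 2]])
y_toy = np.array([0, 1, 1])                    # first sentence -> class 0, the rest -> class 1
alpha, n = 1.0, X_toy.shape[1]
x_test = np.array([0, 1, 0, 0, 0, 2, 0, 1])    # made-up test sentence (word counts)

for c in np.unique(y_toy):
    rows = X_toy[y_toy == c]
    N_yi = rows.sum(axis=0)                     # count of each word within class c
    N_y = N_yi.sum()                            # total word count within class c
    theta = (N_yi + alpha) / (N_y + alpha * n)  # eqn (1)
    score = np.mean(y_toy == c) * np.prod(theta ** x_test)  # P(y) * P(x|y), unnormalised
    print(c, N_yi, N_y, score)                  # for c=1: N_y1=1, N_y6=3, N_y=12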
Resources:

1. https://www.inf.ed.ac.uk/teaching/courses/inf2b/learnnotes/inf2b-learn07-notes-nup.pdf
2. https://scikit-learn.org/stable/modules/naive_bayes.html#multinomial-naive-bayes
import numpy as np
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
# toy count data for sanity-checking: 11 samples with 8 features, plus a tiny test set
tmpX1 = np.array([int(i.strip()) for i in "2 0 0 0 1 2 3 1 0 0 1 0 2 1 0 0 0 1 0 1 0 2 1 0 1 0 0 2 0 1 0 1 2 0 0 0 1 0 1 3 0 0 1 2 0 0 2 1".split(" ")])
tmpX2 = np.array([int(i.strip()) for i in "0 1 1 0 0 0 1 0 1 2 0 1 0 0 1 1 0 1 1 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0".split(" ")])
X = np.concatenate((tmpX1.reshape(-1,8), tmpX2.reshape(-1,8)), axis=0)
y = np.array([0,0,0,0,0,0,1,1,1,1,1])
X_test = np.array([[2,1,0,0,1,2,0,1],[0,1,1,0,1,0,1,0]])
y_test = np.array([0,1])
print("X and Y shapes\n", X.shape, y.shape)
X and Y shapes
(11, 8) (11,)
class MultiNB:
    def __init__(self, alpha=1):
        self.alpha = alpha

    def _prior(self):  # CHECKED
        """
        Calculates the prior for each unique class in y. P(y)
        """
        P = np.zeros((self.n_classes_))
        _, self.dist = np.unique(self.y, return_counts=True)
        for i in range(self.classes_.shape[0]):
            P[i] = self.dist[i] / self.n_samples
        return P

    def fit(self, X, y):  # CHECKED, matches with sklearn
        """
        Calculates the following things-
            class_priors_: list of priors for each class in y.
            N_yi: 2D array. For each class in y, the number of times each feature i appears under y.
            N_y: 1D array. For each class in y, the total count of all features under y.

        params
        ------
        X: 2D array. shape(n_samples, n_features)
            Multinomial data
        y: 1D array. shape(n_samples,). Labels must be encoded to integers.
        """
        self.y = y
        self.n_samples, self.n_features = X.shape
        self.classes_ = np.unique(y)
        self.n_classes_ = self.classes_.shape[0]
        self.class_priors_ = self._prior()

        # distinct values in each feature
        self.uniques = []
        for i in range(self.n_features):
            tmp = np.unique(X[:, i])
            self.uniques.append(tmp)

        self.N_yi = np.zeros((self.n_classes_, self.n_features))  # per-class feature counts
        self.N_y = np.zeros((self.n_classes_))                    # per-class total counts
        for i in self.classes_:  # one row per class
            indices = np.argwhere(self.y == i).flatten()
            columnwise_sum = []
            for j in range(self.n_features):  # one column per feature
                columnwise_sum.append(np.sum(X[indices, j]))
            self.N_yi[i] = columnwise_sum  # 2d
            self.N_y[i] = np.sum(columnwise_sum)  # 1d

    def _theta(self, x_i, i, h):
        """
        Calculates theta_yi, a.k.a. P(x_i | y), using eqn (1) in the notebook,
        and raises it to the observed count x_i.

        params
        ------
        x_i: int.
            observed count of feature i.
        i: int.
            feature index.
        h: int.
            a class in y.

        returns
        -------
        theta_yi ** x_i: the contribution of feature i to P(x | y)
        """
        Nyi = self.N_yi[h, i]
        Ny = self.N_y[h]
        numerator = Nyi + self.alpha
        denominator = Ny + (self.alpha * self.n_features)
        return (numerator / denominator) ** x_i

    def _likelihood(self, x, h):
        """
        Calculates P(E|H) = P(E1|H) * P(E2|H) * ... * P(En|H).

        params
        ------
        x: array. shape(n_features,)
            a row of data.
        h: int.
            a class in y.
        """
        tmp = []
        for i in range(x.shape[0]):
            tmp.append(self._theta(x[i], i, h))
        return np.prod(tmp)

    def predict(self, X):
        samples, features = X.shape
        self.predict_proba = np.zeros((samples, self.n_classes_))
        for i in range(X.shape[0]):
            joint_likelihood = np.zeros((self.n_classes_))
            for h in range(self.n_classes_):
                joint_likelihood[h] = self.class_priors_[h] * self._likelihood(X[i], h)  # P(y) P(X|y)
            denominator = np.sum(joint_likelihood)
            for h in range(self.n_classes_):
                numerator = joint_likelihood[h]
                self.predict_proba[i, h] = numerator / denominator
        indices = np.argmax(self.predict_proba, axis=1)
        return self.classes_[indices]
def pipeline(X, y, X_test, y_test, alpha):
    """
    Sklearn Sanity Check
    """
    print("-"*20, 'Sklearn', "-"*20)
    clf = MultinomialNB(alpha=alpha)
    clf.fit(X, y)
    sk_y = clf.predict(X_test)
    print("Feature Count \n", clf.feature_count_)
    print("Class Log Prior ", clf.class_log_prior_)
    print('Accuracy ', accuracy_score(y_test, sk_y), sk_y)
    print(clf.predict_proba(X_test))

    print("-"*20, 'Custom', "-"*20)
    nb = MultiNB(alpha=alpha)
    nb.fit(X, y)
    yhat = nb.predict(X_test)
    me_score = accuracy_score(y_test, yhat)
    print("Feature Count\n", nb.N_yi)
    print("Class Log Prior ", np.log(nb.class_priors_))
    print('Accuracy ', me_score, yhat)
    print(nb.predict_proba)  # holds the probabilities for whatever X was passed to predict last
pipeline(X,y,X,y, alpha=1)
-------------------- Sklearn --------------------
Feature Count
[[5. 1. 2. 5. 4. 6. 7. 6.]
[1. 4. 3. 1. 1. 2. 3. 1.]]
Class Log Prior [-0.6061358 -0.78845736]
Accuracy 0.8181818181818182 [0 0 0 0 0 0 1 1 1 0 0]
[[0.74940942 0.25059058]
[0.52879735 0.47120265]
[0.53711475 0.46288525]
[0.69613326 0.30386674]
[0.75239818 0.24760182]
[0.62207341 0.37792659]
[0.39213534 0.60786466]
[0.45705923 0.54294077]
[0.42055705 0.57944295]
[0.54545455 0.45454545]
[0.51099295 0.48900705]]
-------------------- Custom --------------------
Feature Count
[[5. 1. 2. 5. 4. 6. 7. 6.]
[1. 4. 3. 1. 1. 2. 3. 1.]]
Class Log Prior [-0.6061358 -0.78845736]
Accuracy 0.8181818181818182 [0 0 0 0 0 0 1 1 1 0 0]
[[0.74940942 0.25059058]
[0.52879735 0.47120265]
[0.53711475 0.46288525]
[0.69613326 0.30386674]
[0.75239818 0.24760182]
[0.62207341 0.37792659]
[0.39213534 0.60786466]
[0.45705923 0.54294077]
[0.42055705 0.57944295]
[0.54545455 0.45454545]
[0.51099295 0.48900705]]
from nltk.corpus import stopwords
import string
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelBinarizer
df = pd.read_csv("../data/spam_uci.csv",encoding='iso8859_14')
df.drop(labels=df.columns[2:],axis=1,inplace=True)
df.columns=['target','text']
A small helper to clean up punctuation and stopwords:
def clean_util(text):
    # drop punctuation characters
    punc_rmv = [char for char in text if char not in string.punctuation]
    punc_rmv = "".join(punc_rmv)
    # lowercase the remaining words and drop English stopwords
    stopword_rmv = [w.strip().lower() for w in punc_rmv.split() if w.strip().lower() not in stopwords.words('english')]
    return " ".join(stopword_rmv)
df['text'] = df['text'].apply(clean_util)
Now we conform the texts to the multinomial (word-count) format discussed at the beginning. Also, the classes in y must be converted to integers, as I forgot to account for string labels in my implementation and am too lazy to update it •͡˘㇁•͡˘
cv = CountVectorizer()
X = cv.fit_transform(df['text']).toarray()
lb = LabelBinarizer()
y = lb.fit_transform(df['target']).ravel()
print(X.shape,y.shape)
(5572, 9381) (5572,)
# Train Test Split
X_train, X_test, y_train, y_test = train_test_split(X,y)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(4179, 9381) (1393, 9381) (4179,) (1393,)
sklearn's MultinomialNB
sk = MultinomialNB().fit(X_train,y_train)
sk.score(X_test,y_test)
0.9755922469490309
our MultiNB
(⌐■_■)
%%time
me = MultiNB()
me.fit(X_train, y_train)
yhat = me.predict(X_test)
print(accuracy_score(y_test,yhat))
0.9755922469490309
CPU times: user 1min 5s, sys: 0 ns, total: 1min 5s
Wall time: 1min 5s
It takes a lot of time, but that does not matter since this is only a reference implementation ヽ(`Д´)ノ
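If the runtime ever mattered, the per-sample and per-feature loops could be replaced by a vectorised, log-space predict. The sketch below is only an illustration of that idea, assuming a fitted MultiNB instance with the attributes defined above (N_yi, N_y, class_priors_, classes_, alpha, n_features); it is not part of the implementation itself.

def predict_vectorised(nb, X):
    # Hypothetical log-space variant of MultiNB.predict for a fitted instance `nb`.
    # log of eqn (1) for every (class, feature) pair: shape (n_classes, n_features)
    log_theta = np.log(nb.N_yi + nb.alpha) - np.log(nb.N_y[:, None] + nb.alpha * nb.n_features)
    # log P(y) + sum_i x_i * log(theta_yi), for every sample and class
    joint = np.log(nb.class_priors_) + X @ log_theta.T
    return nb.classes_[np.argmax(joint, axis=1)]

Working in log space turns the per-feature product into a single matrix multiplication and also avoids underflow for long documents.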
I wrote this from-scratch implementation for my own learning; if you see any error or typo, please let me know.