
Readme correction
NIXBLACK11 committed Feb 10, 2024
1 parent 9bac590 commit 279eb35
Showing 3 changed files with 48 additions and 44 deletions.
47 changes: 3 additions & 44 deletions README.md
@@ -43,51 +43,10 @@ optimizing operational processes.
By addressing these challenges, chatbots significantly contribute to improved customer
experiences, increased efficiency, and enhanced productivity across industries and sectors.

## Methodology
- [v1 methodology](./v1/README.md)
- [v2 methodology](./v2/README.md)

## Technologies used by CHATBANKER
When developing ChatBanker, I leveraged a variety of technologies to ensure its functionality
and effectiveness. Some of the key technologies utilized include:
45 changes: 45 additions & 0 deletions v1/README.md
@@ -0,0 +1,45 @@
## Methodology
### Natural Language Processing (NLP) Pipeline
![CHATBANKER](../Readme/full.jpg)

The heart of this project lies in its NLP pipeline, which orchestrates a sequence of carefully designed steps to transform user input into meaningful responses. The pipeline leverages various NLP techniques and tools to ensure efficient comprehension and interaction.

#### Tokenization
![CHATBANKER](../Readme/tokenization.jpg)

User input sentences are first tokenized, i.e. split into individual words or tokens, using the NLTK library's nltk.word_tokenize() function. Tokenization breaks sentences into meaningful units that form the basis for all further analysis.
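
For example, a minimal run of this step (the sample sentence is a made-up banking query; NLTK and its punkt tokenizer data are assumed to be installed):

```python
import nltk

nltk.download("punkt")  # tokenizer model required by word_tokenize

sentence = "How do I open a savings account?"
tokens = nltk.word_tokenize(sentence)
print(tokens)
# ['How', 'do', 'I', 'open', 'a', 'savings', 'account', '?']
```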

#### Stemming
![CHATBANKER](../Readme/stemming.jpg)

The pipeline then applies stemming with the Porter Stemmer (nltk.stem.porter.PorterStemmer), which reduces each word to its root form. Collapsing inflected variants to a common base normalizes the text and makes patterns easier to recognize.
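
A quick sketch of the stemmer in isolation (the example words are chosen purely for illustration):

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Different surface forms collapse to the same stem.
for word in ["organize", "organizes", "organizing"]:
    print(stemmer.stem(word))
# organ
# organ
# organ
```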

#### Bag of Words Representation
![CHATBANKER](../Readme/bagofwords.jpg)

The tokenized words are then converted into a numerical "bag of words" representation: each sentence becomes a fixed-length vector that records which vocabulary words are present, disregarding their order. The nltk_utils.bag_of_words() function computes this vector by marking, for each word in a predefined vocabulary, whether it appears in the tokenized sentence.
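
A minimal sketch of what such a helper might look like; the actual nltk_utils.bag_of_words() may differ, and the vocabulary below is a stand-in:

```python
import numpy as np

def bag_of_words(tokenized_sentence, vocabulary):
    """Return a vector with 1.0 at each position whose vocabulary word occurs in the sentence."""
    # The real pipeline would compare stems; plain lowercase matching keeps this sketch short.
    words = {w.lower() for w in tokenized_sentence}
    bag = np.zeros(len(vocabulary), dtype=np.float32)
    for idx, vocab_word in enumerate(vocabulary):
        if vocab_word in words:
            bag[idx] = 1.0
    return bag

vocabulary = ["account", "balance", "card", "hello", "loan"]  # stand-in vocabulary
print(bag_of_words(["Hello", ",", "what", "is", "my", "balance", "?"], vocabulary))
# [0. 1. 0. 1. 0.]
```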

### Neural Network Architecture
![CHATBANKER](../Readme/ffnn.png)

The core of the project is a feedforward neural network (FFNN) built with PyTorch for multi-class intent classification. Its layers extract features from the bag-of-words input and map them to intent scores.

#### Layers and Components
- **Input layer: bag of words.** The network takes a bag-of-words vector as input. Each element encodes the presence of a specific vocabulary word in the user's sentence, so the vector length equals the vocabulary size derived from the training data.
- **Hidden layers: feature extraction.** Two hidden layers, each a linear transformation (nn.Linear) followed by a ReLU activation (nn.ReLU), extract features and capture non-linear relationships in the input.
- **Output layer: intent classification.** The output layer has one neuron per distinct intent (class) in the training data. The logits it produces indicate the network's confidence in each intent for the given input.
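
A sketch of such a network in PyTorch (the layer sizes are placeholders; the project's actual model class may be organized differently):

```python
import torch.nn as nn

class NeuralNet(nn.Module):
    """Feedforward network: bag-of-words vector in, one logit per intent out."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.l1 = nn.Linear(input_size, hidden_size)   # input -> hidden layer 1
        self.l2 = nn.Linear(hidden_size, hidden_size)  # hidden layer 1 -> hidden layer 2
        self.l3 = nn.Linear(hidden_size, num_classes)  # hidden layer 2 -> intent logits
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.l1(x))
        out = self.relu(self.l2(out))
        return self.l3(out)  # raw logits; CrossEntropyLoss applies softmax internally

# Placeholder sizes: an 80-word vocabulary, 8 hidden units, 7 intents.
model = NeuralNet(input_size=80, hidden_size=8, num_classes=7)
```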

#### Loss Function: Cross-Entropy Loss (nn.CrossEntropyLoss)

Cross-entropy loss guides training by measuring the dissimilarity between the predicted intent distribution and the true intent labels in the training data.

#### Optimizer: Adam Optimizer (torch.optim.Adam)

Training uses the Adam optimizer, whose adaptive learning-rate mechanism adjusts each parameter's step size based on estimates of the gradient's first and second moments, helping the model converge efficiently and stably.
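
Put together, one training step might look like the sketch below (the stand-in model, learning rate, and dummy batch are illustrative, not the project's exact values):

```python
import torch
import torch.nn as nn

# Stand-in model matching the architecture described above (80-word vocabulary, 7 intents).
model = nn.Sequential(
    nn.Linear(80, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 7),
)

criterion = nn.CrossEntropyLoss()                           # compares logits with class-index labels
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # adaptive per-parameter learning rates

# One step on a dummy batch of 4 bag-of-words vectors.
x_batch = torch.rand(4, 80)           # shape: (batch, input_size)
y_batch = torch.tensor([0, 3, 1, 6])  # intent indices, not one-hot

loss = criterion(model(x_batch), y_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```
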
Empty file added v2/README.md
