From 279eb352b0be563cecc6731412704d679591ea5e Mon Sep 17 00:00:00 2001
From: NIXBLACK11
Date: Sat, 10 Feb 2024 16:05:01 +0530
Subject: [PATCH] Readme correction

---
 README.md    | 47 +++--------------------------------------------
 v1/README.md | 45 +++++++++++++++++++++++++++++++++++++++++++++
 v2/README.md |  0
 3 files changed, 48 insertions(+), 44 deletions(-)
 create mode 100644 v1/README.md
 create mode 100644 v2/README.md

diff --git a/README.md b/README.md
index 4edff1f..588d911 100644
--- a/README.md
+++ b/README.md
@@ -43,51 +43,10 @@
 optimizing operational processes.
 By addressing these challenges, chatbots significantly contribute to improved customer experiences, increased efficiency, and enhanced productivity across industries and sectors.
 
-## Methodology
-### Natural Language Processing (NLP) Pipeline
-![CHATBANKER](./Readme/full.jpg)
+## Methodology
+- [v1 methodology](./v1/README.md)
+- [v2 methodology](./v2/README.md)
 
-The heart of this project lies in its NLP pipeline, which orchestrates a sequence of carefully designed steps to transform user input into meaningful responses. The pipeline leverages various NLP techniques and tools to ensure efficient comprehension and interaction.
-
-#### Tokenization
-![CHATBANKER](./Readme/tokenization.jpg)
-
-Tokenization: User input sentences are initially tokenized, meaning they are divided into individual words or tokens. This process is facilitated by the NLTK library's nltk.word_tokenize() function. Tokenization dissects sentences into meaningful units, forming the basis for further analysis.
-
-#### Stemming
-![CHATBANKER](./Readme/stemming.jpg)
-
-The pipeline employs stemming, using the Porter Stemmer algorithm (nltk.stem.porter.PorterStemmer), which identifies the root form of words. This process helps reduce variations of words to their base form, aiding in text normalization and pattern recognition.
-
-#### Bag of Words Representation
-![CHATBANKER](./Readme/bagofwords.jpg)
-
-The tokenized words from user input are then transformed into a numerical representation known as a "bag of words."
-Each sentence is converted into a fixed-length numerical vector. This vector encodes the presence or absence of words, disregarding their order. The nltk_utils.bag_of_words() function computes this vector by marking the presence of words from the tokenized sentence in a predefined vocabulary.
-
-### Neural Network Architecture
-![CHATBANKER](./Readme/ffnn.png)
-
-The foundation of this project is a specialized Feedforward Neural Network (FNN) architecture, meticulously crafted to excel in multi-class classification tasks. Built using the PyTorch framework, the network employs layers designed for powerful feature extraction and intent classification.
-
-#### Layers and Components
-Input Layer: Bag of Words:
-The neural network commences with an input layer that processes bag of words vectors. Each vector encodes the presence of specific words from user input. The length of the vector corresponds to the vocabulary size derived from training data.
-Hidden Layers: Feature Extraction
-
-Two hidden layers, comprising linear transformations (nn.Linear) followed by Rectified Linear Unit (ReLU) activation functions (nn.ReLU), perform intricate feature extraction.
-These hidden layers unravel complex relationships in the input data, harnessing non-linearity for enhanced understanding.
-Output Layer: Intent Classification
-
-The output layer's size mirrors the count of distinct intents (classes) present in the training data. Each neuron in this layer signifies a potential intent.
-Logits generated by this layer gauge the network's intent confidence based on input data.
-
-#### Loss Function: Cross-Entropy Loss (nn.CrossEntropyLoss)
-
-The Cross-Entropy Loss serves as the guiding compass for model training, gauging the dissimilarity between predicted intent probabilities and actual intent labels in the training data.
-#### Optimizer: Adam Optimizer (torch.optim.Adam)
-
-The project harnesses the power of the Adam optimizer, known for its adaptive learning rate mechanism. By adjusting learning rates based on gradient moments, this optimizer ensures efficient and stable convergence.
 
 ## Technologies used by CHATBANKER
 When developing ChatBanker, I leveraged a variety of technologies to ensure its functionality and effectiveness. Some of the key technologies utilized include:
diff --git a/v1/README.md b/v1/README.md
new file mode 100644
index 0000000..8f753d7
--- /dev/null
+++ b/v1/README.md
@@ -0,0 +1,45 @@
+## Methodology
+### Natural Language Processing (NLP) Pipeline
+![CHATBANKER](../Readme/full.jpg)
+
+The heart of this project is its NLP pipeline, which applies a sequence of carefully designed steps to transform user input into meaningful responses. The pipeline leverages several NLP techniques and tools to ensure efficient comprehension and interaction.
+
+#### Tokenization
+![CHATBANKER](../Readme/tokenization.jpg)
+
+User input sentences are first tokenized, i.e. split into individual words or tokens, using the NLTK library's nltk.word_tokenize() function. Tokenization dissects sentences into meaningful units, forming the basis for further analysis.
+
+#### Stemming
+![CHATBANKER](../Readme/stemming.jpg)
+
+The pipeline employs stemming with the Porter Stemmer algorithm (nltk.stem.porter.PorterStemmer), which reduces words to their root form. This collapses variations of a word into a single base form, aiding text normalization and pattern recognition.
+
+#### Bag of Words Representation
+![CHATBANKER](../Readme/bagofwords.jpg)
+
+The tokenized words from user input are then transformed into a numerical representation known as a "bag of words."
+Each sentence is converted into a fixed-length numerical vector that encodes the presence or absence of vocabulary words, disregarding their order. The nltk_utils.bag_of_words() function computes this vector by marking which words from the predefined vocabulary appear in the tokenized sentence.
+
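+Below is a minimal sketch of this preprocessing flow. The NLTK calls are the ones named above; the bag_of_words helper is an illustrative stand-in for nltk_utils.bag_of_words(), and the example vocabulary is made up for demonstration.
+
+```python
+import numpy as np
+import nltk
+from nltk.stem.porter import PorterStemmer
+
+# nltk.download("punkt")  # needed once for word_tokenize
+stemmer = PorterStemmer()
+
+def tokenize(sentence):
+    # Split the sentence into individual word tokens
+    return nltk.word_tokenize(sentence)
+
+def stem(word):
+    # Reduce a word to its lowercased root form
+    return stemmer.stem(word.lower())
+
+def bag_of_words(tokenized_sentence, all_words):
+    # Mark which vocabulary words occur in the (stemmed) sentence
+    sentence_words = [stem(w) for w in tokenized_sentence]
+    bag = np.zeros(len(all_words), dtype=np.float32)
+    for idx, word in enumerate(all_words):
+        if word in sentence_words:
+            bag[idx] = 1.0
+    return bag
+
+# Hypothetical vocabulary built from the training data
+all_words = ["account", "balanc", "check", "hello", "transfer"]
+print(bag_of_words(tokenize("Hello, can I check my balance?"), all_words))
+```
+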
+### Neural Network Architecture
+![CHATBANKER](../Readme/ffnn.png)
+
+The core of the project is a Feedforward Neural Network (FNN) built with the PyTorch framework for multi-class intent classification. Its layers extract features from the bag-of-words input and map them to scores over the known intents.
+
+#### Layers and Components
+##### Input Layer: Bag of Words
+The input layer receives bag-of-words vectors. Each vector encodes the presence of specific words from the user input, and its length equals the vocabulary size derived from the training data.
+
+##### Hidden Layers: Feature Extraction
+Two hidden layers, each a linear transformation (nn.Linear) followed by a Rectified Linear Unit activation (nn.ReLU), perform feature extraction.
+These non-linear layers capture more complex relationships in the input data.
+
+##### Output Layer: Intent Classification
+The output layer has one neuron per distinct intent (class) in the training data.
+The logits it produces indicate the network's confidence in each intent for the given input.
+
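+A rough sketch of this architecture is shown below. The class name NeuralNet and the layer sizes are placeholders chosen for illustration, not necessarily the project's actual values; the loss and optimizer lines anticipate the two sections that follow.
+
+```python
+import torch
+import torch.nn as nn
+
+class NeuralNet(nn.Module):
+    def __init__(self, input_size, hidden_size, num_classes):
+        super().__init__()
+        # Two hidden layers with ReLU, plus an output layer over the intents
+        self.l1 = nn.Linear(input_size, hidden_size)
+        self.l2 = nn.Linear(hidden_size, hidden_size)
+        self.l3 = nn.Linear(hidden_size, num_classes)
+        self.relu = nn.ReLU()
+
+    def forward(self, x):
+        out = self.relu(self.l1(x))
+        out = self.relu(self.l2(out))
+        # Raw logits; nn.CrossEntropyLoss applies softmax internally
+        return self.l3(out)
+
+# Placeholder sizes: 100-word vocabulary, 8 hidden units, 5 intents
+model = NeuralNet(input_size=100, hidden_size=8, num_classes=5)
+criterion = nn.CrossEntropyLoss()
+optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
+```
+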
+#### Loss Function: Cross-Entropy Loss (nn.CrossEntropyLoss)
+
+Cross-entropy loss guides training by measuring the dissimilarity between the predicted intent probabilities and the actual intent labels in the training data.
+
+#### Optimizer: Adam Optimizer (torch.optim.Adam)
+
+The model is trained with the Adam optimizer, whose adaptive learning-rate mechanism adjusts step sizes based on gradient moments for efficient and stable convergence.
\ No newline at end of file
diff --git a/v2/README.md b/v2/README.md
new file mode 100644
index 0000000..e69de29