This project explores various Natural Language Processing (NLP) techniques to analyze and manipulate text data. It focuses on:
- Sentence Embeddings: Generating meaningful vector representations of sentences.
- Similarity Calculation: Measuring how close sentence embeddings are with cosine similarity, using FAISS (Facebook AI Similarity Search) to compute it efficiently.
- Clustering: Applying different clustering algorithms to group similar sentences together.
- Title Generation: Leveraging a Large Language Model (LLM) to automatically generate titles for given text.

Key features:
- Efficient Similarity Search: FAISS performs fast, scalable similarity search over large collections of sentence embeddings.
- Multiple Clustering Algorithms: The project explores various clustering algorithms (e.g., k-means, DBSCAN, hierarchical clustering) to identify different patterns in the data.
- Automatic Title Generation: An LLM is employed to generate concise and relevant titles for input text, aiding in summarization and information retrieval.

Technologies used:
- Python: The primary programming language for the project.
- NLP Libraries: Transformers, SentenceTransformers, or spaCy for generating sentence embeddings.
- FAISS: For efficient similarity search and clustering.
- Clustering Libraries: scikit-learn for implementing clustering algorithms.
- LLM API/Library: A library or API for accessing and utilizing a Large Language Model (e.g., OpenAI's GPT, Hugging Face Transformers).
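
The similarity step described above can be illustrated with a minimal sketch in plain Python. It is not the project's implementation: the helper names (`normalize`, `cosine_similarity`, `top_k_similar`) and the toy 3-dimensional vectors are assumptions made for illustration, and real sentence embeddings would come from an embedding model and have hundreds of dimensions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    a_unit, b_unit = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def top_k_similar(query, embeddings, k=2):
    """Return the k (index, score) pairs most similar to the query (brute force)."""
    scores = [(i, cosine_similarity(query, emb)) for i, emb in enumerate(embeddings)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy "embeddings"; these are not output from any model.
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [0.8, 0.6, 0.0]

results = top_k_similar(query, embeddings, k=2)
for index, score in results:
    print(f"sentence {index}: cosine similarity {score:.3f}")
```

This brute-force scan compares the query against every stored vector, which is fine for small collections. For large ones, the same ranking can be obtained by unit-normalizing the embeddings and searching an inner-product index such as FAISS's `IndexFlatIP`, since the inner product of unit vectors equals their cosine similarity.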