Hawawou/NLP

NLP Project: Sentence Embeddings, Similarity, Clustering, and Title Generation

This project explores various Natural Language Processing (NLP) techniques to analyze and manipulate text data. It focuses on:

  • Sentence Embeddings: Generating meaningful vector representations of sentences.
  • Similarity Calculation: Using FAISS (Facebook AI Similarity Search) with cosine similarity to efficiently find the sentence embeddings most similar to a given one.
  • Clustering: Applying different clustering algorithms to group similar sentences together.
  • Title Generation: Leveraging a Large Language Model (LLM) to automatically generate titles for given text.
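
To make the similarity step concrete, here is a minimal sketch of cosine similarity and a brute-force top-k search in plain Python. It is an illustration only, not the project's code: the function names and the toy 3-dimensional vectors are assumptions, and real sentence embeddings would come from an embedding model and have hundreds of dimensions. FAISS performs the same kind of ranking at scale; for example, a flat inner-product index over L2-normalized vectors ranks results by cosine similarity.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)

def top_k_similar(query, embeddings, k=2):
    """Return (index, score) pairs for the k embeddings most similar to query."""
    scored = [(i, cosine_similarity(query, e)) for i, e in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings" (hypothetical values, not model output).
embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]
results = top_k_similar(query, embeddings, k=2)
for index, score in results:
    print(index, round(score, 4))
```

The brute-force loop compares the query against every stored vector, which is fine for a handful of sentences; the point of an index such as FAISS is to keep this search fast as the collection grows.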

Key Features

  • Efficient Similarity Search: FAISS is used to perform fast and scalable similarity search on a large number of sentence embeddings.
  • Multiple Clustering Algorithms: The project explores various clustering algorithms (e.g., k-means, DBSCAN, hierarchical clustering) to identify different patterns in the data.
  • Automatic Title Generation: An LLM is employed to generate concise and relevant titles for input text, aiding in summarization and information retrieval.
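
As a rough illustration of the clustering step, the following sketch implements k-means (Lloyd's algorithm) in plain Python on toy 2-dimensional points. It is a simplified stand-in under stated assumptions: the function names are hypothetical, the initial centroids are simply the first k points, and the data is made up. In practice a library implementation such as the one in scikit-learn would be used on real sentence embeddings.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=10):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Toy points forming two well-separated groups (hypothetical data).
points = [
    [0.0, 0.0], [5.0, 5.0], [0.1, 0.2],
    [5.1, 4.9], [0.2, 0.1], [4.9, 5.2],
]
labels, centroids = kmeans(points, k=2)
print(labels)
```

k-means needs the number of clusters up front, whereas DBSCAN and hierarchical clustering do not, which is one reason to compare several algorithms on the same embeddings.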

Technologies Used

  • Python: The primary programming language for the project.
  • NLP Libraries: Libraries like Transformers, SentenceTransformers, or spaCy for sentence embedding generation.
  • FAISS: For efficient similarity search and clustering.
  • Clustering Libraries: scikit-learn for implementing clustering algorithms.
  • LLM API/Library: A library or API for accessing and utilizing a Large Language Model (e.g., OpenAI's GPT, Hugging Face Transformers).
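
The title-generation step can be sketched independently of any particular LLM provider. The example below is an assumption-laden illustration, not the project's implementation: `build_title_prompt`, `generate_title`, and the prompt wording are hypothetical, and `fake_llm` is a stub standing in for whatever API or local model is actually called. Any function that takes a prompt string and returns generated text could be passed in its place.

```python
def build_title_prompt(text, max_words=10):
    """Build a plain-text prompt asking for a short title (hypothetical wording)."""
    return (
        f"Write a concise title of at most {max_words} words "
        f"for the following text.\n\nText:\n{text.strip()}\n\nTitle:"
    )

def generate_title(text, generate, max_words=10):
    """Ask the `generate` callable for a title and tidy up its raw output."""
    raw = generate(build_title_prompt(text, max_words))
    # Keep only the first non-empty line of the response.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    title = lines[0].strip("\"'")  # drop surrounding quotes, if any
    # Enforce the word limit in case the model ignored it.
    return " ".join(title.split()[:max_words])

def fake_llm(prompt):
    """Stub backend for demonstration; a real LLM call would go here."""
    return '"Grouping Similar Sentences With Embeddings"\nExtra commentary.'

title = generate_title(
    "Sentences are embedded as vectors and then clustered by similarity.",
    fake_llm,
)
print(title)
```

Passing the backend as a callable keeps the prompt construction and output clean-up testable without network access or API keys.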