NLP Project: Sentence Embeddings, Similarity, Clustering, and Title Generation

This project explores several Natural Language Processing (NLP) techniques for analyzing and transforming text data. It focuses on:

Sentence Embeddings: Generating meaningful vector representations of sentences.
Similarity Calculation: Using FAISS (Facebook AI Similarity Search) and cosine similarity to compare sentence embeddings efficiently.
Clustering: Applying different clustering algorithms to group similar sentences together.
Title Generation: Leveraging a Large Language Model (LLM) to automatically generate titles for given text.
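
As a rough illustration of the similarity step, the sketch below computes cosine similarity between a handful of placeholder vectors using NumPy only. The toy vectors and the helper names (`cosine_similarity_matrix`, `top_k_similar`) are assumptions made for this example; in the project the vectors would come from an embedding model. Because the inner product of L2-normalized vectors equals their cosine similarity, the same normalization is what makes an inner-product FAISS index behave as a cosine search.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarity of the row vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T


def top_k_similar(embeddings: np.ndarray, query_index: int, k: int = 2):
    """Return (index, similarity) pairs for the k rows closest to the query row."""
    sims = cosine_similarity_matrix(embeddings)[query_index].copy()
    sims[query_index] = -np.inf  # exclude the query itself
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]


# Placeholder 3-dimensional "embeddings" (hypothetical data, not from data.csv).
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
])

neighbors = top_k_similar(toy_embeddings, query_index=0, k=2)
print(neighbors)
```

The brute-force matrix is fine for a few thousand sentences; an index such as FAISS becomes worthwhile once the full pairwise matrix no longer fits comfortably in memory.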

Key Features

Efficient Similarity Search: FAISS is used to perform fast and scalable similarity search on a large number of sentence embeddings.
Multiple Clustering Algorithms: The project compares several clustering algorithms (e.g., k-means, DBSCAN, hierarchical clustering) to see how each one groups the sentence embeddings.
Automatic Title Generation: An LLM is employed to generate concise and relevant titles for input text, aiding in summarization and information retrieval.
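
The clustering comparison described above might look roughly like the following sketch, which runs the three named algorithms from scikit-learn on synthetic vectors. The data, the `cluster_all` helper, and the parameter values (`eps`, `min_samples`, `n_clusters`) are assumptions chosen for this toy example, not settings taken from the notebooks.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering

# Two well-separated synthetic groups standing in for sentence embeddings.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.05, size=(5, 4))
group_b = rng.normal(loc=1.0, scale=0.05, size=(5, 4))
toy_embeddings = np.vstack([group_a, group_b])


def cluster_all(embeddings: np.ndarray, n_clusters: int = 2) -> dict:
    """Fit each algorithm and return its label array, keyed by algorithm name."""
    return {
        "kmeans": KMeans(
            n_clusters=n_clusters, n_init=10, random_state=0
        ).fit_predict(embeddings),
        # DBSCAN infers the number of clusters from density; -1 marks noise.
        "dbscan": DBSCAN(eps=0.5, min_samples=2).fit_predict(embeddings),
        "hierarchical": AgglomerativeClustering(
            n_clusters=n_clusters
        ).fit_predict(embeddings),
    }


labels = cluster_all(toy_embeddings)
for name, assignment in labels.items():
    print(name, assignment)
```

On real embeddings the DBSCAN `eps` value usually needs tuning to the distance scale of the vectors, whereas k-means and hierarchical clustering need the number of clusters up front.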

Technologies Used

Python: The primary programming language for the project.
NLP Libraries: Libraries such as Transformers, SentenceTransformers, or spaCy for generating sentence embeddings.
FAISS: For efficient similarity search and clustering.
Clustering Libraries: scikit-learn for implementing clustering algorithms.
LLM API/Library: A library or API for accessing and utilizing a Large Language Model (e.g., OpenAI's GPT, Hugging Face Transformers).
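
Since the LLM backend is left open, title generation can be sketched independently of any particular provider. The example below is a minimal sketch that assumes the model is reachable through a plain Python callable taking a prompt string and returning text; `build_title_prompt`, `generate_title`, and `stub_llm` are hypothetical names for this illustration, and the stub stands in for a real OpenAI or Hugging Face call.

```python
from typing import Callable


def build_title_prompt(text: str, max_words: int = 10) -> str:
    """Build an instruction prompt asking the model for a short title."""
    return (
        f"Write a concise title of at most {max_words} words for the text below. "
        "Reply with the title only.\n\n"
        f"Text:\n{text.strip()}"
    )


def generate_title(text: str, llm: Callable[[str], str], max_words: int = 10) -> str:
    """Call the supplied model and tidy up its reply into a single-line title."""
    raw = llm(build_title_prompt(text, max_words))
    lines = raw.strip().splitlines()
    # Keep the first line only and drop surrounding quotes the model may add.
    title = lines[0].strip().strip("\"'") if lines else ""
    # Enforce the word limit even if the model ignores it.
    return " ".join(title.split()[:max_words])


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with an actual client)."""
    return '"Grouping Similar Sentences with Embeddings"\n'


title = generate_title("Sentence embeddings map text to vectors ...", stub_llm)
print(title)
```

Passing the model in as a callable keeps the prompt and post-processing logic testable without network access or model weights.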

Repository Files

wkp_sorted
.gitignore
README.md
bert.ipynb
clustering.ipynb
data.csv
data1.csv
data_with_embeddings.csv
embed.ipynb
embedding.ipynb
llama_embed.ipynb
sentence_embed.ipynb
word2vec.model