Merge pull request #271 from DebanKsahu/main
SkimLit
UppuluriKalyani authored Oct 12, 2024
2 parents d7ef0ba + 3f514ac commit dffefdc
Showing 9 changed files with 4,651 additions and 0 deletions.
9 changes: 9 additions & 0 deletions Natural Language Processing/SkimLit/Dataset/README.md
@@ -0,0 +1,9 @@
# To use the dataset on your local server or in a Jupyter notebook, follow the steps below
## Cloning the Repository

To clone the dataset repository, run the following command:

```bash
git clone https://github.com/Franck-Dernoncourt/pubmed-rct.git
ls pubmed-rct
```
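
Once cloned, each split is a plain-text file in which an abstract starts with a `###<id>` line, every following line is a `LABEL<TAB>sentence` pair, and a blank line ends the abstract. The sketch below parses that layout into one record per sentence. The function name `parse_abstracts` and the sample text are illustrative and not part of this repository; to read a real split, pass in the contents of a file such as `pubmed-rct/PubMed_20k_RCT_numbers_replaced_with_at_sign/train.txt` (a path assumed from the upstream repository layout).

```python
from typing import Dict, List


def parse_abstracts(raw_text: str) -> List[Dict[str, object]]:
    """Parse PubMed RCT formatted text into one record per sentence."""
    records: List[Dict[str, object]] = []
    current: List[str] = []  # "LABEL\tsentence" lines of the abstract being read

    def flush() -> None:
        # Emit every sentence of the finished abstract together with its position.
        for line_number, line in enumerate(current):
            label, _, sentence = line.partition("\t")
            records.append({
                "target": label,
                "text": sentence.strip(),
                "line_number": line_number,
                "total_lines": len(current),
            })
        current.clear()

    for line in raw_text.splitlines():
        if line.startswith("###"):   # start of a new abstract
            flush()
        elif not line.strip():       # blank line closes the abstract
            flush()
        else:
            current.append(line)
    flush()  # handle input that does not end with a blank line
    return records


# Hypothetical two-sentence abstract in the dataset's layout.
sample = (
    "###24293578\n"
    "OBJECTIVE\tTo investigate the efficacy of the treatment .\n"
    "METHODS\tA total of @ patients were randomized .\n"
    "\n"
)
parsed = parse_abstracts(sample)
print(parsed[0]["target"], parsed[1]["line_number"])
```

Keeping `line_number` and `total_lines` alongside each sentence is optional; they are included here because position within the abstract is a commonly used extra feature for this task.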
4,382 changes: 4,382 additions & 0 deletions Natural Language Processing/SkimLit/NoteBook/SkimLit.ipynb

Large diffs are not rendered by default.

46 changes: 46 additions & 0 deletions Natural Language Processing/SkimLit/README.md
@@ -0,0 +1,46 @@
# Guide to the SkimLit Project

## Overview

SkimLit is an implementation of the SkimLit model, designed to summarize text efficiently and surface its main ideas. This project diverges from the original paper by incorporating several enhancements and modifications to improve performance and usability. I created three different models for this project, and all of them are available in the *SkimLit-Models* folder; the front end uses the model with the best accuracy.

Because all models were trained and tested on the PubMed dataset, they work best at summarizing content related to medical research. The dataset is available in the **Dataset** folder.

## Table of Contents

- [Features](#features)
- [Installation](#installation)
- [Demo](#demo)
- [Results](#results)
- [Important Notes](#important-notes)

## Features

- Summarizes long texts into concise, easy-to-read summaries.
- User-friendly interface for ease of use.
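
Judging from the Streamlit app in this commit, the summary is built by labeling each sentence with one of five section classes and then grouping sentences that share a label. A minimal sketch of that grouping step follows; the helper names `group_by_label` and `render`, the sample sentences, and the hard-coded `predicted` indices (standing in for real model output) are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

# Class order taken from the Streamlit app in this commit.
CLASSES = ["BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS"]


def group_by_label(sentences: Sequence[str], predicted: Sequence[int]) -> Dict[str, List[str]]:
    """Collect sentences under the class name their predicted index points to."""
    grouped: Dict[str, List[str]] = {label: [] for label in CLASSES}
    for sentence, class_index in zip(sentences, predicted):
        grouped[CLASSES[class_index]].append(sentence.strip())
    return grouped


def render(grouped: Dict[str, List[str]]) -> str:
    """Build one output line per non-empty class."""
    lines = []
    for label, items in grouped.items():
        if items:
            lines.append(f"{label}: " + " ".join(items))
    return "\n".join(lines)


sentences = [
    "Sleep quality is often poor in shift workers.",
    "Participants were randomized to two groups.",
    "The intervention group slept longer.",
]
predicted = [0, 2, 3]  # hypothetical class indices, not real model output
summary = render(group_by_label(sentences, predicted))
print(summary)
```

Skipping empty classes keeps the output short when an abstract has, for example, no explicit objective sentence.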

## Installation

To get started with SkimLit, clone the repository and install the required dependencies:

```bash
git clone https://github.com/yourusername/SkimLit.git
cd SkimLit
pip install -r requirements.txt
```

## Demo

[Demo_Video](./Result/Skimlit_Demo.mp4)

## Results

![Model_1_result](./Result/best_model_1_result.png)
![Model_2_result](./Result/best_model_2_result.png)
![Multi_model_result](./Result/multi_model_result.png)

## Important Notes
- There are some issues with using Keras Tuner together with a pretrained embedding layer (**Universal Sentence Encoder**). With TensorFlow version 2.15.0 that model trained without any issue, so if you want to train model-2, please downgrade TensorFlow to version 2.15.0.
- The Streamlit app uses model-1, so the data preprocessing steps are written for that model. For the other models, such as the multi-model and the model with **character-level embedding and vectorizer**, you have to change the data preprocessing steps accordingly.
- Because I can't add the models to the repo using Git LFS, please run the notebook first in Colab or on a local server, and then run the web.py file.
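
As an illustration of the kind of change the second note refers to, a model with a character-level branch typically needs each sentence supplied a second time with its characters separated by spaces, so that a text vectorizer treats every character as a token. The sketch below assumes that convention; the notebook is not rendered in this commit view, so the exact inputs the models expect may differ, and the names `split_chars` and `build_inputs` are hypothetical.

```python
from typing import Dict, List, Sequence


def split_chars(sentence: str) -> str:
    """Return the sentence with a space inserted between every character."""
    return " ".join(list(sentence))


def build_inputs(sentences: Sequence[str]) -> Dict[str, List[str]]:
    """Pair each sentence with its character-level form (assumed input layout)."""
    return {
        "tokens": list(sentences),
        "chars": [split_chars(sentence) for sentence in sentences],
    }


inputs = build_inputs(["Mice were treated", "p < 0.05"])
print(inputs["chars"][0])
```

The two lists could then be zipped into a single dataset in place of the plain list of sentences that model-1 receives.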

Binary file not shown.
74 changes: 74 additions & 0 deletions Natural Language Processing/SkimLit/Streamlit/web.py
@@ -0,0 +1,74 @@
import streamlit as st
import tensorflow as tf
import numpy as np
from typing import List
import pathlib
curr_dir_path = pathlib.Path.cwd()
curr_dir_parent = curr_dir_path.parent
page_bg = """
<style>
.stApp {
background-image: url('https://images.pexels.com/photos/129731/pexels-photo-129731.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1');
background-size: cover;
background-repeat: no-repeat;
background-attachment: fixed;
background-position: center;
}
textarea {
background-color: rgba(255, 255, 255, 0.8);
border-radius: 10px;
}
</style>
"""

# Render the background CSS
st.markdown(page_bg, unsafe_allow_html=True)

@st.cache_resource
def load_model(path):
    # Cached so the model is loaded once, not on every Streamlit rerun
    model = tf.keras.models.load_model(path)
    return model

my_model = load_model(str(curr_dir_parent / "SkimLit_Models" / "Model-1.keras"))

classes = ["BACKGROUND","OBJECTIVE","METHODS","RESULTS","CONCLUSIONS"]

def data_preprocessing(data: List[str]):
    # Wrap the sentences in a batched tf.data pipeline for inference
    dataset = tf.data.Dataset.from_tensor_slices(data)
    dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)
    return dataset

def process_text(input_text):
    # Split on periods and drop empty fragments, so the last sentence is kept
    # even when the input does not end with a period
    return [part for part in input_text.split('.') if part.strip()]

st.title("SkimLit Text Processor")


input_text = st.text_area("Input Paragraph:", height=150)

# Create a button to process the text
if st.button("Process Text"):
    if input_text:
        text_list = process_text(input_text)
        data = data_preprocessing(text_list)
        # One bucket of sentences per class, in the same order as `classes`
        curr_list = [[] for _ in range(len(classes))]
        start_point = 0
        for batch in data:
            prediction = my_model(batch)
            predicted_classes = tf.argmax(prediction, axis=1).numpy()
            for i in range(len(predicted_classes)):
                sentence = text_list[start_point + i].strip()
                curr_list[predicted_classes[i]].append(sentence)
            # Advance past this batch so indices keep matching text_list
            start_point += len(predicted_classes)
        # One output line per class: the label followed by its sentences
        final_text = "\n".join(
            label + ": " + ". ".join(sentences)
            for label, sentences in zip(classes, curr_list)
        )

        st.text_area("Output Paragraph:", value=final_text, height=150)
    else:
        st.error("Please enter a paragraph to process.")
140 changes: 140 additions & 0 deletions Natural Language Processing/SkimLit/requirements.txt
@@ -0,0 +1,140 @@
absl-py==2.1.0
altair==5.4.1
argon==0.2.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
array_record==0.5.1
asttokens==2.4.1
astunparse==1.6.3
attrs==24.2.0
bleach==6.1.0
blinker==1.8.2
cachetools==5.5.0
caer==2.0.8
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.3.2
click==8.1.7
comm==0.2.2
contourpy==1.3.0
cycler==0.12.1
debugpy==1.8.6
decorator==5.1.1
dm-tree==0.1.8
docstring_parser==0.16
etils==1.9.4
executing==2.1.0
flatbuffers==24.3.25
fonttools==4.54.1
fsspec==2024.9.0
gast==0.6.0
gitdb==4.0.11
GitPython==3.1.43
google-pasta==0.2.0
googleapis-common-protos==1.65.0
grpcio==1.66.1
h5py==3.11.0
idna==3.10
immutabledict==4.2.0
importlib_resources==6.4.5
ipykernel==6.29.5
ipython==8.27.0
jedi==0.19.1
Jinja2==3.1.4
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
jupyter_client==8.6.3
jupyter_core==5.7.2
kaggle==1.6.17
kagglehub==0.3.1
keras==3.5.0
keras-core==0.1.7
keras-cv==0.9.0
keras-tuner==1.4.7
kiwisolver==1.4.7
kt-legacy==1.0.5
libclang==18.1.1
Markdown==3.7
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.2
matplotlib-inline==0.1.7
mdurl==0.1.2
ml-dtypes==0.4.1
mypy==1.11.2
mypy-extensions==1.0.0
namex==0.0.8
narwhals==1.8.3
nest-asyncio==1.6.0
numpy==1.26.4
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-curand-cu12==10.3.4.107
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
opencv-contrib-python==4.10.0.84
opt-einsum==3.3.0
optree==0.12.1
packaging==24.1
pandas==2.2.3
parso==0.8.4
pexpect==4.9.0
pillow==10.4.0
platformdirs==4.3.6
promise==2.3
prompt_toolkit==3.0.48
protobuf==4.25.5
psutil==6.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyarrow==17.0.0
pycparser==2.22
pydeck==0.9.1
Pygments==2.18.0
pyparsing==3.1.4
python-dateutil==2.9.0.post0
python-slugify==8.0.4
pytz==2024.2
pyzmq==26.2.0
referencing==0.35.1
regex==2024.9.11
requests==2.32.3
rich==13.8.1
rpds-py==0.20.0
setuptools==75.1.0
simple-parsing==0.1.6
six==1.16.0
smmap==5.0.1
stack-data==0.6.3
streamlit==1.38.0
tenacity==8.5.0
tensorboard==2.17.1
tensorboard-data-server==0.7.2
tensorflow==2.17.0
tensorflow-datasets==4.9.6
tensorflow-hub==0.16.1
tensorflow-metadata==1.16.0
termcolor==2.4.0
text-unidecode==1.3
tf_keras==2.17.0
toml==0.10.2
tornado==6.4.1
tqdm==4.66.5
traitlets==5.14.3
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
watchdog==4.0.2
wcwidth==0.2.13
webencodings==0.5.1
Werkzeug==3.0.4
wheel==0.44.0
wrapt==1.16.0
zipp==3.20.2
