Commit

Merge branch 'main' into add-data-availability
lfoppiano committed Jun 22, 2024
2 parents ac59802 + b5dfde0 commit daaa33c
Showing 3 changed files with 4 additions and 3 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -22,7 +22,7 @@ https://lfoppiano-document-qa.hf.space/

Question/Answering on scientific documents using LLMs: ChatGPT-3.5-turbo, GPT4, GPT4-Turbo, Mistral-7b-instruct and Zephyr-7b-beta.
The streamlit application demonstrates the implementation of a RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS (National Institute for Materials Science), in Tsukuba, Japan.
-Different to most of the projects, we focus on scientific articles.
+**Different to most of the projects**, we focus on scientific articles and we extract text from a structured document.
We target only the full-text using [Grobid](https://github.com/kermitt2/grobid) which provides cleaner results than the raw PDF2Text converter (which is comparable with most of other solutions).

Additionally, this frontend provides the visualisation of named entities on LLM responses to extract <span stype="color:yellow">physical quantities, measurements</span> (with [grobid-quantities](https://github.com/kermitt2/grobid-quantities)) and <span stype="color:blue">materials</span> mentions (with [grobid-superconductors](https://github.com/lfoppiano/grobid-superconductors)).
2 changes: 1 addition & 1 deletion requirements.txt
@@ -24,4 +24,4 @@ typing-inspect==0.9.0
typing_extensions==4.8.0
pydantic==2.4.2
sentence_transformers==2.2.2
-streamlit-pdf-viewer
+streamlit-pdf-viewer==0.0.13
3 changes: 2 additions & 1 deletion streamlit_app.py
@@ -486,5 +486,6 @@ def generate_color_gradient(num_elements):
height=800,
annotation_outline_size=1,
annotations=st.session_state['annotations'],
-rendering='unwrap' if st.session_state['pdf_rendering'] == 'PDF.JS' else 'legacy_embed'
+rendering='unwrap' if st.session_state['pdf_rendering'] == 'PDF.JS' else 'legacy_embed',
+render_text=True
)
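
The keyword arguments touched by this hunk can be sketched in isolation. The helper below is hypothetical (`build_viewer_kwargs` does not appear in the repository); it only mirrors the argument values visible in the diff, so the rendering choice can be checked without a running Streamlit session. It assumes the resulting dictionary would be unpacked into the viewer call in `streamlit_app.py`, and it makes no claim about what `render_text` does beyond being passed as `True`.

```python
from typing import Any, Dict, List


def build_viewer_kwargs(pdf_rendering: str, annotations: List[dict]) -> Dict[str, Any]:
    """Hypothetical helper: collect the viewer keyword arguments shown in the diff.

    `pdf_rendering` stands in for st.session_state['pdf_rendering'] and
    `annotations` for st.session_state['annotations'].
    """
    return {
        "height": 800,
        "annotation_outline_size": 1,
        "annotations": annotations,
        # 'unwrap' is selected only when the user picked PDF.JS;
        # any other value falls back to 'legacy_embed'.
        "rendering": "unwrap" if pdf_rendering == "PDF.JS" else "legacy_embed",
        # Argument added by this commit.
        "render_text": True,
    }


if __name__ == "__main__":
    kwargs = build_viewer_kwargs("PDF.JS", annotations=[])
    print(kwargs["rendering"], kwargs["render_text"])
```

Building the arguments as a plain dictionary also makes a missing trailing comma, the kind of slip this hunk corrects, harder to introduce, since each entry is a separate key.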
