I'm looking at the prompt passed to the LLM in the tutorial about the document joiner, and it contains many words run together with no whitespace between them. Is the output of the LLM affected? Do you know how I can solve this issue? I tried using the Ollama embedder, and the output had the same problem.
The documents used are the ones downloaded in the tutorial: one .txt, one .pdf and one .md
This PDF is somewhat strange, so currently the only way to properly extract the text is via a custom Converter.
from pypdf import PdfReader
from haystack import Document, default_to_dict, default_from_dict
from haystack.components.converters import PyPDFToDocument


class CustomConverter:
    def convert(self, reader: "PdfReader") -> Document:
        """Extract text from the PDF and return a Document object with the text content."""
        # "layout" mode keeps the spacing between words; pages are joined with form feeds.
        text = "\f".join(page.extract_text(extraction_mode="layout") for page in reader.pages)
        return Document(content=text)

    def to_dict(self):
        """Serialize the converter to a dictionary."""
        return default_to_dict(self)

    @classmethod
    def from_dict(cls, data):
        """Deserialize the converter from a dictionary."""
        return default_from_dict(cls, data)


pdf_converter = PyPDFToDocument(converter=CustomConverter())
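To check whether a converted document still has the fused-word problem (for example before and after switching to the custom converter), a rough heuristic along these lines may help. This is only a sketch and not part of Haystack or pypdf: the helper name `fused_token_ratio` and the 20-character threshold are assumptions chosen for illustration.

```python
import re


def fused_token_ratio(text: str, max_len: int = 20) -> float:
    """Return the share of alphabetic tokens longer than max_len characters.

    Hypothetical helper: a high ratio suggests that spaces were lost during
    PDF text extraction. The default threshold of 20 is an arbitrary assumption.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    if not tokens:
        return 0.0
    long_tokens = [t for t in tokens if len(t) > max_len]
    return len(long_tokens) / len(tokens)


# Made-up sample strings, not taken from the tutorial's PDF.
good = "The quick brown fox jumps over the lazy dog."
bad = "Thequickbrownfoxjumpsoverthelazydog."

print(fused_token_ratio(good))  # no token exceeds the threshold
print(fused_token_ratio(bad))   # the single token exceeds the threshold
```

In a pipeline, the same function could be applied to the `content` of each converted `Document` to compare the two extraction approaches.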