feat: PyPDFToDocument - add new customization parameters #8574
…te-custom-converter
…/deepset-ai/haystack into pypdf-deprecate-custom-converter
Pull Request Test Coverage Report for Build 12033758738 (Coveralls)
Seems fine, only minor nitpicking comments.
Maintaining compatibility on this one is going to be hard with so many fields. Is there any other way to deal with these in a less breaking manner? Perhaps a config dictionary or something similar.
This was my original idea, but we then discarded it to keep more control over the accepted parameters and serialization.
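To make the trade-off in this thread concrete, here is a minimal sketch of the two designs. It is not the code from this PR: the class names, parameter names, and defaults are all hypothetical and chosen only for illustration, and it uses plain dataclasses rather than Haystack's component and serialization helpers.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ExplicitParamsConverter:
    """Hypothetical converter with one init parameter per extraction option."""

    # Illustrative names and defaults only, not the parameters added in this PR.
    extraction_mode: str = "plain"
    space_width: float = 200.0
    strip_rotated: bool = True

    def __post_init__(self) -> None:
        # Each accepted value can be validated at construction time.
        if self.extraction_mode not in ("plain", "layout"):
            raise ValueError(f"Unknown extraction_mode: {self.extraction_mode!r}")

    def to_dict(self) -> Dict[str, Any]:
        # The serialized form has a fixed, known set of keys.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ExplicitParamsConverter":
        # Unknown keys fail here with a TypeError instead of passing through.
        return cls(**data)


class KwargsConverter:
    """Hypothetical alternative that takes a free-form extraction_kwargs dict."""

    def __init__(self, extraction_kwargs: Optional[Dict[str, Any]] = None) -> None:
        # Nothing is checked: misspelled or unsupported keys are stored as-is.
        self.extraction_kwargs = dict(extraction_kwargs or {})

    def to_dict(self) -> Dict[str, Any]:
        # Whatever the caller passed must be serializable for this to be safe.
        return {"extraction_kwargs": self.extraction_kwargs}


if __name__ == "__main__":
    explicit = ExplicitParamsConverter(extraction_mode="layout")
    restored = ExplicitParamsConverter.from_dict(explicit.to_dict())
    print(restored == explicit)

    # The misspelled key is accepted silently by the dict-based design.
    loose = KwargsConverter({"extracton_mode": "layout"})
    print(loose.to_dict())
```

In this sketch the explicit version rejects a misspelled argument such as `extracton_mode` at construction time and round-trips through a fixed set of keys, while the dict version stores the typo unchanged. The cost is the one raised in the review: every new option becomes another init parameter to keep compatible.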
Related Issues
- PyPDFToDocument: make conversion customization easier for users #8553
Proposed Changes:
- Add new parameters to PyPDFToDocument to customize the text extraction process from PDF files.
- … converter is provided (it will be deprecated in chore: PyPDFToDocument - deprecate converter init parameter #8569)
How did you test it?
CI, new test
Notes for the reviewer
I don't particularly like the addition of all of these new init parameters, but we have already discussed and discarded the idea of having a single extraction_kwargs dict.
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.