-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BREAKING CHANGE: remove legacy detectron2
model; remove layoutparser extras
#350
Conversation
detectron2
model; remove layoutparser extras
portalocker==2.8.2 | ||
# via iopath | ||
protobuf==5.26.1 | ||
# via | ||
# onnx | ||
# onnxruntime | ||
pycocotools==2.0.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the dependency we needed to remove
@@ -1,10 +1,12 @@ | |||
-c constraints.in | |||
layoutparser[layoutmodels,tesseract] | |||
layoutparser |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we not able to lose layoutparser
completely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still have this TextBox
element we rely on from layoutparser
, but I wonder if we can just copy that over
from layoutparser.elements.layout import TextBlock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That class has dependencies from a couple different modules in layoutparser
, doesn't look as simpler as copying over a small class. Opened #351 for follow on work to see if we can get rid of TextBlock
.
@qued - Tests are passing now, should be good for final review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Summary
First step in resolving Unstructured-IO/unstructured#3051. Per this comment, we were having troubling running
unstructured
in the Python 3.12wolfi-base
contain due to issues related topycocotools
, which is only used for the legacydetectron2
model fromlayoutparser
. Since we've replaced this withdetectron2onnx
, this PR removes thelayoutparser
extra dependencies that caused issues with Python 3.12.The
layoutparser
base dependency is still required because we use layout objects from that library. It's likely we could remove these in a future iteration.Temporarily disabled the ingest tests, because they seem to have been broken for the past six months. Last commit that they passed for was this one. Opened #352 to reenable them.
Testing
If CI passes we should be good to go.