MsWordDecoder - Support tables #902
akozhikkot started this conversation in 2. Feature requests
Hi @akozhikkot, what you're describing falls under "semantic chunking", which hasn't been developed yet. Currently text extraction is lossy: it extracts text without metadata such as "this is a title" or "this is a table". We'd welcome anyone willing to work on semantic chunking. It's a pretty big feature that would require some planning ahead and discussion of the approach.
Currently, if a Word document contains tables, all paragraphs are read when the document is ingested, so the values in the table cells are exported as individual paragraphs.
Would it be possible to change this so that, when a table is encountered, it is formatted differently?
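To make the request concrete, here is a rough sketch of the kind of output being asked for. It is not the MsWordDecoder implementation and does not use its API. It is a standalone Python illustration that reads the raw `word/document.xml` part of a .docx file with the standard library only. The function names (`read_document_xml`, `paragraph_text`, `extract_blocks`) and the `" | "` cell separator are assumptions made for this example. It walks the document body in order and keeps each table as a single block, one line per row, instead of emitting every cell as its own paragraph. Nested tables inside cells are ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by word/document.xml inside a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_document_xml(path):
    """Return the raw XML of the main document part of a .docx file."""
    with zipfile.ZipFile(path) as archive:
        return archive.read("word/document.xml")


def paragraph_text(p):
    """Concatenate the text of all w:t elements inside one w:p element."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def extract_blocks(document_xml):
    """Walk the body in document order and return (kind, text) tuples.

    Paragraphs come back as ("paragraph", text). Each table comes back as a
    single ("table", text) block: one line per row, cells joined by " | ".
    """
    body = ET.fromstring(document_xml).find(W + "body")
    blocks = []
    for child in body:
        if child.tag == W + "p":
            text = paragraph_text(child)
            if text:  # skip empty paragraphs
                blocks.append(("paragraph", text))
        elif child.tag == W + "tbl":
            rows = []
            for tr in child.findall(W + "tr"):
                cells = []
                for tc in tr.findall(W + "tc"):
                    # A cell may hold several paragraphs; join them with spaces.
                    parts = [paragraph_text(p) for p in tc.findall(W + "p")]
                    cells.append(" ".join(s for s in parts if s))
                rows.append(" | ".join(cells))
            blocks.append(("table", "\n".join(rows)))
    return blocks
```

Calling `extract_blocks(read_document_xml("example.docx"))` (the file name is a placeholder) would return the paragraphs and tables in reading order, so a downstream chunker could treat the `"table"` blocks differently from ordinary text.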