feat: outlook ".msg" file converter #196
Open
This pull request adds support for converting Outlook .msg files to Markdown by extracting the email's metadata and content. The implementation adds a new class, OutlookMsgConverter, which extends the DocumentConverter base class.
Key Features
Email Metadata Extraction:
Extracts key headers such as From, To, and Subject and includes them in the Markdown output.
Email Body Conversion:
Reads the email body and formats it as Markdown.
Robust Encoding Support:
Attempts to decode content as UTF-16 first, then falls back to UTF-8, and handles edge cases so that undecodable input does not abort the conversion.
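The UTF-16-then-UTF-8 fallback described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the helper name `decode_stream`, the choice of little-endian UTF-16, and the final lossy fallback are all assumptions made for illustration.

```python
def decode_stream(data: bytes) -> str:
    """Decode raw stream bytes, trying UTF-16 before UTF-8.

    Hypothetical helper: the name and the exact fallback order
    are assumptions, not taken from the PR's code.
    """
    # Assumption: Unicode string streams in .msg files are little-endian UTF-16.
    for encoding in ("utf-16-le", "utf-8"):
        try:
            return data.decode(encoding).strip()
        except UnicodeDecodeError:
            continue
    # Last resort: drop undecodable bytes rather than fail the conversion.
    return data.decode("utf-8", errors="ignore").strip()


if __name__ == "__main__":
    print(decode_stream("Quarterly report".encode("utf-16-le")))
    print(decode_stream(b"abc"))  # odd length, so UTF-16 fails and UTF-8 is used
```

One caveat with this ordering: any even-length byte string can decode as UTF-16 without raising an error, so trying UTF-16 first is only safe for streams already known to hold UTF-16 text.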
Implementation Details
File Validation:
Checks the file extension (.msg) before proceeding with the conversion.
Stream Parsing:
Uses olefile to parse the .msg file structure, extracting streams for headers and body content.
Error Handling:
Includes exception handling for invalid files and for unexpected errors raised during conversion.
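Taken together, the validation, stream parsing, and error handling steps could be sketched as below. This is an illustrative outline under stated assumptions, not the OutlookMsgConverter code itself: the function names, the Markdown layout, and the stream names (derived from common MAPI property tags for sender, recipients, subject, and body) are assumptions. The `olefile` calls used are `isOleFile`, `OleFileIO`, `exists`, `openstream`, and `close`.

```python
from pathlib import Path
from typing import Dict, Optional

# Assumed stream names: "__substg1.0_" + MAPI property tag + "001F" (Unicode string).
HEADER_STREAMS = {
    "From": "__substg1.0_0C1F001F",
    "To": "__substg1.0_0E04001F",
    "Subject": "__substg1.0_0037001F",
}
BODY_STREAM = "__substg1.0_1000001F"


def is_msg_path(path: str) -> bool:
    """File validation: accept only a .msg extension (case-insensitive)."""
    return Path(path).suffix.lower() == ".msg"


def render_markdown(headers: Dict[str, str], body: str) -> str:
    """Format headers and body as Markdown (layout is an assumption)."""
    lines = ["# Email Message", ""]
    lines += [f"**{name}:** {value}" for name, value in headers.items() if value]
    lines += ["", "## Content", "", body]
    return "\n".join(lines).strip()


def _read_text(ole, stream_name: str) -> str:
    """Read one stream, returning an empty string if it is missing."""
    if not ole.exists(stream_name):
        return ""
    data = ole.openstream(stream_name).read()
    try:
        return data.decode("utf-16-le").strip()
    except UnicodeDecodeError:
        return data.decode("utf-8", errors="ignore").strip()


def convert_msg(path: str) -> Optional[str]:
    """Return Markdown for a .msg file, or None if it cannot be converted."""
    if not is_msg_path(path):
        return None
    import olefile  # third-party dependency, imported only when needed

    try:
        if not olefile.isOleFile(path):
            return None
        ole = olefile.OleFileIO(path)
    except OSError:
        return None  # unreadable or malformed file
    try:
        headers = {k: _read_text(ole, v) for k, v in HEADER_STREAMS.items()}
        return render_markdown(headers, _read_text(ole, BODY_STREAM))
    finally:
        ole.close()
```

Returning None for unsupported input, rather than raising, is a design choice made for this sketch so that a caller could move on to another converter; the PR's actual behavior may differ.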
Please review the implementation and provide feedback or suggestions for improvement.