Add title retrieval to Notion crawler #765
…tests so that map_metadata handles the title injection
This pull request has been linked to Shortcut Story #24068: Add title to NotionExtractor.
….com/MetaphorData/connectors into rishimohan/sc-24068/add-title-notion
☂️ Python Coverage
Overall Coverage
New Files: No new covered files...
Modified Files
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #765      +/-   ##
==========================================
+ Coverage   92.07%   92.29%   +0.22%
==========================================
  Files         194      154      -40
  Lines       15899    15758     -141
==========================================
- Hits        14639    14544      -95
+ Misses       1260     1214      -46
☔ View full report in Codecov by Sentry.
LGTM
🤔 Why?
SC24068
The NotionReader loader from LlamaHub doesn't include functionality to get a page's title, which is useful in generating embeddings and also when returning search results and generated results to users.
🤓 What?
Adds a method to the Notion extractor to retrieve a page's title from the Notion API. The extractor now stores the page title in the document metadata for use later.
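As a rough illustration of the title lookup described above, here is a minimal sketch rather than the code from this PR. It assumes the page JSON has already been fetched from the Notion API's retrieve-a-page endpoint and that the title lives in the page property whose `type` is `"title"`. The function name `extract_page_title`, the sample payload, and the `pageId`/`title` metadata keys are hypothetical.

```python
from typing import Any, Dict


def extract_page_title(page: Dict[str, Any]) -> str:
    """Return the plain-text title of a Notion page object, or "" if none is found.

    Hypothetical helper: it only parses an already-fetched page object and
    makes no network calls.
    """
    for prop in page.get("properties", {}).values():
        # The title property's name can vary between pages, so match on its type.
        if prop.get("type") == "title":
            # The title is a list of rich-text fragments; join their plain text.
            return "".join(part.get("plain_text", "") for part in prop.get("title", []))
    return ""


# Illustrative payload shaped like a Notion page object (values are made up).
sample_page = {
    "id": "page-123",
    "properties": {
        "Name": {
            "id": "title",
            "type": "title",
            "title": [{"plain_text": "Quarterly "}, {"plain_text": "Report"}],
        }
    },
}

# Store the title alongside other document metadata for later use.
metadata = {"pageId": sample_page["id"], "title": extract_page_title(sample_page)}
print(metadata)
```

Returning an empty string when no title property exists keeps downstream metadata handling simple, though the actual extractor may handle that case differently.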
Also modifies map_metadata to prepend the title to all text nodes.
Also includes a couple of other fixes (setting _description and _platform, adjusting configuration information).
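The title injection could look roughly like the sketch below. This is an assumption-laden illustration, not the actual `map_metadata` implementation: the `TextNode` dataclass is a hypothetical stand-in for the loader's node type, and the `prepend_title` helper, the `title` metadata key, and the `Title: ...` prefix format are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TextNode:
    """Hypothetical stand-in for a text node with attached metadata."""

    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def prepend_title(nodes: List[TextNode]) -> List[TextNode]:
    """Prefix each node's text with its page title, when one is present."""
    for node in nodes:
        title = node.metadata.get("title", "")
        if title:
            # Assumed format; the real prefix used by the connector may differ.
            node.text = f"Title: {title}\n{node.text}"
    return nodes


nodes = prepend_title(
    [
        TextNode("First chunk of the page.", {"title": "Quarterly Report"}),
        TextNode("Chunk from an untitled page."),
    ]
)
for node in nodes:
    print(repr(node.text))
```

Putting the title in the node text itself means it is included whenever the text is embedded, which matches the motivation given in the "Why?" section; nodes without a title are left unchanged.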
🧪 Tested?
Unit tests were updated accordingly and ran successfully. Also verified end-to-end with ingestion.
☑️ Checks
pyproject.toml
.