
Extended the pages.csv #26

Merged
8 commits merged on Jul 13, 2024

@am9zZWY (Owner) commented Jul 12, 2024


Important:
Please merge #18 first.


Changes

  • Extends pages.csv with title and description columns. To fill them, a new pipeline stage called Indexer was added; it adds an entry to pages_df.
  • The Tokenizer now only adds tokenized_text to pages_df.
  • Adds more structure to custom_db and makes it more robust against errors.
  • Fixes #25: CSVs are overridden instead of being extended.
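The Indexer stage described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code from this PR: it assumes the title comes from the page's `<title>` element and the description from `<meta name="description">`, and it models pages_df as a plain list of dicts so that only the standard library is needed. The names `index_page` and `_TitleDescriptionParser` are invented for this example.

```python
from html.parser import HTMLParser


class _TitleDescriptionParser(HTMLParser):
    """Collects the <title> text and the <meta name="description"> content."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            # Attribute values can be None (e.g. <meta content>), so default to "".
            self.description = (attributes.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def index_page(pages: list, url: str, html: str) -> dict:
    """Hypothetical Indexer step: append one (url, title, description) entry to pages."""
    parser = _TitleDescriptionParser()
    parser.feed(html)
    parser.close()
    entry = {"url": url, "title": parser.title.strip(), "description": parser.description}
    pages.append(entry)
    return entry
```

Keeping title/description extraction in its own stage mirrors the split described in the PR: the Indexer owns these columns, while the Tokenizer only contributes tokenized_text.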
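For the #25 fix (extending CSVs instead of overwriting them), one common pattern is to open the file in append mode and write the header only when the file is new or empty. The sketch below uses only the standard library. The helper name `append_rows` and the column names are hypothetical, and it is an assumption that the original overwriting came from opening the file in `"w"` mode; the PR does not show how custom_db writes its files.

```python
import csv
import os


def append_rows(path: str, fieldnames: list, rows: list) -> None:
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # Mode "a" keeps existing rows. Mode "w" would truncate the file on every
    # write (assumed here to be the cause of the overwriting reported in #25).
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```

If pages_df is a pandas DataFrame, the equivalent call would be `pages_df.to_csv(path, mode="a", header=not os.path.exists(path), index=False)`.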

@am9zZWY changed the title from "Extended the pages" to "Extended the pages.csv" Jul 12, 2024
@am9zZWY self-assigned this Jul 12, 2024
@am9zZWY added the ENGINE label Jul 12, 2024
@am9zZWY merged commit 8a3e4ee into master Jul 13, 2024
3 checks passed
@am9zZWY deleted the josef-extend-index branch July 14, 2024 18:34
Development
Successfully merging this pull request may close these issues:
  • CSVs are overridden instead of being extended
1 participant