Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recategorize messages #977

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Recategorize messages #977

wants to merge 11 commits into from

Conversation

squeaky-pl
Copy link
Contributor

@squeaky-pl squeaky-pl commented Nov 13, 2024

This will basically find all the messages in the database where "categories" on the email message don't match the ones from imapuids. Once we run this the right categories will get propagated to the CRM, and this - in turn - will create inbox notifications.

@squeaky-pl squeaky-pl force-pushed the recategorize-messages branch 2 times, most recently from 556d9b8 to ebd959d Compare November 13, 2024 16:20
Comment on lines +128 to +154
update_message_metadata(
session, message.account, message, message.is_draft
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the function that was changed in a broken a way on the PRs that we reverted. So if we capture categories before, run the function after the reverts, and then read categories - and then are different we found a miscategorized email.

Comment on lines +106 to +131
def session_factory():
return global_session_scope() if dry_run else session_scope(None)
Copy link
Contributor Author

@squeaky-pl squeaky-pl Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sync-engine curiosity. sync-engine has two ways of managing transactions... global_session_scope is for reads and session_scope is mainly meant for writes. session_scope will automatically commit at the end. And it also automatically behind the scenes serializes changes to transaction log, and that transaction log is what CRM consumes. session_scope in original sync-engine implementation also allowed to shard to different databases connected to the same cluster, but we never used that feature so passing None will just route to the only database we have.

The naming of those and the exact intentions of Nylas people were always a mystery to me.

only_inbox: bool,
only_types: set[AccountType] = ALL_ACCOUNT_TYPES,
) -> Iterable[int, list[int]]:
discriminators = {account_type + "account" for account_type in only_types}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This piece of code for obtaining the filtered namespace_query is almost the same as in get_total_namespace_count. Should we factor it out into a separate private function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if date_start:
query = query.filter(Message.created_at >= date_start)
if date_end:
query = query.filter(Message.created_at < date_end)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it inclusive? Given a date range 2024-10-15 - 2024-10-16, I would imagine it's a 2 day interval.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand your thinking. I tend to make it exclusive on one side because this way if you run two times [1, 2), [2, 3) you don't reprocess the intersection twice, but in this case we have a single interval.

3fc5c0a (#977)

@squeaky-pl
Copy link
Contributor Author

squeaky-pl commented Nov 14, 2024

@neob91-close I also added this commit that lets me only propagate changes for emails that are missing "inbox" category as most escalations are about this. This will also speed things up and ensure we don't overload CRM with delta changes.

@squeaky-pl
Copy link
Contributor Author

@neob91-close I also changed --only-account-id to --only-account-ids in 5eebb94 (#977) because I will target the customers that reported the problem first cluster by cluster before running it massively.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants