Add Gmail takeout mbox import (v2) #8
base: master
Commits on Feb 22, 2021
Commit 8008357
Commit 50e7e8d
Commits on Feb 24, 2021
Commit a3de045
Commits on Jul 22, 2021
Commit 72802a8
Parsing the mbox file manually instead of using Python's built-in parser allows us to process large files without loading them into memory all at once.
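
A minimal sketch of the streaming idea described in this commit message, assuming messages are delimited by lines that begin with `From ` (the mbox convention). The function name `iter_mbox_messages` is hypothetical and is not taken from this PR. The sketch also ignores mbox details such as `>From ` escaping inside message bodies.

```python
import io


def iter_mbox_messages(fileobj):
    """Yield (from_line, raw_message_bytes) pairs, one message at a time.

    Only the message currently being read is held in memory.
    """
    from_line = None
    body = []
    for line in fileobj:
        if line.startswith(b"From "):
            # A new separator line closes the previous message, if any.
            if from_line is not None:
                yield from_line, b"".join(body)
            from_line = line.rstrip(b"\r\n")
            body = []
        elif from_line is not None:
            body.append(line)
    # Emit the final message once the file is exhausted.
    if from_line is not None:
        yield from_line, b"".join(body)


# Usage with an in-memory sample; a real import would pass open(path, "rb").
sample = io.BytesIO(
    b"From 1@xxx Fri Jan 01 00:00:00 +0000 2021\nSubject: a\n\nhello\n\n"
    b"From 2@xxx Sat Jan 02 00:00:00 +0000 2021\nSubject: b\n\nworld\n"
)
messages = list(iter_mbox_messages(sample))
```

Because the function is a generator, the caller can insert each message into the database as it arrives instead of collecting them all first.
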
Commit 4bc7010
Fix import for messages that don't have a Date
This fixes a regression introduced by the previous commit, in which messages no longer took their date from the mbox 'From ' line. For messages without a Date header, that meant losing the delivery date.
Commits on Jul 28, 2021
Commit 8ee555c
Use thread id as pkey if message id is missing.
Some messages (like gchat logs) don't have message ids and therefore don't save properly. This commit uses the Gmail X-GM-THRID header if the Message-Id is missing.
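
As a rough illustration of this fallback (not the code from the commit), the sketch below picks the Message-Id header when present and otherwise the X-GM-THRID header. The function name `primary_key` is hypothetical.

```python
import email
from email import policy


def primary_key(msg):
    """Return the Message-Id if present, otherwise the Gmail thread id."""
    message_id = msg.get("Message-Id")  # header lookup is case-insensitive
    if message_id:
        return message_id.strip()
    return msg.get("X-GM-THRID")


with_id = email.message_from_bytes(
    b"Message-Id: <a@b>\nX-GM-THRID: 123\n\nbody", policy=policy.compat32
)
without_id = email.message_from_bytes(
    b"X-GM-THRID: 123\n\nbody", policy=policy.compat32
)
```
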
Commit e1fdef7
Fix parse exception: convert delivery_date to a str
The function email.utils.parsedate_tz expects a str, but we were passing bytes. Casting to str fixes an exception in messages where the Date header is missing and the delivery time must be inferred from the mbox header.
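
A small sketch of the kind of conversion this commit describes, assuming the date arrives as bytes sliced from the mbox header. The helper name `parse_delivery_date` is hypothetical. Decoding to `str` before calling `email.utils.parsedate_tz` is the point being illustrated.

```python
from datetime import datetime, timezone
from email.utils import mktime_tz, parsedate_tz


def parse_delivery_date(raw_date):
    """Parse a date taken from the mbox header; accepts bytes or str."""
    if isinstance(raw_date, bytes):
        # parsedate_tz works on str, so decode before parsing.
        raw_date = raw_date.decode("ascii", errors="replace")
    parsed = parsedate_tz(raw_date)
    if parsed is None:
        return None  # unparseable date
    return datetime.fromtimestamp(mktime_tz(parsed), tz=timezone.utc)


delivered = parse_delivery_date(b"Mon, 01 Feb 2021 10:00:00 +0000")
```
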
Commits on Aug 6, 2021
Commit 8939f5b
Use message id from mbox header if none exists in MIME.
Some messages (like chats) don't have a Message-Id MIME header, so the message is saved without a primary key. A previous commit used the thread id in this situation, but the same thread id can be used for multiple messages. The id from the mbox header, which is the message id used by the Gmail API, should be unique across all messages.
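
The sketch below shows one way such a fallback could look. It assumes the mbox 'From ' line has the shape `From <digits>@xxx <date>`, which is an assumption about the export format rather than something stated in this PR. The names `gmail_id_from_from_line` and `choose_primary_key` are hypothetical.

```python
import re

# Assumed shape of the separator line: b"From 1234567890@xxx Fri Jan 01 ..."
FROM_LINE_ID = re.compile(rb"^From (\d+)@")


def gmail_id_from_from_line(from_line):
    """Extract the numeric id from the mbox 'From ' line, or None."""
    match = FROM_LINE_ID.match(from_line)
    return match.group(1).decode("ascii") if match else None


def choose_primary_key(message_id_header, from_line):
    """Prefer the Message-Id header; fall back to the mbox header id."""
    if message_id_header:
        return message_id_header.strip()
    return gmail_id_from_from_line(from_line)


from_line = b"From 1688000000000000001@xxx Fri Jan 01 00:00:00 +0000 2021"
```
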
Commit 953e7eb
Explicitly parse email with compat32 policy.
The docs note: "The policy keyword should always be specified; The default will change to email.policy.default in a future version of Python."
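
For illustration, passing the policy explicitly with the standard library looks roughly like this. The sample message is made up, and the PR may call a different parser entry point.

```python
import email
from email import policy
from email.message import Message

raw = b"Subject: hello\r\nFrom: a@example.com\r\n\r\nbody\r\n"

# Name the policy explicitly so behavior does not depend on the default.
msg = email.message_from_bytes(raw, policy=policy.compat32)
subject = msg["Subject"]
```
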
Commit 50cc883
Simplify handling of headers with binary data.
This shouldn't happen in RFC-abiding messages, but raw Unicode or other non-ASCII content will cause the header parser to return a Header object rather than a str. Improve handling of this case and add a simple unit test.
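
A minimal sketch of normalizing such values, assuming the message was parsed with the compat32 policy. The helper name `header_to_str` is hypothetical, and the exact text produced for undecodable bytes is left to `email.header.Header` rather than specified here.

```python
import email
from email import policy
from email.header import Header


def header_to_str(value):
    """Return a header value as str, whether it is a str or a Header."""
    if value is None:
        return None
    if isinstance(value, Header):
        # Header objects appear when the raw header held non-ASCII bytes.
        return str(value)
    return value


raw = b"Subject: caf\xc3\xa9\n\nbody\n"  # raw UTF-8 bytes in a header
msg = email.message_from_bytes(raw, policy=policy.compat32)
subject = header_to_str(msg["Subject"])
```
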
Commit 770bc0e
Commit 4f50ff4
Deal with invalid RFC 2047 strings.
If the string is invalid, the undecoded string is returned instead.
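
One way to get the behavior described here with the standard library is sketched below. The helper name `decode_rfc2047` and the set of caught exceptions are assumptions, not taken from the commit.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def decode_rfc2047(value):
    """Decode an RFC 2047 encoded-word string; return it unchanged on failure."""
    try:
        return str(make_header(decode_header(value)))
    except (HeaderParseError, LookupError, UnicodeError):
        # Unknown charset or undecodable bytes: keep the undecoded string.
        return value


decoded = decode_rfc2047("=?utf-8?q?caf=C3=A9?=")
fallback = decode_rfc2047("=?bogus-charset?q?abc?=")
```
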
Commits on Aug 7, 2021
Commit abb4dfd
Commits on Aug 8, 2021
Commit 2a31dd4
Commit d3cf088
Commit 6a3832c
Commit 25ee0a2
Create table before inserting, ensuring proper column types.
In some instances tables would be created with the wrong column types if the initial records had unexpected types. This fixes the issue by explicitly creating the table and specifying types.
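
The idea can be sketched with the standard `sqlite3` module. The table name `mbox_emails` and its columns are hypothetical, and the PR may well use a different database library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Declare the schema up front so column types do not depend on
# whatever values the first inserted record happens to contain.
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS mbox_emails (
        id TEXT PRIMARY KEY,
        subject TEXT,
        date TEXT,
        body TEXT
    )
    """
)

# A first record with a missing subject no longer influences the column type.
conn.execute(
    "INSERT INTO mbox_emails (id, subject, date, body) VALUES (?, ?, ?, ?)",
    ("<a@b>", None, "2021-08-08T00:00:00+00:00", "hi"),
)

columns = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(mbox_emails)")}
```
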
Commit 0bfe031
Commits on Aug 10, 2021
Commit 98d89bf
Using this newer email parsing code enables parsing of attachments and easier parsing of HTML emails in the future.
Commit c081ed3
This may be more robust than the tree-walking method we were using earlier, and will enable parsing of HTML email contents in a future commit.
Commit 8e6d487
Parse HTML emails to plaintext.
(Only if no text/plain alternative exists)
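
A rough sketch of this fallback, assuming the message is parsed with `email.policy.default` so that `get_body()` is available. The tag-stripping approach with `html.parser` and the names `html_to_text` and `plain_body` are illustrative assumptions; the commit may use a different HTML-to-text method.

```python
from email import message_from_bytes, policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse runs of whitespace."""
    extractor = _TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


def plain_body(msg):
    """Prefer a text/plain part; convert text/html only if none exists."""
    part = msg.get_body(preferencelist=("plain",))
    if part is not None:
        return part.get_content()
    part = msg.get_body(preferencelist=("html",))
    if part is not None:
        return html_to_text(part.get_content())
    return None


html_only = message_from_bytes(
    b"Content-Type: text/html; charset=utf-8\n\n<p>Hello <b>world</b></p>\n",
    policy=policy.default,
)
text_only = message_from_bytes(
    b"Content-Type: text/plain; charset=utf-8\n\nhi there\n",
    policy=policy.default,
)
```
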