More work on downloadtweets
#122
base: downloadtweets
Conversation
…. See TODO comments in this commit.
…e-downloads, but they still happen.
…ining why it's not used
This is mainly achieved by adding `create_path_for_file_` methods to `PathConfig` and using them, and using `rel_url` to link to media files.
…roducible, deterministic filenames)
…ation (for group dms)
…in markdown output
…up before parsing
…ect messages) together
downloadtweets
(Probably broken PR diff)
Bugfix: check if user is in the correct folder before init of the other paths
Replace image URLs in DMs with links to local files
# Conflicts: # parser.py
More consistent / robust questions for consent
Put generated output in its own folder(s). Should fix timhutton#99.
…g images if there are none, show number of images otherwise.
…ations for downloading tweets and larger media files.
…fer to retry after failure. Fix path for known_tweets (which seemed to work despite wrong code?!)
downloadtweets
(Probably broken PR diff)
The online diff view is no longer broken 🎉
…hing about the overall purpose. Also improve error handling.
…Ms (for even more human-readable filenames)
Remove listing of moved media files, it's much too verbose.
Improve filename building for Group DMs
Bugfix: make sure there are no empty handles in UserData
Hi, any chance all this nice stuff gets integrated? Thanks!
@slorquet I think this is still work in progress, so it can't be merged yet? @lenaschimmel are you still working on this? Can you rebase it, so it's easier to review? I'd like to get as much out of the sinking blue ship as possible :-)
@Sjors I'm not really working on that branch, or twitter-archive-parser, any more since mid-December '22. I've since deleted my own twitter account. Until then, I focused on the weirdly-named branch
@lenaschimmel Would you like to document the fact that the other branch is more worthy of being merged by closing/drafting this one, and opening a new PR? This little step could also make it easier for third parties, like me, to discover your work and rely on it instead, in case @timhutton would not be available for further deliberation.
Here's some work which I already did 3 days ago, but had not made it into a PR before.
Yesterday I incorporated the newest stuff from `timhutton/twitter-archive-parser/main` into both `timhutton/twitter-archive-parser/downloadtweets` and `lenaschimmel/twitter-archive-parser/downloadtweets`, because I thought that would make it easier to keep everything up to date and focus on the actual (non-merge) commits.

What's actually included
This is just more WIP on the download tweets feature:

- `get_tweets`: if it fails, it now returns the tweets we already downloaded, plus the ids of tweets that are still missing (see the sketch after this list).
- A `merge` method, which can basically merge all kinds of Python values, but contains some special treatment for dicts representing tweets. This should make sure that if a tweet is contained in the archive, and we download that same tweet via the API, we can merge them without losing any information (sketched below).
- `has_path`, which simplifies those chained checks that we have all over the code, like `if 'entities' in tweet and 'user_mentions' in tweet['entities'] and tweet['entities']['user_mentions'] is not None` (sketched below).
- Save all known tweets to `known_tweets.json`, and load them on next script execution, so that we don't reload the same tweets over and over again (sketched below).
- Mark tweets with `from_api`, `download_with_user` and `download_with_alt_text`, so that `collect_tweet_references` can make better decisions on what to re-download, and whether to follow references or not (sketched below).
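For context, here is a minimal sketch of the failure behaviour described for `get_tweets`. This is not the code from the branch; `fetch_tweet_batch` is a made-up placeholder for whatever performs the actual API request:

```python
def get_tweets(session, tweet_ids, batch_size=100):
    """Download tweets in batches, returning (downloaded, still_missing)."""
    downloaded = {}
    remaining = list(tweet_ids)
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        try:
            # fetch_tweet_batch is a hypothetical helper returning {tweet_id: tweet_dict}
            downloaded.update(fetch_tweet_batch(session, batch))
        except Exception:
            # On failure, hand back what we already have plus the ids we never got to,
            # instead of throwing the partial result away.
            return downloaded, batch + remaining
    return downloaded, []
```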
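A rough sketch of what a generic `merge` for Python values could look like; the actual method in the branch reportedly adds special handling for dicts that represent tweets, which is only hinted at here:

```python
def merge(a, b):
    """Recursively merge two values, keeping information from both sides."""
    if isinstance(a, dict) and isinstance(b, dict):
        # For tweet dicts, this is where field-specific rules would go
        # (e.g. preferring the archive's full text over a truncated API text).
        result = dict(a)
        for key, value in b.items():
            result[key] = merge(a[key], value) if key in a else value
        return result
    if isinstance(a, list) and isinstance(b, list):
        # Keep every element, skipping ones that are already present.
        return a + [item for item in b if item not in a]
    # Scalars: keep the existing value unless it is missing.
    return a if a is not None else b
```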
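The `has_path` helper is easy to picture; a minimal sketch, assuming it takes the object followed by the chain of keys:

```python
def has_path(obj, *path):
    """True if all nested keys exist and the value at the end is not None."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return False
        obj = obj[key]
    return obj is not None

tweet = {'entities': {'user_mentions': [{'screen_name': 'example'}]}}

# Replaces the chained check quoted above:
if has_path(tweet, 'entities', 'user_mentions'):
    print(tweet['entities']['user_mentions'])
```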
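And a sketch of the caching idea; the exact name and placement of `known_tweets.json` inside the output folders is an assumption here:

```python
import json
import os

KNOWN_TWEETS_FILE = 'known_tweets.json'  # actual path/placement may differ in the branch

def load_known_tweets(path=KNOWN_TWEETS_FILE):
    """Load tweets cached by a previous run, if the cache exists."""
    if os.path.exists(path):
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    return {}

def save_known_tweets(known_tweets, path=KNOWN_TWEETS_FILE):
    """Persist everything we know, so the next run does not re-download it."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(known_tweets, f, ensure_ascii=False, indent=2)
```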
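Finally, a purely illustrative sketch of how flags like `from_api`, `download_with_user` and `download_with_alt_text` could feed into the re-download decision; the helper name below is made up, and the real logic in `collect_tweet_references` is certainly more nuanced:

```python
def needs_redownload(tweet):
    """Hypothetical decision helper based on the metadata flags."""
    if not tweet.get('from_api'):
        return True  # only ever seen in the archive; the API copy may add data
    if not tweet.get('download_with_user'):
        return True  # fetched without user objects; retry to get author info
    if not tweet.get('download_with_alt_text'):
        return True  # fetched without media alt text; retry to get it
    return False
```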