Cache subtitles on S3 as well #277

Open
benoit74 opened this issue Jul 23, 2024 · 3 comments · May be fixed by #287

@benoit74
Collaborator

Currently, only video thumbnails and the videos themselves are cached on S3.

The drawback is that when an IP has been blacklisted for yt-dlp usage, the recipe fails to produce the ZIM even if all API calls have succeeded, because we use yt-dlp to download the subtitles.

Caching the subtitles on S3 as well would allow the ZIM to be created in that situation.

@kelson42
Contributor

Seems like a good idea, but are subtitles served properly with etags?

@dan-niles
Collaborator

Currently we're using yt-dlp to download subtitles, and it does not provide etags for them. The response is in the following format:

"requested_subtitles": {
  "en": {
    "ext": "vtt",
    "url": "https://www.youtube.com/api/timedtext?v=DYvYGQHYScc&ei=rzKqZouKCqfWz7sPiu_E2Qw&caps=asr&opi=112496729&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=1722455327&sparams=ip%2Cipbits%2Cexpire%2Cv%2Cei%2Ccaps%2Copi%2Cxoaf&signature=D55586A99B8028F2565AFE1F76F3F55D8BE2ECA6.E032AF517474302C806EE8A02C6CDC914CD903B9&key=yt8&lang=en&fmt=vtt",
    "name": "English"
  }
},
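
To make the shape concrete, here is a minimal sketch of reading these entries from a yt-dlp info dict. The helper name `subtitle_sources` and the sample `info` dict (with a shortened URL) are assumptions made for illustration, not code from the scraper. Each entry carries only `ext`, `url` and `name`, so there is nothing etag-like to read from it.

```python
from typing import Any


def subtitle_sources(info: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Map each language code to the ext/url of its requested subtitle.

    Hypothetical helper: `info` is assumed to be a yt-dlp info dict that may
    contain a "requested_subtitles" key shaped like the excerpt above.
    """
    # The key may be missing or None when no subtitles were requested.
    requested = info.get("requested_subtitles") or {}
    return {
        lang: {"ext": entry["ext"], "url": entry["url"]}
        for lang, entry in requested.items()
    }


# Sample data shaped like the excerpt above; the URL is shortened on purpose.
info = {
    "requested_subtitles": {
        "en": {
            "ext": "vtt",
            "url": "https://www.youtube.com/api/timedtext?v=DYvYGQHYScc&lang=en&fmt=vtt",
            "name": "English",
        }
    }
}

sources = subtitle_sources(info)
print(sorted(sources))  # language codes found in the info dict
```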

However, the YouTube Data API (https://developers.google.com/youtube/v3/docs/captions#resource-representation) does provide etags for captions.
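
As a rough sketch of how those etags could be collected, the snippet below indexes a `captions.list` response by language and track kind. The helper name `caption_etags` and the sample response values are made up for illustration; only the field names (`items`, `etag`, `snippet.language`, `snippet.trackKind`) come from the resource representation linked above. How the response is fetched, and whether the etag actually changes when a track's content is edited, are left open here and would need to be verified.

```python
from typing import Any


def caption_etags(response: dict[str, Any]) -> dict[tuple[str, str], str]:
    """Map (language, trackKind) to the etag of each caption track.

    Hypothetical helper: `response` is assumed to be the decoded JSON body
    of a YouTube Data API captions.list call.
    """
    return {
        (item["snippet"]["language"], item["snippet"]["trackKind"]): item["etag"]
        for item in response.get("items", [])
    }


# Made-up sample response; ids and etags are placeholders, not real values.
sample_response = {
    "kind": "youtube#captionListResponse",
    "etag": "list-etag-placeholder",
    "items": [
        {
            "kind": "youtube#caption",
            "etag": "track-etag-placeholder-1",
            "id": "caption-id-placeholder-1",
            "snippet": {
                "videoId": "DYvYGQHYScc",
                "language": "en",
                "trackKind": "standard",
            },
        },
        {
            "kind": "youtube#caption",
            "etag": "track-etag-placeholder-2",
            "id": "caption-id-placeholder-2",
            "snippet": {
                "videoId": "DYvYGQHYScc",
                "language": "en",
                "trackKind": "ASR",
            },
        },
    ],
}

etags = caption_etags(sample_response)
```

Keying on both language and track kind avoids collisions when a video has an automatic and a manual track in the same language.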

@dan-niles
Collaborator

@benoit74 and I discussed the possibility of hashing the URL of each subtitle provided by yt-dlp and using the hash as an etag. However, it seems that this URL changes every time it is fetched by yt-dlp.

I tried manually editing the subtitles of this video on the openZIM_testing YouTube channel to observe how the URL is affected. However, it appears that YouTube fetches the latest subtitles internally, and the query parameters in the URL don't seem to have an impact.
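
A small sketch of why the URL hash does not work as an etag: any change in the query string produces a different digest, even when the subtitle content is identical. The helper name `url_etag` and both shortened URLs are invented for illustration, and the assumption that `expire` and `signature` are the parameters that vary between fetches is mine, not something confirmed above.

```python
import hashlib


def url_etag(url: str) -> str:
    """Return a hex SHA-256 digest of the URL, used as a candidate etag."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


# Two hypothetical fetches of the same subtitle track. Only the (assumed)
# per-request parameters differ; video id, language and format are the same.
first_fetch = (
    "https://www.youtube.com/api/timedtext"
    "?v=DYvYGQHYScc&expire=1722455327&signature=AAAA&lang=en&fmt=vtt"
)
second_fetch = (
    "https://www.youtube.com/api/timedtext"
    "?v=DYvYGQHYScc&expire=1722458927&signature=BBBB&lang=en&fmt=vtt"
)

# Same track, different digests: the cache would miss on every run.
print(url_etag(first_fetch) == url_etag(second_fetch))  # prints False
```

The reverse problem also applies: since editing the subtitles did not visibly affect the query parameters, a URL-derived value would not signal a content change either.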

@dan-niles dan-niles self-assigned this Jul 31, 2024
@dan-niles dan-niles linked a pull request Aug 4, 2024 that will close this issue
@benoit74 benoit74 modified the milestones: 3.1.0, 3.2.0 Sep 5, 2024
@benoit74 benoit74 modified the milestones: 3.2.0, 3.3.0 Oct 11, 2024
@benoit74 benoit74 modified the milestones: 3.3.0, 3.4.0 Nov 4, 2024