10 TB -> 10 GB: Fix metadata w/ mk-zim-cat-item.py & mk-zim-cat.py [C'est pas sorcier, CrashCourse, TED-Ed, GCFAprendeLibre] #14
FYI this PR is nothing more than an attempt at cleaning up the metadata for these three ~10GB ZIM files:
Long story short, this PR is nothing more than the automated results arising from running:
Concluding Questions:
|
ASIDE: How important is the following issue? |
Note that metadata associated with these 3 ZIM files is already problematic even prior to this PR — as seen in this kiwix-serve screenshot — if you look closely: So I'd assume this PR does not make the situation any worse? Hopefully @tim-moody can confirm? |
I don't see anything wrong with the revised version, except the date, which is probably technically correct (it reflects when the ZIM was created) but is not in sync with the content. The addition of the funny youtube tag and the first videos tag is not clear to me. It might have been passed as an argument at ZIM creation and Kiwix mangled it; it is hard for me to think that Kiwix knows this is YouTube material and added it. The important thing is that size and publisher are now correct, as both are used by the catalog display. Also, we need the pictures and details tags that are set by Kiwix and were missing. Kiwix reports the path relative to the zims directory, so the ../library is common across all ZIMs and isn't really used.
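The fields called out in this comment (size, publisher, date, tags) could be checked mechanically before a catalog entry is published. A minimal sketch, assuming a catalog entry is a plain dict; the key names and the helper `missing_metadata` are illustrative assumptions and are not taken from mk-zim-cat-item.py or mk-zim-cat.py:

```python
# Hypothetical key names; the real catalog format may differ.
REQUIRED_KEYS = ("size", "publisher", "date", "tags")

def missing_metadata(entry):
    """Return the required keys that are absent or empty in a catalog entry."""
    return [key for key in REQUIRED_KEYS if not entry.get(key)]

# Example entry with an empty publisher (made-up values).
entry = {
    "size": 10_000_000,
    "publisher": "",
    "date": "2021-01-15",
    "tags": "youtube",
}
print(missing_metadata(entry))
```

A check like this only flags empty or missing fields; it cannot tell whether a date or tag is semantically right, which still needs a human look.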
Well, it works for me, and I was the only one who used it. The problems identified in this ticket were produced by not using it. But it is hardly ready for mass consumption, so if there are an increasing number of zim creators who can't use it, someone could improve it. |
@georgejhunt can you recommend merging this PR experimentally — or another course of action if further metadata fixes are required? |
ASIDE: @deldesir asks if we can learn from @georgejhunt about any crucial-or-critical metadata hand-fixes that might be advisable in general? Specifically, do the 3 Basic Electricity ZIM files below illustrate any metadata tips & tricks we should learn from here?
|
ASIDE: @tim-moody you're probably aware but just FYI @deldesir did a quick survey of metadata across all 9 of our IIAB Catalog ZIM files:
|
I probably mistyped it. Can be fixed by hand. Not sure that we use favicon since we don't use kiwix as a front end |
looks right |
Isn't this logo/favicon automatically extracted when new menudefs are auto-created, to showcase new ZIM files on IIAB's main page? (Or so I thought, maybe I've got this all wrong, apologies if so!) |
not sure if we are still able to do this with the new catalog. in any event I always assumed that if we create a zim we would also create its menu def |
anyway, you're right someone should check if we are doing this with the new catalog and implement it if not. These logos look to be 48px x 48, which is a little small, but better than nothing. probably @deldesir could figure it out. |
@tim-moody after this PR (or similar) is merged, can users see the new ZIM catalog by logging back in to Admin Console, and/or by clicking "Reindex Kiwix Content", "Refresh Kiwix Catalog" or similar ? (Or is a fresh install of IIAB & Admin Console likely necessary?) |
just refresh catalog |
I tried contacting @georgejhunt directly for his recommendations here. No luck reaching him so far, since Wednesday. He should have good ideas. In any case, I suggest we move forward with this PR allowing for community testing — and amend it later wherever George and users can offer further improvements. |
I agree |
Looks great. |
would verify that the source URL is not broken
First fresh install doesn't seem to work: @tim-moody is it unrelated that jobs are marked "SCHEDULED" but never begin? I rebooted and that did not help. Waiting 10min also did not help. FYI this is Debian 12 (currently in pre-release freeze, and generally reliable in other regards). Feel free to log in to 10.8.0.38 if that helps understand why it's stuck? |
can't login |
http://10.8.0.46/admin (Ubuntu 22.04) appears to have the very same problem. Can you log into either? (Or if not, any idea why?) |
Your ssh key should be supported on both VMs (mine works). Any idea what's happening? |
mine doesn't on either |
Hopefully when you're logged in as iiab-admin you can diagnose (why your ssh key is suddenly not working?) |
Just FYI iiab-cmdsrv.service appears to be running fine on both VMs:
|
adm cons doesn't think kiwix is installed, so it is waiting for it. If you install from a preset, kiwix will be installed, but otherwise you have to check it in configure and ico |
Any idea why this is happening? FYI /library/zims/content/teded_en_top_2021-01.zim continues to expand (presumably as a result of the wget command keeping the file handle, despite the destination having moved!) Is this A-Ok ?? |
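For context on the file that keeps growing after being moved: on POSIX filesystems an open file handle follows the file itself, not its path, so a writer keeps appending after a rename within the same filesystem. A small sketch demonstrating this (file names are made up, and this is not the wget or cmdsrv code):

```python
import os
import tempfile

def write_after_move():
    """Write through an open handle after the file has been renamed.

    Returns (old_path_still_exists, size_at_new_path).
    Assumes a POSIX system; renaming an open file can fail on Windows.
    """
    with tempfile.TemporaryDirectory() as tmp:
        old_path = os.path.join(tmp, "download.part")  # hypothetical name
        new_path = os.path.join(tmp, "moved.zim")      # hypothetical name
        with open(old_path, "wb") as fh:
            fh.write(b"a" * 100)
            fh.flush()
            os.rename(old_path, new_path)  # move while the handle is open
            fh.write(b"b" * 50)            # the writer keeps going
            fh.flush()
            return os.path.exists(old_path), os.path.getsize(new_path)

print(write_after_move())
```

The later 50 bytes land in the file at its new path, which matches the observation that the moved ZIM continues to expand while the download is still running.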
what I said was with regard to .46. I see that .38 now has kiwix installed and the zim downloads have started and in one case succeeded. |
The only thing I can think of is that the restart logic doesn't handle job dependencies properly. |
Also though jobs 3 and 4 say succeeded, the zim was not added, and the job output even says that. On checking I see that rare diseases did get added, but not to menu |
I noticed that. Weird that neither ZIM file appears on the IIAB home page so far. (Certainly I can manually force these later, i.e. after the 10GB TED-Ed is downloaded, I can run Hopefully these kinds of things are a very rare occurrence arising from |
That's what I'm hoping. Just looking at the restart code to see if anything jumps out. Is it possible that wget continued even when you restarted cmdsrv? i.e. there was no reboot |
The only thing I did was I did not reboot. (Should I have rebooted instead?) |
restart assumes a cold start after a shutdown or crash |
teded download is marked STARTED in the db, seems like we had a job that was still running at the OS level, but not at the cmdsrv level. don't know what that would do. |
Just FYI neither ZIM file appeared on the IIAB main page http://10.8.0.38 in the end. And just 1 of 2 appears at http://10.8.0.38/kiwix/ in the end. (I can manually force the fixing of all the small issues above, as mentioned earlier, so I'm only mentioning this as an FYI.) |
Time to repeat the experiment on 10.8.0.46 == Ubuntu 22.04 just for kicks? (Do you want to do that...or should I?) |
reboot .46 first as I have cmdsrv running manually |
I don't know how to check the DB for individual jobs' status (do you want to do this, then restart the VM to see how it goes?) |
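On checking the DB for individual jobs' status: a minimal sketch, assuming the job database is SQLite. The table name `jobs`, its columns, and the helper `jobs_by_status` are hypothetical stand-ins (the real cmdsrv schema is not shown in this thread), so the example builds its own throwaway in-memory database:

```python
import sqlite3

def jobs_by_status(conn, status):
    """Return (id, command) rows for jobs with the given status."""
    cur = conn.execute(
        "SELECT id, command FROM jobs WHERE status = ? ORDER BY id",
        (status,),
    )
    return cur.fetchall()

# Throwaway in-memory database with a hypothetical schema and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id INTEGER PRIMARY KEY, command TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?)",
    [(1, "wget a.zim", "SUCCEEDED"), (2, "wget b.zim", "STARTED")],
)
print(jobs_by_status(conn, "STARTED"))
```

Against a real installation the connection would point at the actual database file and the query would need to use the real table and column names.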
there might be a logical flaw in cmdsrv 3634ff if the job is the highest number, has no dependencies, but is itself dependent. |
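To make the dependency question concrete, here is a generic sketch of dependency-aware restart selection. It is not the cmdsrv implementation and does not claim to reproduce the suspected flaw; the dict layout, the `depends_on` field, the uppercase `SUCCEEDED` status, and the function `runnable_jobs` are all assumptions for illustration:

```python
def runnable_jobs(jobs):
    """Return ids of unfinished jobs whose prerequisite (if any) succeeded.

    `jobs` maps job id -> {"status": str, "depends_on": int or None}.
    """
    ready = []
    for job_id in sorted(jobs):
        job = jobs[job_id]
        if job["status"] == "SUCCEEDED":
            continue  # nothing to restart
        dep = job["depends_on"]
        if dep is None or jobs[dep]["status"] == "SUCCEEDED":
            ready.append(job_id)
    return ready

jobs = {
    1: {"status": "SUCCEEDED", "depends_on": None},
    2: {"status": "STARTED", "depends_on": None},  # e.g. interrupted download
    3: {"status": "SCHEDULED", "depends_on": 2},   # highest id, depends on 2
}
print(runnable_jobs(jobs))
```

In this sketch job 3 stays queued until job 2 has succeeded, regardless of job 3 being the highest-numbered job, which is the case the comment above worries about.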
Is 3634ff a particular commit? Am I looking in the wrong repo if so (am not seeing it!) |
sorry, it is a line number and I don't know how you do the fancy L# url |
These lines below? (Click ellipsis on left, if you want either kind of URL...) |
https://github.com/iiab/iiab-admin-console/blob/master/roles/cmdsrv/files/iiab-cmdsrv3.py#L3566 is the function. yeah, was just doing it
since this was after reboot, cmdsrv may not know about it unless you restart it and refresh admin |
I rebooted .46 (Ubuntu 22.04). The behavior is identical to .38 (Debian 12). (In each case, |
FYI, I removed all ZIM files via Admin Console (Install content > Manage content > Remove selected content) and redownloaded them using Admin Console again. I monitored the command jobs and all succeeded. I can access all the ZIMs via Kiwix. No service restart was needed. |
@deldesir perfect. thanks for confirming. |
Above issue is confirmed fixed! Thanks to @tim-moody's: |
@tim-moody please review before merging, to confirm this is sufficiently correct?