opam.ocaml.org does not get updated (since ~2 days) #48
The failure here is due to a build error, not due to an infrastructure error. If anything, we need a bit more infrastructure to alert us when the opam.ocaml.org pushes fail (a Matrix channel would be ideal). See deploy.ci.ocaml.org, and after clicking on the logs tab and looking through the table, I spotted https://deploy.ci.ocaml.org/job/2023-06-06/160459-ocluster-build-dcd1d2, which in turn shows that opam2web is failing:
I've not root-caused this further yet... /cc @mtelvers @tmcgilchrist
It looks like the deployer is building all branches of opam2web for some reason. That seems like it could be tightened down to just the live and staging branches. Regarding this question by @hannesm:
... the tracking issue is #31
That is happening by design, to check that any PRs are deployable before merging to live or staging. We could remove that behaviour, but I would advise against it. There are many stale branches on opam2web; could they be cleaned up to just what is required?
Tracking issue is ocurrent/ocurrent-deployer#111. Is there a Matrix channel / server available for posting messages to? The current plan was to post to the Slack channel for opam-maintainers.

Usually the issue is the large size of the docker image created, and the build timing out or getting rate-limited by Docker Hub. The longer-term fix for the opam2web size issue is to move the documentation into the new ocaml.org website and have opam2web just build the index file.
Thanks @tmcgilchrist, the deployability checks do indeed make sense. I think the real blocker to debugging what's going on is the lack of historical build information, which I've raised at ocurrent/ocurrent-deployer#190. Without that, there's not much point having the web interface for the deployer, as it only ever shows the current (and long-running) build.

I've set up a simple Matrix room at #ocaml-infra:recoil.org which we can use for notifications. Once it's working, we can alias it to another homeserver (for redundancy) and then add it to the OCaml space.
How about removing the arm64 build, as we only deploy the x86_64 version? Both builds happen in parallel, so it wouldn't be any quicker, but both builds must succeed to proceed to the next stage of the pipeline, so there would be one fewer dependency. Should save a bit of carbon too!
First of all, thanks for fixing the update process (in case you did something; at least there was an update of the opam repository on opam.ocaml.org). Second, I'll close this issue. I have the feeling that you're very convinced that the system and its complexity are necessary for any commit to the opam-repository, and that shovelling huge docker images across the Internet for deployment is deeply necessary. My approach would be radically different: I'd try to find the minimal thing which needs to be done for an update (including building opam2web binaries and packaging them), with the grand goal of saving resources (computation / network). But since you're convinced of the technology and stack in use, I won't argue against it.
@hannesm, removing Docker Hub from the equation is entirely in scope, especially if it saves resources and energy (which it will). It's a matter of a smooth transition of the infrastructure, and of time; Ocurrent can easily wrap any dataflow. I'd welcome a simpler future infrastructure than the existing one.
Again, it is 2 days behind. While scrolling through "https://deploy.ci.ocaml.org/?repo=ocaml-opam/opam2web&", I can find two "jobs" (please excuse me if you have other terminology): one being "ocurrent/opam.ocaml.org: live", the other "ocurrent/opam.ocaml.org: staging". Somehow, one gets the "live" branch, the other the "live-staging" branch. Now, in their "log output" there's a lot of stuff, but I'm curious that both logs have this line: for me, as someone who doesn't know anything about docker and docker hub, it looks like they're racing to push to the same tag remotely. Is this correct?

I haven't had any luck figuring out what these "jobs" are actually supposed to do (apart from the graphical output, which lacks all the details). Could it be that, given the current pace of development of opam2web, these two "jobs" could be restricted to a single one? I also have a hard time understanding where / what is getting deployed if both push the same tag and the host in mind is only "opam.ocaml.org" -- is there a "live" and a "staging" subdomain? Is it worth it?

Is it possible for you to hand out an executable POSIX shell script that condenses the steps taken when "there is a new commit to opam-repository"? I'd love to take a look at what is involved, to get a clearer picture of the carbon footprint. With "ocurrent" and some docker scripting, I'm sure you can extract that. If not, a (single!) Dockerfile could be helpful as well. Thanks for reading.
@hannesm the build instructions are documented at https://github.com/ocaml-opam/opam2web#docker. What ocurrent is doing in this process is running that docker build with the latest git versions of opam-repository and ocaml/platform-blog, and then deploying the result. If you want to run it locally, use this command:

```shell
DOCKER_BUILDKIT=1 \
docker build -t opam2web -f Dockerfile . \
  --build-arg OPAM_GIT_SHA=42b392e634b2f2fc7e027070ccae412e55eba41b \
  --build-arg BLOG_GIT_SHA=356e7d2ea63d5945828b9c5421a007db125f1710
```

The build generates a large docker image with all the package documentation, which is what takes so much time to build and triggers the timeouts you are seeing. The plan is to move everything into the ocaml.org documentation; then we can stop building that and just generate the opam index file, which will be much faster. That work is being done under #26, cc @tmattio.

In the meantime, the docker layers in that Dockerfile could be optimised to avoid rebuilding by making better use of cached layers. If you have some time and want to help with that, it would be appreciated.

Finally, I've restarted the build and will keep an eye on it today.
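To illustrate the kind of layer-caching optimisation meant above, here is a hedged sketch; the base image, stage name, and file layout are hypothetical and not taken from the actual opam2web Dockerfile:

```dockerfile
# Hypothetical sketch: copy dependency metadata first, so the expensive
# "install dependencies" layer stays cached across source-only changes.
FROM ocaml/opam:debian-12-ocaml-4.14 AS build
WORKDIR /src
# Illustrative file name, not the real opam2web layout.
COPY opam2web.opam .
RUN opam install --deps-only ./opam2web.opam
# Changes to the sources invalidate only the layers below this point.
COPY . .
RUN opam exec -- dune build
```

The general idea is that docker reuses a cached layer only if every earlier layer and the copied files are unchanged, so ordering cheap, stable steps first avoids redoing the expensive ones.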
Thanks for your pointer. Unfortunately, there's no docker available on my operating system. I'm still confused by the Dockerfile you pointed to (so many …). So, good luck with that. From your message:
do you mean the package index, as in https://opam.ocaml.org/packages/awa/, or is there other (API) documentation being built? Certainly I understand that the platform-blog and the opam documentation are put there. Btw, do you have an idea why the following lines occur in the log output of both deployer jobs (as I mentioned above), and do both live and staging race for the same tag (do these contain the same data)?
Yeah, what can I say: it isn't optimal, and was only supposed to be in place for a short time while a better solution was being developed.
Yes, that is right: it builds all of https://opam.ocaml.org/packages/* for all packages, plus the platform-blog and opam documentation, as per your response. This will be resolved by #26, which shouldn't be far away.
They will be using different tags so there is no race, but most of the data will be the same. This isn't worth fixing since this whole setup will be replaced soon. Briefly, the deployment is:
The extra docker pushes you're pointing to go to a staging docker registry hosted locally on the machine for caching. Before pointing out the obvious waste in pushing images around, the services deployed using …
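To sketch why the live and staging pushes do not collide, assuming a tag-per-branch scheme as described above (the registry name and flow below are hypothetical, not the deployer's actual configuration):

```shell
# Hypothetical tag scheme: one tag per branch, so the two jobs never
# push to the same remote reference.
REGISTRY="staging.registry.local"            # assumed local caching registry
LIVE_TAG="$REGISTRY/opam.ocaml.org:live"
STAGING_TAG="$REGISTRY/opam.ocaml.org:staging"

# The deployer's flow would then be roughly (needs docker to actually run):
#   docker pull ocurrent/opam.ocaml.org:live
#   docker tag  ocurrent/opam.ocaml.org:live "$LIVE_TAG"
#   docker push "$LIVE_TAG"

if [ "$LIVE_TAG" != "$STAGING_TAG" ]; then
  echo "distinct tags: no race"
fi
```

Most of the image layers are shared between the two tags, so the registry stores the common data only once even though both jobs push.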
Thanks for your instructions. Still, I don't have any "docker" executable on my Unix operating system, so I'm out of luck trying to do anything in this regard. I still don't understand the setup and why it is so complex (and which bits are pushed around for what). In any case, it seems like "wait until ocaml.org hosts the package stuff" is the solution you're aiming for; I don't have anything to contribute there.

For what it is worth, there's still a huge delay (> 20 hours) from "someone merged a PR" to "it shows up on opam.ocaml.org". But the accumulated technical debt in your deployment seems set to be superseded (soon, or at least in a planned future) by some other piece of technology, which with luck may result in quicker updates, though the ocaml.org package index and opam.ocaml.org/index.tgz may then be out of sync -- but maybe that is not relevant for those maintaining "ocaml.org".
Dear Madam or Sir,
first of all thanks for running opam.ocaml.org as a community service. :)
I noticed from opam update that the opam.ocaml.org hosts are not getting updates since Sunday June 4th 19:04:41 2023 +0100 (commit 9681b042 according to the repo file of the opam.ocaml.org hosts).
I'm curious how to move forward here: is the infrastructure and its setup/deployment maybe a bit too involved (in terms of complexity, requiring GitHub, some machines to produce artifacts (docker images), Docker Hub for upload and download, and some other machines to execute things), especially given the recent issues in this area: the IPv6 outage, and the failure to update some of the machines that serve the repository (missing ssh key)?
Another question is whether you have monitoring of the service opam.ocaml.org (covering the key things: is it online, does it reply to HTTP requests, does it serve an up-to-date archive), and if yes, is that available somewhere online? (I suggest setting up a "status.opam.ocaml.org" with some information, and maybe post-mortems about the issues that happened in recent months.)
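A minimal sketch of the third check, an up-to-date archive, could compare the archive's `Last-Modified` timestamp against a staleness threshold; the threshold value, function name, and fetch step here are illustrative assumptions, not an existing monitoring setup:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how far the mirror may lag behind opam-repository
# before the check considers the service unhealthy.
STALENESS_THRESHOLD = timedelta(hours=6)

def is_stale(last_modified: datetime, now: datetime,
             threshold: timedelta = STALENESS_THRESHOLD) -> bool:
    """Return True if the archive timestamp is older than the threshold."""
    return now - last_modified > threshold

# In a real check one would fetch the header first, e.g. with urllib:
#   resp = urllib.request.urlopen("https://opam.ocaml.org/index.tar.gz")
#   last_modified = email.utils.parsedate_to_datetime(resp.headers["Last-Modified"])
now = datetime(2023, 6, 6, 12, 0, tzinfo=timezone.utc)
print(is_stale(datetime(2023, 6, 4, 18, 0, tzinfo=timezone.utc), now))  # ~2 days behind
print(is_stale(datetime(2023, 6, 6, 11, 0, tzinfo=timezone.utc), now))  # 1 hour behind
```

A check like this could run from cron and feed a simple status page, which would have caught the ~2-day lag reported in this issue.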
I hope this is the right repository to report this issue to, in case you've any questions or want to discuss this topic further, don't hesitate to reach out to me.