Dropping CentOS 6 & Moving to CentOS 7 #1436

Closed
jakirkham opened this issue May 7, 2021 · 65 comments

Comments

@jakirkham
Member

Raising this issue to track and discuss when we want to drop CentOS 6 and move to CentOS 7 as the new default. This also came up in the core meeting this week

cc @conda-forge/core

@h-vetinari
Member

This came up in a numpy issue uncovered by testing the rc's of 1.21.0 for conda-forge - in particular, a test fails due to a bug in glibc 2.12 (not present anymore in 2.17).

There would be a patch to work around the bug, but @rgommers asked:

> CentOS 6 is >6 months past EOL. Official EOL really should be the last time to drop a constraint. The patch for working around this problem seems fine to accept, but exercises like these are a massive waste of time. For manylinux we're also fine to move to glibc 2.17 (= manylinux2014), for the same reason. What's the problem here exactly?

I brought this comment into the feedstock PR, where @xhochy noted:

> I think we should in general move conda-forge to cos7 but here is probably not the right place to discuss this. Probably we already have an issue for that.

Hence moving it here.
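
For readers unsure which side of the glibc 2.12 / 2.17 divide a given machine sits on, a minimal Python sketch follows. It relies only on the standard-library `platform.libc_ver()`. The helper names `parse_version` and `meets_minimum` are hypothetical, and the 2.17 default is simply the CentOS 7 glibc version mentioned above. None of this is conda-forge tooling.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.17' into a tuple of ints: (2, 17)."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(found, minimum="2.17"):
    """Return True if the dotted version `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    # libc_ver() returns (lib, version); both are empty strings when the
    # C library cannot be identified (for example on non-glibc systems).
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        print(f"glibc {version}; at least 2.17: {meets_minimum(version)}")
    else:
        print("glibc not detected")
```

Comparing integer tuples rather than strings matters here: as strings, "2.5" would sort after "2.17".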

@h-vetinari
Member

Also xref #1432

@h-vetinari
Member

Was this issue discussed further at recent core meetings (I've occasionally seen public hackmd notes, but have no idea where to find a collection of them)?

Any statistics or arguments that go against doing this?

Assuming it should be done, this probably needs a migrator (for adding sysroot_linux-64 2.17 everywhere?). I'd be willing to help, but would need some more pointers.

@jakirkham
Member Author

It was. Nothing conclusive yet. We collect the meeting notes here

Informally we know there are still some CentOS 6 users (the long tail of support). That said, we do lack statistics either way. So this is something we discussed. Namely how best to collect them

Yeah I think we need to decide this is something we want to do first, which we haven’t done yet

@h-vetinari
Member

h-vetinari commented Jul 4, 2021

I understand that some people are stuck on EOL'd OSes, but IMO the case to hold back based on that is really tenuous. If you're on an EOL OS, you eventually get no software updates anymore - why should conda-forge go out of its way to still service those users?

I have to agree with @rgommers' statement (I quoted) above - stuff like numpy/numpy#19192 has a real cost. It probably tied up 10-20h of maintainer (resp. core contributor) time in total, and would have been completely avoided without an ancient glibc.

@h-vetinari
Member

Another datapoint: I now have a staged-recipes PR that cannot build because the GPU-build only has glibc 2.12 (pytorch >=1.8 needs 2.17), and the CentOS7 build doesn't start: conda-forge/staged-recipes#16306

@isuruf
Member

isuruf commented Oct 1, 2021

That's not a datapoint. We've documented this in our docs on how to use CentOS7.

@h-vetinari
Member

> That's not a datapoint. We've documented this in our docs on how to use CentOS7.

I know how to do it per-feedstock, but the above package cannot currently make it through staged recipes, or at least I'll need help to pull it off. Someone could also merge it and I fix things once the feedstock is created. But it's um... suboptimal... and definitely related to CentOS6, so I'd still call it a datapoint.

@isuruf
Member

isuruf commented Oct 1, 2021

> I know how to do it per-feedstock, but the above package cannot currently make it through staged recipes, or at least I'll need help to pull it off.

Have you tried doing the same in staged-recipes? It should work.

@chrisburr
Member

It does work on staged-recipes, see here for an example (CentOS 6 fails as expected but the CentOS 7 based job passes and the feedstock is generated correctly thanks to the conda-forge.yml in the recipe directory.)

That said, I am noticing more and more places where CentOS 6 issues are appearing and moving a feedstock to CentOS 7 causes the downstream feedstocks to also need to be changed causing yet more manual intervention to be needed.

@h-vetinari
Member

h-vetinari commented Nov 26, 2021

In the last few weeks, I've probably spent upwards of 15h chasing down bugs that ended up being resolved by moving to CentOS 7. This is a real cost. Same for less experienced contributors who run into cryptic resolution errors when trying to package something that (now) needs a sysroot 2.17, and end up abandoning the recipes.

> @jakirkham: Informally we know there are still some CentOS 6 users (the long tail of support).

Can we quantify this? CentOS 6 is EOL for a year now. Why are we so beholden to that long tail? Are those parties contributing to conda-forge somehow (infra costs or packaging effort)? If not, why are we providing free support longer than even RedHat? More to the point: why do we accept them externalizing their costs for not dealing with 10+ year old software to conda-forge?

> That said, we do lack statistics either way. So this is something we discussed. Namely how best to collect them

If it takes X months to collect those statistics, that is a bad trade-off IMO.

@chrisburr
Member

@conda-forge/core Does anyone have any objections to changing the default sysroot to CentOS 7? If not I'll make PRs to change it early next week.

@beckermr
Member

I know of users this will impact.

What exactly is the problem with our current setup?

@chrisburr
Member

I also know users who this will affect, including myself. I also know people using CentOS 5-like systems with conda, who will continue to do so for at least the next decade, so we can't wait until nobody is using CentOS 6 anymore.

> What exactly is the problem with our current setup?

  • The biggest thing is that downstream packages of CentOS 7-only ones have to modify their feedstock to use the newer docker container and the errors that happen in the CI are indecipherable to non-experts (I'm seeing this multiple times a week).
  • A lot of packages have dropped support for CentOS 6. It's not obvious to maintainers how to deal with the "xxx is not defined" errors.
  • I'm also aware of at least one issue with the compilers themselves that comes from using the CentOS 6 sysroot at build time (it decides not to add some #defines as they weren't available when GCC itself was built).

Over the last 6 months hundreds of hours must have been spent dealing with these issues and I'm not convinced hundreds more should be spent over the next six months. For people really stuck on CentOS 6 we could add a global label (like gcc7 and cf202003) or they can go around forcing the old sysroot using the same mechanism as we currently use for upgrading to CentOS 7 if they really need to.

@beckermr
Member

Global labels don't get repo data patching which at this point will render the channel likely wrong.

@h-vetinari
Member

100% agree with what @chrisburr wrote. There are also some pretty gnarly bugs in the trigonometry functions of glibc < 2.17 that have bitten me at least 3 times already.

> @beckermr: I know of users this will impact.

And they can keep using old packages, or use paid support for their ancient platforms. I empathise that there are some people between a rock and a hard place, but again:

> why do we accept them externalizing their costs for not dealing with 10+ year old software to conda-forge?

Those 100s of hours Chris is mentioning might be "free" but they come at the cost of other things not being improved or fixed or packaged, and barring strong countervailing reasons, that's IMO a horrible trade-off to make against the ecosystem in favour of an unspecified handful of people who cannot manage to run less-than-decade-old software, yet need the newest packages.

@beckermr
Member

Many folks stuck on an older centos are not there by choice. They are constrained by the lack of upgrades on big systems run by government labs, etc. The idea that they can simply pay for support is a non-starter to anyone who works in or understands how those organizations work.

I am bringing this up because the remedies for using cos6 that folks keep bringing up here are not really available to the people that need cos6.

We are making a choice to leave them behind when a majority of the software we build does not require cos6 at all.

I suspect a much better path would be to further improve our support for cos7 in smithy or our bots.

@leofang
Member

leofang commented Nov 26, 2021

> Many folks stuck on an older centos are not there by choice. They are constrained by the lack of upgrades on big systems run by government labs, etc.

If you are referring to DOE labs, last time I heard the BES office demanded a thorough upgrade from its facilities due to cybersecurity concerns (cc: @mrakitin) and I assume similar mandates should also be posted by other offices.

@alippai

alippai commented Nov 26, 2021

@beckermr the legacy software on the legacy systems will keep running even if conda forge starts building on CentOS7. CentOS 6 was released literally 10 years ago. Government labs running inefficient HW and SW stacks is not something anyone should encourage or promote. That hurts the economy, research and the environment. Those systems cost everyone time and money (along with conda forge people and contributors). My understanding is that both build performance and the performance of the built libs are different on CentOS 6 vs 7, isn't this true?

@beckermr
Member

Thanks for the responses everyone!

I don't see anyone addressing directly the points I raised. The cost here is the time for folks who need cos7 and don't know it when they are building a package. They see an odd error and it costs them time to track down. I 100% agree that this cost is real.

Moving the default to cos7 is one way to reduce this cost. However it is not the only way. My premise is that given the headache this will cause for cos6 users in general, and the fact that cos7 is not required the majority of the time, we're better off improving the tooling around cos7 so that maintainers can better use it.

@chrisburr
Member

> Global labels don't get repo data patching which at this point will render the channel likely wrong.

Good point, I forgot about this. Hopefully the __glibc constraint can be good enough to allow people to keep using the channel. 🤞

> I suspect a much better path would be to further improve our support for cos7 in smithy or our bots.

This might be an option but I'm not sure it's easy to do the "right" thing and it might not even be possible. How do you see this working? I have two ideas and I think I would lean towards option 1 for simplicity.

Option 1

The bot automatically migrates downstream feedstocks as soon as an upstream feedstock moves to be CentOS 7-only.
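
Option 1 amounts to a reachability computation over the dependency graph. A minimal sketch of that idea, assuming the graph is available as a plain dictionary; the function name `downstream_closure` and the toy graph are hypothetical and do not reflect how the conda-forge bot is actually implemented:

```python
from collections import deque


def downstream_closure(dependents, roots):
    """Return every feedstock that transitively depends on any of `roots`.

    `dependents` maps a feedstock name to the feedstocks that depend on it
    directly. The roots themselves are not included in the result.
    """
    roots = set(roots)
    seen = set()
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in seen and child not in roots:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    # Made-up toy graph: "x" has just become CentOS 7-only.
    toy_graph = {"x": ["y", "z"], "y": ["w"]}
    print(sorted(downstream_closure(toy_graph, ["x"])))
```

Every feedstock in the returned set would need the CentOS 7 settings, which is why a single upstream move can fan out widely.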

Option 2

Try to be smarter and use solvability as a constraint i.e.

  • if Y depends on X
  • X=1.1 was built with CentOS 6
  • X=1.2 was built with CentOS 7
  • Which X should Y build with? Use run_exports to guide the process?

I'm not sure how stable it will be and I suspect there are a lot of unstable edge cases. In particular what happens if both CentOS 6 and CentOS 7 are unsolvable?
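
The selection rule in Option 2 can be illustrated with a toy model. This is only a sketch under strong assumptions: each build of X is reduced to a (version, minimum glibc) pair, and the hypothetical helper `pick_build` stands in for what would really be a full solver run. The glibc values 2.12 and 2.17 correspond to CentOS 6 and CentOS 7 as discussed earlier in the thread.

```python
def pick_build(candidates, sysroot_glibc):
    """Pick the newest candidate whose glibc floor fits the target sysroot.

    `candidates` is a list of (version, required_glibc) pairs, each element
    a tuple of ints, e.g. ((1, 2), (2, 17)). Returns None when nothing fits,
    which corresponds to the unsolvable case.
    """
    usable = [c for c in candidates if c[1] <= sysroot_glibc]
    return max(usable, key=lambda c: c[0], default=None)


if __name__ == "__main__":
    # X=1.1 built against CentOS 6 (glibc 2.12), X=1.2 against CentOS 7 (2.17).
    builds_of_x = [((1, 1), (2, 12)), ((1, 2), (2, 17))]
    for sysroot in [(2, 12), (2, 17)]:
        print(sysroot, "->", pick_build(builds_of_x, sysroot))
```

With a CentOS 6 sysroot, Y silently builds against the older X=1.1; the `None` result is the case where neither sysroot gives a solution and a human has to step in.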

@isuruf
Member

isuruf commented Nov 26, 2021

Option 3

Change the default docker image to be cos7 for all feedstocks, but keep the sysroot to be cos6. This would remove the solver errors.

@beckermr
Member

beckermr commented Feb 8, 2023

We've been putting this off for a long time. I'd advocate we continue to do so and not switch until absolutely necessary. We should understand what the exact issue is here before we proceed.

@hmaarrfk
Contributor

I would like to ask for guidance on what to do about clock_gettime. It seems that it is provided by glibc itself (without needing librt) only as of 2.17. However, since we want to support COS6, we shouldn't really "update" to COS7.

Should we add the -lrt flags?
conda-forge/zstd-feedstock#67
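
The reason -lrt comes up: glibc 2.17 and newer export clock_gettime from libc itself, while older releases only export it from librt. One way to check which library provides the symbol on a given machine is a small ctypes probe. This is a diagnostic sketch only; the helper name `symbol_provider` is hypothetical and it is not part of any feedstock.

```python
import ctypes
import ctypes.util


def symbol_provider(symbol, libraries=("c", "rt")):
    """Return the first library name in `libraries` that exports `symbol`.

    Returns None if no listed library can be located and loaded, or if none
    of them exports the symbol.
    """
    for name in libraries:
        path = ctypes.util.find_library(name)
        if path is None:
            continue  # library not found on this system
        try:
            handle = ctypes.CDLL(path)
        except OSError:
            continue  # found but could not be loaded
        # Attribute lookup on a CDLL resolves the symbol and raises
        # AttributeError when it is missing, so hasattr() works as a probe.
        if hasattr(handle, symbol):
            return name
    return None


if __name__ == "__main__":
    print("clock_gettime is provided by:", symbol_provider("clock_gettime"))
```

A result of "rt" would indicate a system where linking against librt is still required.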

@hmaarrfk
Contributor

@beckermr zstd seems to be hitting the need to update to cos7 -- conda-forge/zstd-feedstock#71

While we could likely patch things away, it seems like busy work.

In my present environment, the following packages depend on zstd:

$ mamba repoquery whoneeds zstd

 Name        Version   Build              Depends               Channel     
─────────────────────────────────────────────────────────────────────────────
 blosc       1.21.4    h0f2a231_0         zstd >=1.5.2,<1.6.0a0 conda-forge 
 boost-cpp   1.78.0    h6582d0a_3         zstd >=1.5.2,<1.6.0a0 conda-forge 
 c-blosc2    2.9.3     hb4ffafa_0         zstd >=1.5.2,<1.6.0a0 conda-forge 
 curl        8.1.2     h409715c_0         zstd >=1.5.2,<1.6.0a0 conda-forge 
 imagecodecs 2023.1.23 py39h9e8eca3_2     zstd >=1.5.2,<1.6.0a0 conda-forge 
 libcurl     8.1.2     h409715c_0         zstd >=1.5.2,<1.6.0a0 conda-forge 
 libllvm15   15.0.7    h5cf9203_2         zstd >=1.5.2,<1.6.0a0 conda-forge 
 libnetcdf   4.9.2     nompi_h0f3d0bb_105 zstd >=1.5.2,<1.6.0a0 conda-forge 
 libsystemd0 253       h8c4010b_1         zstd >=1.5.2,<1.6.0a0 conda-forge 
 libtiff     4.5.1     h8b53f26_0         zstd >=1.5.2,<1.6.0a0 conda-forge 
 llvm-openmp 16.0.6    h4dfa4b3_0         zstd >=1.5.2,<1.6.0a0 conda-forge 
 mysql-libs  8.0.33    hca2cd23_0         zstd >=1.5.2,<1.6.0a0 conda-forge

notably, llvm seems like it would get bumped to cos7...

Do we feel like it is finally time?

@beckermr
Member

This may be the end indeed. Let's talk it over at the next dev meeting.

@isuruf
Member

isuruf commented Jun 27, 2023

I'm all for bumping to cos7, but the zstd issue seems to be an update where the existing workaround at https://github.com/regro-cf-autotick-bot/zstd-feedstock/blob/1.5.5_hd39c66/recipe/install.sh#L7-L10 doesn't seem to work anymore. It's easy to patch by adding a target_link_libraries(target -lrt) in the cmake file.

@h-vetinari
Member

h-vetinari commented Jun 27, 2023

I'm in favour of bumping (and never thought I'd provide arguments to the contrary 😋), but two more reasons make this less of a make-or-break situation:

  • llvm's dependence on zstd is optional, we could conceivably turn that off (on linux64)
  • llvm is far less pervasive in our packages on linux than it is on osx

I agree that we shouldn't try to patch around libraries to try to keep them compatible[1], that's IMO a sisyphean task with lots of risk and little reward. But linking an additional library should still be manageable.

So I think that day is coming, I don't mind if it does, but it doesn't have to be this zstd-issue that breaks the camel's back.

PS. As I noted in the other issue, one of the trickier things about llvm-vs-glibc will be that libcxx 17 will require glibc >=2.24.

Footnotes

  1. I'm talking about in the source, not small changes in the metadata or the build files

@beckermr
Member

Right. The whole "redhat not available to alma" adds some additional color to this.

@h-vetinari
Member

Next issue: The upcoming OpenSSL 3.2 requires sendmmsg, which got added in glibc 2.14, at least when building with QUIC (==http3) support, which everyone and their dog has been waiting for with bated breath[1].

The saving grace in this case is that OpenSSL 3.x is ABI-compatible, so we could keep the pinning at 3.1 while allowing compatible clients to pull in OpenSSL 3.2 at runtime.

Still, it's one more bandaid to have to keep in mind...

Footnotes

  1. if you have nothing better to do, most of the history is in https://github.com/openssl/openssl/pull/8797

@hmaarrfk
Contributor

hmaarrfk commented Sep 8, 2023

I'm pretty inclined to try to get to a consensus on how to deal with operating systems that are end of lifed by their original companies.

I feel like we are repeating many of the points we did for OSX.

I know formulas are "bad" but it would be great to have something like NEP29 or SPEC0 where rules are centralized.

@h-vetinari
Member

FWIW, OpenSSL 3.2 restored compatibility with glibc <2.14.

> it would be great to have something like NEP29 or SPEC0 where rules are centralized.

💯

@jakirkham
Member Author

It would be good if we could get more stakeholders to buy into those rules (like the Python distribution and wheels). Would make it easier to move forward as a united front

Other advantage of rules is it makes planning for large organizations easier. They can see how long they can use something and when they might need to plan for changes

@beckermr
Member

beckermr commented Aug 1, 2024

As we've actually dropped cos6 now, I am thinking we should close this. Comments @conda-forge/core?

@mbargull
Member

mbargull commented Aug 1, 2024

Sounds good to me in principle.
But I hadn't yet checked if we have anything left to do re conda-forge/conda-forge-pinning-feedstock#6070 (comment) :

Things apart from the pinning changes that come to mind:

  • adjust numbers of track_feature to sysroot_linux-64
    (somewhat obsolete due to {{ stdlib('c') }}; could be relevant for downstream users for a limited amount of time)
  • docs
  • (can be done later: remove linux-anvil-comp7 build files)

(I can't really look into it now but hope to get to look into conda-forge & co. stuff in 1-2 weeks.)

@beckermr
Member

beckermr commented Aug 1, 2024

Also this item from the cos6 PR: conda-forge/conda-forge-pinning-feedstock#6070 (comment)

@beckermr
Member

beckermr commented Aug 1, 2024

And we may want to drop the current repodata hacks.

@beckermr
Member

beckermr commented Aug 1, 2024

plus this discussion: conda-forge/linux-sysroot-feedstock#63

@h-vetinari
Member

h-vetinari commented Nov 20, 2024

Here's a list of all the points that were mentioned since August

@h-vetinari
Member

Given that all tasks mentioned since the switch are now done (with the exception of the sysroot discussion around prioritization of versions, but those have separate issues and can be handled there), I'm going to close this issue. Whew, what a journey this one has been! 😅

@jakirkham
Member Author

Thanks everyone who helped push this forward! 😄 👏
