-
-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
std/process: Default to libc closefrom in spawnProcessPosix #9048
Conversation
Thanks for your pull request and interest in making D better, @the-horo! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla references
Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub run digger -- build "master + phobos#9048" |
An example of such a system is a docker container which comes with a |
2bfbb5d
to
3a85f42
Compare
Retrigger CI and remove some braces that didn't conform to the style guide. |
Is there a bugzilla issue for this? If not, please create one, then retitle the commit message to say "Fix bugzilla #####: std/process: Default to libc |
I didn't find any when I wrote the changes. I'll go and create one. |
3a85f42
to
43c7b05
Compare
CircleCI is failing because it uses ubuntu-20.04 as a base which uses glibc-2.31 and closefrom has been added to glibc-2.34: https://sourceware.org/pipermail/libc-alpha/2021-August/129718.html. Is there any way to check the version of glibc or is it acceptable to bump the circle CI image to ubuntu-22.04? |
But wouldn't that mean that this PR would make D programs incompatible with Ubuntu versions older than 22.04? |
Yes, it would break compiling on systems that dont't have >=glibc-2.34 which was released in august 2021. The current code is, under the circumstances of large ulimit, broken on any posix system. Either way something will be broken. If I were to fix this I would introduce a |
std/process.d
Outdated
version (CRuntime_Glibc) | ||
import core.sys.linux.unistd : closefrom; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mismatches the version gate of the linux.unistd module (it is possible to be glibc without being on linux).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that when writing the code but I assumed that glibc is linux only, thanks for clarifying this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hurd is one example of an actively developed system that uses glibc.
kFreeBSD and kOpenSolaris were two others - both thankfully confined to history as a failed experiment (for the latter when developers realized the complexity of the task :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The good news is that glibc's closefrom implementation is not in the linux-specific directory (it is io/closefrom.c
), and the abilist for hurd includes the expected:
GLIBC_2.34 closefrom F
To make this a druntime issue, it might be best for Phobos to do the following: version (linux) import core.sys.linux.unistd;
...
static if (!__traits(compiles, closefrom))
{
void closefrom (int lowfd)
{
...
}
}
...
closefrom(forkPipeOut + 1); |
But won't this make it impossible to also run D programs on older systems, rather than just build them? We would need to use |
It looks like what I want but I think the closefrom libc solution can be better if I can get it to work properly. #8990 can also be applied on top of this to help the systems that don't have
Yes, I want to rewrite it like this but I would want the
I don't understand what you refer to. phobos isn't backward abi compatible so the changes here would affect only programs written in the future. If they can't be built then they won't run, obviously. Do you mean that one won't be able to compile a program on a recent glibc and run it on an older one, for example, the
I think we can solve all of this during the building of phobos, see above. From my understanding, because it is not a template, the function |
That's why the Phobos doesn't need to concern itself about what platform its running on, only what features are available at run-time. |
Yes, there needs to be a way to check for the existence of the function. If you do it in druntime how do you support multiple versions of glibc? It's either they are all assumed to have The difference with phobos is that the build system knows for what target it is compiling so it can inspect those values are make a decision.
And that decision is made at built time, right? |
Do you suggest having druntime be like: module core.sys.linux.unistd;
public import core.sys.linux.config;
static if (GLIBC_VER_NUM >= 234)
void closefrom(int); and module core.sys.linux.config;
enum GLIBC_VER_NUM = @SED_ME_IN@; And replace |
We want to allow people to build and run D programs even on computers which have older So, if we wanted to keep supporting those situations, but still use If we try to simply use
|
Therefore the bindings D uses to the libc headers provided by the host should match the definitions for that host. If the host provides
I don't think you're supposed to be able to downgrade glibc which is exactly what happens when you take a binary compiled for a newer glibc and try to run it against an older one. If you want to support systems with older glibcs you can compile for the older one, in that case the application will work on newer systems as well because glibc does provide backwards compatibility. Am I misunderstanding something? |
But that's not relevant, is it? D does not use the libc headers.
This is only true if we're only talking about never distributing binaries. I think many D users do rely on building a binary on their computer and then running it on another computer. |
What are the headers in Note that just because that arbitrary range of versions is not written anywhere doesn't mean that it doesn't exist. Functions that are either soon to be removed or too recent shouldn't be part of those headers in that situation, and this should be documented somewhere. Now, in phobos' case, in does need to perform system calls so it would need those prototypes for those functions somewhere. It makes the most sense that those prototypes be under The second approach for the headers is have them represent the interface that is provided by libc for the target system. This is what, and I think @ibuclaw suggested. In this situation they should match what the C system headers define so we might as well call them that.
Distributing binaries can be done with any of the approached above, you just need to make sure that you compile the programs properly. This does mean obvious things like not compiling for a different architecture nor os but it also includes not using CPU extensions that not all target machines support and not making calls to external functions that may not exist. It all comes down to what are the actual systems that you want to support. In phobos' case, if we only resort to what CI says, portable enough means being able to run on ubuntu 20.04. If portability is desired then, in the first situation in which In the second case in which the The fundamental difference in the second case is that the
If people do want to do this then they should do it willingly, by writing code that is portable. This implies that libraries that they link statically (like phobos by default) should also be portable. If we want to support this than having a minimum version of libc that we target is more helpful than "if CI is green it's portable". If we take this approach we should also consider the users who would prefer that the phobos code uses the efficient functions provided by their system rather then doing its own inefficient thing. At least I understand now what you mean by doing the check at runtime. libphobos can be compiled for an older glibc and check if the one that it loaded into memory happens to provided the function that it needs. In this situation I would highlight that if the code is bad enough that it would rather be replaced at runtime maybe it's worth considering bringing the minimum supported version higher. In this particular case I agree with you, depending on a version of glibc from 3 years ago is a little bit much. All in all, it think it would be nice if:
What I really want to see changed it having phobos by in this limbo of portability where there's no documentation about what's allowed and what isn't and the enforcement is a single CI run. |
Agreed, and this makes sense to me. I can't speak authoritatively but I think contributions which implement the above would be welcome.
There must have been a misunderstanding because I have no idea how this would be implemented in practice. We distribute Phobos+Druntime as precompiled libraries. The decision whether to use a certain function or not in this way would need to be done at compile time, so it clearly cannot be done in that way.
True in theory, but in practice, compilers already generally produce fairly portable binaries at their default settings, which even run well across multiple Linux distributions. Yes, this is not codified in any way right now (except said CI), but I don't think that's an excuse to drastically worsen the situation.
I don't know if it reflects the current situation or not, but to the best of my knowledge, the DFL's position is that D targets all OS versions which are supported by their vendor. So, for Windows, that would be Windows 10 or newer. Looking at what Canonical supports, I guess that would be Ubuntu 14.04 and newer. |
In theory you "can" do this, in practice this is incredibly dangerous. This is NOT about compilers, at all, period, end of story. This is about glibc, a library which provides a library API and a library ABI. And glibc also has a strict backwards compatibility story. You can compile programs using an old glibc and it shall continue working with new glibc. The reverse is not true. glibc's ABI is versioned. Every time they add a new symbol to the ABI, they version it by the release it was added in. So the minimum version of glibc you need to run a binary you have built, is the newest ABI your binary ended up including when it was compiled. This may be older than the glibc you compiled with, but it depends on glibc private implementation details. They do sometimes improve the codebase, providing newer versions of an older symbol that are an ABI break and therefore require providing two copies of the symbol: one is Your suggestion is to avoid binding to the ABI at compile time for closefrom, because there is no closefrom older than e.g. |
Wow, So in response to the discussion about using directly closefrom or using dlsym to look it up, I prefer the dlsym because it's very robust, and does not require synchronized compile-time versioning (a chronic problem). If you have the symbol, it's there, and if not, you do some fallback. |
And also as I already pointed out above:
|
I see, so we are in that situation already, as we use it in I guess that means that it is already not possible to use an older glibc than what a D binary is compiled against. So, using Is this accurate?
It was meant as a joke, apologies if it didn't go across well. |
I agree, the whole discussion is about what druntime should do so I will open an issue against it. Thank you for taking the time to discuss this, even if it seems like we haven't gotten anywhere from the start. You can leave this PR open and wait to the resolution of the issue or you can close it now and I'll reopen if druntime changes in a way that allows phobos code to do what I want.
Nothing that uses glibc declarations and links to it can do this, regardless of programming language. |
In a sense, I guess? Using dlsym doesn't make the glibc versioned API status quo worse because currently they are at the same version. Who knows what the future may bring -- if dlsym is upgraded again, it will make the status quo worse, if closefrom is upgraded again then the reverse will occur. But there are other reasons to strictly avoid dlsym! It means the compiler cannot offer a safety contract for your use of the ABI. If it changes in the future you won't know what to call and how, whereas by using the symbol normally, you can compile against the expected implementation. You can typecheck how you call it. It's also faster to avoid the indirection overhead. I'm afraid I don't really see a good reason to use a thoroughly substandard and unsafe technology such as dlsym when no real justification for doing so has been provided. What is the downside of using a regular function call if it is available? |
I can think of only two:
|
To be fair, it's not like there is a mechanism that checks that the declaration in Druntime matches the declaration in the C headers... so, it's not that different from casting from a
A function which only takes an |
This is not correct. When you specify The check is not automatic, you have to look inside the headers of glibc, inspect the function that you need and correctly translate that into a D declaration, but it is possible to do. This is not the case when you use Maybe the
This is again, an assumption we make because we expect code to be properly written. You can not infer the safety of a function just be looking at its signature. Take: /** Get the number of a cpu related structure
choice can be one of:
- PHYS_CPUS => the function returns the number of physical cpus in the system
- PHYS_CORES => the function returns the number of physical cores
- LOG_THRDS => the function returns the number of logical threads
*/
int get_cpus(int choice); You can not have a On top of this, as I explained above, you can't even know the signature of a function you get from |
Here is what I mean: extern(C) int closefrom(const char* str);
void main() { closefrom("Please close file descriptor 5 and higher"); } This compiles and links with no error. There is no mechanism that would verify that the function signature is correct, much like how a cast from Edit: I understand now that what you're trying to say is that the correct type to cast to from
That doesn't sound right, and I really don't understand what you're even trying to say. Do we need a sticker on every function that says "this function is bug free and doesn't do anything crazy" before we can put
I'm not sure if you're just exaggerating, but in case you're not, that's not how memory safety in D works. We generally define safety at the function level as being with respect to the arguments. So, if the function accepted an |
I see what you mean, and let's assume that we are indeed worried about the hypothetical situation that a future glibc version might change the ABI of Can we call |
As a general thing, this is doable for non-C languages by computing the language-specific function signature from the C header. If I understand correctly, that's what @the-horo suggested above. It would necessitate a build step, probably -- or shipping with versioned libc interfaces and using a version test to see which version of the shipped interface to use. |
If we're going to argue the semantics of
The standard doesn't say how you need to ensure that, only that it's your responsibility. I have said in my previous comment that you can verify that To explicitly point out what is broken: Calling the function Because there is no way to call this function without observing undefined behavior then all functions that (indirectly) can call our
If you are using If our If our All that I want is for the code not to introduce, willingly, more points of failure without any reason. Calling
There are two approaches to this I suggested. First we can have the D code inspect the C headers, just like they would be seen by a C program. We can use it in two cases:
// core/sys/linux/unistd.c
#include <unistd.h>
// core/sys/linux/unistd.d
import core.sys.linux.config
static if (__GLIBC_PREREQ(2, 34))
extern(C) void closefrom(int); // core/sys/linux/config.c
#include <features.h> The first approach is simpler but, looking at all the expected failures in https://github.com/dlang/dmd/blob/master/compiler/test/compilable/stdcheaders.c, if we do it we will probably start failing to compile on certain platforms with incompatible headers when the previous code worked, by some definition of the word. This will mostly affect gdc and ldc2 because they do target more exotic setups. The second approach is more portable only because it imports less so it has less of a chance to fail. Maybe the D header can also be considered more readable compared to looking in Both of these have the advantage that, if they work, the C bindings we get from it match exactly the ones in the glibc headers and, by extension, in the glibc library we link against. The second approach I discussed is embedding the version of glibc we build druntime against inside it and have the declarations match that version of glibc. It would look like: // core/sys/linux/unistd.d
import core.sys.linux.config
static if (__GLIBC_PREREQ(2, 34))
extern(C) void closefrom(int); // core/sys/linux/config.d
enum __GLIBC__ = 2;
enum __GLIBC_MINOR__ = 39;
bool __GLIBC_PREREQ(int, int) => ...; The enums will be auto populated at build time so the bindings will be for the version of glibc that was present during the build. This has the advantage that, once generated, the code will continue to work. By doing it during the build we can also support more targets because we can use a lot more tools, compared to our C parser implementation, to extract those values from the headers. Once generated the code will also not fail on exotic platforms because we can control all the parameters. The downside is that the headers can become out of sync if we compile for an older glibc and distribute on systems with newer ones. I want to highlight that this is exactly how the headers are currently implemented so this change has only upsides to the current situation. To fix the out-of-sync headers all there's needed is to modify the I admit that I use the D compilers that are compiled for my system so I am biased towards this solution because it has zero downsides and almost all the upsides of the first two solutions. The D provided archives, however, don't share the same circumstances but, since changing the version is pretty easy we can have the install.sh script fill in those values after extracting the archive by inspecting the target environment. I am so much against using It is unreasonable for me to think that phobos code should be dictated by what I want so I will accept the dlsym solution if that's what's decided. I only want to add to it trying to first check for a |
Apologies for the confusion but I made no claims about the safety of
Here is one more advantage of using
Would this work? |
All I wanted to show is the process of verifying that a glibc function declaration can be marked
Yes, if your use case is compiling for an older glibc and, possibly, running on a newer one then the only way to take advantage of the functions in the newer glibc is by using
Using plain I'm starting to find the argument about people wanting portability weak. If they desire portability then adding a dependency on
Currently we are in situation 3: Having a terrible implementation on all systems so, yes, using I don't believe that phobos should cater itself specifically to people who want portability. The standard library should be a collection of general code that works well in general. If people have a specific use case that needs specific code then the standard library is not the place for it. The
If, after this, we benchmark the default implementation code and compare it to using
If you need a specific version of a symbol you need to use |
43c7b05
to
ab94533
Compare
I hate it when bug fixes get stalled because people can't decide on a solution and I don't want to be part of the problem. All the downsides of using |
I am not sure why #8990 is stalled. @schveiguy What do you think, should we go ahead and merge that one?
No one is forbidding it, right? I'll happily approve a patch which uses |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see you've already done just that!
Question, what do you think about doing the dlopen
/dlsym
calls in the parent process, so that we can reuse the result across forks? I think it might also make it easier in the futureto use one of those lighter-weight fork
variant that restricts what we can do between fork
and exec
.
static bool tryGlibcClosefrom (int lowfd) { | ||
import core.sys.posix.dlfcn; | ||
|
||
void *handle = dlopen("libc.so.6", RTLD_LAZY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why
libc.so.6
and notlibc.so
? - What do you think about using
RTLD_NOLOAD
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Why
libc.so.6
and notlibc.so
?
If dlopen("libc.so")
ever finds something different than dlopen("libc.so.6")
then we definitely don't want to continue trying to load symbols. Very unlikely that this will ever happen but better safe then sorry.
- What do you think about using
RTLD_NOLOAD
?
Looking only at the man page of dlopen
I can't tell what those values actually do so I went with the value they showed in the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking is that in the unlikely case that the current binary is actually using a libc that's not libc.so.6
then we probably don't want to actually load a second libc. I understand that RTLD_NOLOAD
should achieve that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RTLD_NOLOAD
makes it so that closefrom
is no longer found so I'll keep RTLD_LAZY
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This debate could be solved by avoiding dlopen/dlsym
- I find it vanishingly unlikely in the event Cruntime_Glibc is true but the soversion of glibc has advanced past "6", that you'll also have libc.so.6 available
ab94533
to
0f51821
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…nProcessPosix The current implementation of spawnProcessPosix is broken on systems with a large `ulimit -n` because it always OOMs making it impossible to spawn processes. Using the libc implementation, when available, for doing file descriptor operations en-mass partially solves this problem. Signed-off-by: Andrei Horodniceanu <[email protected]>
0f51821
to
48d581a
Compare
FAOD.
|
I think we should merge both this and #8990, as the fallback. That way, we always pick the best available option for high rlimit.
IMO, the dlsym feels more robust, and if we don't use it, I'm sure we'll find out right away whether it breaks things. But I'm OK merging it without that if everyone feels it's not going to cause problems. I trust @ibuclaw has the goods on how exactly it could break or not. |
There's nothing stopping dmd from doing the same but it will be a little bit more complicated as it would have to be done during the build. I suggested above a solution that embeds the glibc version inside |
I think so, as it doesn't matter whether you're on AIX or Windows, if the function exists, it's there to be taken advantage of. |
Well, the function may exist but have a different signature. Even with |
I've made the bug report about carrying out the check in druntime: https://issues.dlang.org/show_bug.cgi?id=24725. @ibuclaw feel free to write any thoughts there. |
Is there any more discussion to be had here? I think we should merge this. |
The current implementation of spawnProcessPosix is broken on systems with a large
ulimit -n
because it always OOMs making it impossible to spawn processes. Using the libc implementation, when available, for doing file descriptor operations en-mass solves this problem.This PR requires dlang/dmd#16806 to be merged first.