Allow runners to have multiple platform configurations #75

Open
YngveNPettersen opened this issue Dec 5, 2020 · 2 comments

Comments

@YngveNPettersen

As I understand the worker configuration definition, a worker can have a combination of properties, such as this one in the Docker example:

    platform: {
      properties: [
        { name: 'OSFamily', value: 'Linux' },
        { name: 'container-image', value: 'docker://marketplace.gcr.io/google/rbe-ubuntu16-04@sha256:b516a2d69537cb40a7c6a7d92d0008abb29fba8725243772bdaf2c83f1be2272' },
      ],
    },

However, as far as I can tell, the platform specification cannot specify multiple property sets for a given runner, e.g. this way:

    platform: [{
      properties: [
        { name: 'OSFamily', value: 'Linux' },
        { name: 'container-image', value: 'docker://marketplace.gcr.io/google/rbe-ubuntu16-04@sha256:b516a2d69537cb40a7c6a7d92d0008abb29fba8725243772bdaf2c83f1be2272' },
      ],
    }, {
      properties: [
        { name: 'OSFamily', value: 'LinuxToo' },
        { name: 'container-image', value: 'docker://marketplace.gcr.io/google/rbe-ubuntu16-04@sha256:b516a2d69537cb40a7c6a7d92d0008abb29fba8725243772bdaf2c83f1be2272' },
      ],
    }],

I am aware that this can be implemented by specifying multiple runners, but that would have consequences for the concurrency specifications. Specifically, my reading of the code is that if N is the number of runners that can be used in parallel on the worker, then two runners would have to split that number so that N1+N2 <= N; otherwise the worker can be overloaded if the scheduler allocates all of the runners.
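
To make the arithmetic concrete, here is a minimal sketch in Python. It is not Buildbarn code: the function name `usable_slots`, the total of 8, and the queue lengths are all hypothetical. It only illustrates how many actions can run at once under a static split N1+N2 <= N, depending on which platforms have queued work.

```python
def usable_slots(split, queued):
    """Return how many actions can run at once under a static per-runner split.

    split:  mapping of platform name -> concurrency assigned to that runner
    queued: mapping of platform name -> number of actions waiting
    """
    return sum(min(slots, queued.get(platform, 0)) for platform, slots in split.items())


N = 8  # hypothetical total concurrency the worker can sustain
split = {"Linux": 4, "LinuxToo": 4}  # N1 + N2 <= N
assert sum(split.values()) <= N

# Only one platform has queued work: half of the worker sits idle.
single_platform_load = usable_slots(split, {"Linux": 100})

# Both platforms have queued work: the worker is fully used.
mixed_load = usable_slots(split, {"Linux": 100, "LinuxToo": 100})
```

With these assumed numbers, the single-platform case can use only N1 of the N slots, which is the performance loss I am concerned about.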

Background: I recently tried to add a couple of old machines as workers to our Goma/Buildbarn system, with Windows cross-compile on Linux, but the system became unstable for some reason. While testing the upgraded system over the past couple of days, I found that the instability is still present and seems to be caused by the case-insensitive file system mount (ciopfs) we need for the Windows cross-compile (many Windows SDK files are included with incorrectly cased names from inside the SDK). ciopfs seems to stall at times, causing long periods in which neither the worker nor ciopfs does any building; one case lasted about 40 minutes. My guess is that this problem is related to SSD disk speed and possibly to the number of parallel processes (we have the same ciopfs configuration on a different worker, with a much faster CPU, more cores/threads, and an NVMe disk, which does not have this problem).

While there may be other ways to get a case-insensitive filesystem running, one alternative possibility would be to assign these workers to be Linux-only workers, without Windows-cross-compile (I have not yet tested this configuration).

However, it does not seem like the action system permits multiple platform specifications, i.e. "Use one of these platforms". This means that the Windows cross-compile workers need to be specified as "LinuxWindows" platforms, while Linux workers have to be specified as "LinuxOnly". However, the "LinuxWindows" workers should also be able to run "LinuxOnly" builds, and AFAICT that is not possible except by specifying multiple runners and splitting the concurrency number between them, which also means halving the performance unless Windows and Linux builds are running at the same time.
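
As an illustration of the matching I have in mind, here is a rough Python sketch. It assumes (and this is only an assumption, not a description of the actual scheduler) that an action is matched to a runner when its platform properties exactly equal one of the property sets the runner announces. The helper names `as_key` and `runner_accepts` are hypothetical.

```python
def as_key(properties):
    """Normalize a list of {name, value} dicts into a hashable, order-independent key."""
    return frozenset((p["name"], p["value"]) for p in properties)


def runner_accepts(runner_platforms, action_properties):
    """True if the action's properties exactly match any platform the runner announces."""
    wanted = as_key(action_properties)
    return any(as_key(platform["properties"]) == wanted for platform in runner_platforms)


# Hypothetical cross-compile runner announcing two platforms.
cross_compile_runner = [
    {"properties": [{"name": "OSFamily", "value": "LinuxWindows"}]},
    {"properties": [{"name": "OSFamily", "value": "LinuxOnly"}]},
]
# Hypothetical Linux-only runner announcing a single platform.
linux_only_runner = [
    {"properties": [{"name": "OSFamily", "value": "LinuxOnly"}]},
]

linux_action = [{"name": "OSFamily", "value": "LinuxOnly"}]
windows_action = [{"name": "OSFamily", "value": "LinuxWindows"}]

# The cross-compile runner can take both kinds of action,
# while the Linux-only runner rejects the Windows cross-compile action.
can_cross_run_linux = runner_accepts(cross_compile_runner, linux_action)
can_linux_run_windows = runner_accepts(linux_only_runner, windows_action)
```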

IMO either the concurrency system must be changed so that at most N runners can be active at a time across the whole worker, or a single runner group should be available for multiple platforms.
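
The first option could look roughly like the following Python sketch: a worker-wide pool of N slots that every runner has to acquire from before starting an action. The classes `SharedSlots` and `Runner` and the value N = 4 are hypothetical and only meant to illustrate the idea, not how the worker is actually structured.

```python
import threading


class SharedSlots:
    """A worker-wide pool of N execution slots shared by all runners."""

    def __init__(self, total):
        self._sem = threading.BoundedSemaphore(total)

    def try_acquire(self):
        # Non-blocking: returns False when all N slots are busy.
        return self._sem.acquire(blocking=False)

    def release(self):
        self._sem.release()


class Runner:
    """A runner that may only start an action if the shared pool has a free slot."""

    def __init__(self, platform, slots):
        self.platform = platform
        self._slots = slots
        self.active = 0

    def try_start(self):
        if not self._slots.try_acquire():
            return False
        self.active += 1
        return True

    def finish(self):
        self.active -= 1
        self._slots.release()


slots = SharedSlots(total=4)  # hypothetical N
linux = Runner("LinuxOnly", slots)
windows = Runner("LinuxWindows", slots)

# With only Linux work queued, the Linux runner can use all N slots,
# and the fifth attempt is refused instead of overloading the worker.
started = [linux.try_start() for _ in range(5)]
```

In this sketch neither runner has a fixed share, so a single platform can use the whole worker, while the total number of active actions never exceeds N.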

@EdSchouten
Member

Hi Yngve,

Are you aware of issue #40? It's not identical to what you requested here, but I think that if implemented properly, it could also be used to achieve something similar. Instead of letting workers announce multiple platforms, we could have a mechanism where the scheduler notifies some helper process, requesting it to spin up new workers of a given kind. In case resources aren't elastic (e.g., on physical systems), you could use that process to simply reconfigure a worker from variant A to variant B.
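
As a very rough illustration of the non-elastic case, the helper's decision could be as simple as the Python sketch below. The function `choose_variant` and the policy it implements (stay on the current variant while it has work, otherwise switch to the variant with the longest queue) are assumptions made up for this example; nothing like this exists yet.

```python
def choose_variant(current, queued):
    """Pick the worker variant to run next, given queued actions per variant.

    Keeps the current variant while it still has work, to avoid needless
    reconfiguration; otherwise switches to the variant with the longest queue.
    """
    if queued.get(current, 0) > 0:
        return current
    busiest = max(queued, key=queued.get, default=current)
    return busiest if queued.get(busiest, 0) > 0 else current


# Hypothetical queue lengths reported by the scheduler.
next_variant = choose_variant("A", {"A": 0, "B": 3})
```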

Does it make sense to keep this open, or do we want to fold this into #40?

@YngveNPettersen
Author

I did see it, but I think our use case could be different enough that a separate report should be filed.

In our case we are using a fixed set of hardware workers, not a Docker-type system. The limits on CPU usage are therefore fixed in a way that they might not be on a cloud-based system.

It could be that the solutions will be the same, but there could be differences in how the two variations work, too.

Based on recent testing, having separate work areas (runners) for the Windows cross-compile and Linux could be useful: I noticed that Linux build actions sometimes ran into trouble due to the case-insensitivity of the workarea mount. But as mentioned, multiple runners on a worker require overall management of concurrency that AFAICT is not currently possible.
