TOOL-16724 bcc: build fails when using "--parallel" on Delphix buildserver #14


prakashsurya
Contributor

@prakashsurya commented Nov 14, 2022

I don't understand the root cause, but when doing the build in parallel (i.e. make -j2), it fails when running on the Delphix-based buildserver. This change modifies the build so that we always use make -j1; obviously this isn't ideal, but until we root-cause and fix the underlying problem, this would at least allow us to build on the Delphix buildserver.

As one might expect, this does increase the time to build the package by about 2x; I don't think this is a big deal, since we don't actually build this package very often, and the total time to build is still "reasonable" (i.e. ~20 mins).

Lastly, we haven't synced our bcc codebase with upstream in quite a while, so we might inherit a fix for this parallel building issue when we do that sync. At that time, we could look at adding back the "--parallel" flag, to see if the build would succeed with the updated code.
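To make the trade-off above concrete, here is a minimal sketch of a build wrapper that pins make to a single job by default and lets parallelism be switched back on later. It is an illustration only: the function names, the "all" target, and the `parallel` toggle are hypothetical and are not taken from the actual change in this PR.

```python
import os
import subprocess


def make_command(target: str = "all", parallel: bool = False) -> list[str]:
    """Build the argv for a make invocation.

    With parallel=False this always yields `make -j1`, mirroring the
    workaround described above. With parallel=True it asks for one job
    per CPU reported by the OS (falling back to 1 if unknown).
    """
    jobs = (os.cpu_count() or 1) if parallel else 1
    return ["make", f"-j{jobs}", target]


def run_build(parallel: bool = False) -> int:
    """Run make in the current directory and return its exit status.

    Hypothetical entry point; the real build is driven differently.
    """
    return subprocess.run(make_command(parallel=parallel)).returncode
```

Keeping the job count behind a single flag means re-enabling parallel builds after an upstream sync would be a one-line change that is easy to revert if the failure comes back.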

@prakashsurya force-pushed the dlpx/pr/prakashsurya/f29273e2-ef0b-4d29-a03e-49e1409f89be branch from f11e2a1 to d68ce7f November 14, 2022 19:21
@prakashsurya marked this pull request as ready for review November 14, 2022 19:26

@sdimitro left a comment

I'm good with the change, just a couple of questions:

  • Does this bleeding-edge build also fail in the normal bootstrap VM?
  • What does the error message look like?

@prakashsurya
Contributor Author

> What does the error message look like?

See here for an example failure.

> Does this bleeding-edge build also fail in the normal bootstrap VM?

You mean, does #13 fail on the current build image? If so, yes: we can see the failure in that PR's checks (see here, with a direct link to the build here).

@prakashsurya
Contributor Author

Also, since testing this change out (i.e. no --parallel), I was able to get 3 consecutive successful builds on the Delphix buildserver image. Without it, I got multiple failures and only one successful build (see here). The fact that I was able to get one successful build without this change, though, made me think it might be some kind of parallel race issue, which led me to try this out, and this seems to be working.

@prakashsurya
Contributor Author

prakashsurya commented Nov 14, 2022

I'm going to close this and not land it.

It turns out, with help from Seb, that the issue is a result of there being a swap device on the Delphix buildserver. Currently, linux-pkg will try to dynamically add swap using a file on the root filesystem. I've disabled that, since using swap on ZFS doesn't quite work. But if I use a dedicated disk for swap, separate from the ZFS root pool, bcc will build on the Delphix buildserver just fine (even with --parallel).

So I think the better way to fix this is to modify linux-pkg to properly add the swap device when running on the Delphix buildserver. I'll look into this approach instead.
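As a rough illustration of the distinction drawn above between file-backed swap and a dedicated swap disk, the sketch below parses text in the format of Linux's /proc/swaps and reports whether any active swap area is a block device. This is not how linux-pkg does it; `SwapEntry`, `parse_proc_swaps`, and `has_block_device_swap` are hypothetical names. Note also that the kernel lists every block device as "partition", so this check alone cannot tell whether that device sits outside the ZFS root pool.

```python
from dataclasses import dataclass


@dataclass
class SwapEntry:
    path: str      # device node or file backing the swap area
    kind: str      # "partition" or "file", as reported by the kernel
    size_kib: int  # size column from /proc/swaps, in KiB


def parse_proc_swaps(text: str) -> list[SwapEntry]:
    """Parse the contents of /proc/swaps into a list of entries."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        entries.append(SwapEntry(fields[0], fields[1], int(fields[2])))
    return entries


def has_block_device_swap(entries: list[SwapEntry]) -> bool:
    """True if at least one active swap area is a block device."""
    return any(entry.kind == "partition" for entry in entries)
```

On a Linux host, one would pass in the result of `open("/proc/swaps").read()`; a build script could then refuse to fall back to a swap file on the root filesystem when no block-device swap is present.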
