Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bob managed layers #561

Merged
merged 5 commits into from
Sep 27, 2024
Merged

bob managed layers #561

merged 5 commits into from
Sep 27, 2024

Conversation

rhubert
Copy link
Contributor

@rhubert rhubert commented Apr 2, 2024

This is a very rough first draft of support for layers handled by bob to get some early comments about this.

The idea is to add a scm-spec to the layers section of the config.yaml:

layers:
  - foo:
     scm: git
     url: git@foo:/foo.git
     branch: master
     commit: ....

A layer specified like this also changes the behavior when using nested layers where multiple layers depend on the same layer as the layer structure is flattened and each layer is checked out only once. This make it possible to have the same recipes either used as root-recipes repo or as layer from another recipes package. (ATM this is done by matching the name of the layer only, maybe the url should be used instead or a additional uuid,... ??)

ATM a internal .bob_layers root package is generated for this with dependencies to a package for each layer. I don't know if this is a good idea but it enables reusing most of the code. ;)

If this is a viable approach it would need some filtering for the other commands like bob ls to avoid the .bob_layers package is shown there as well. Maybe this filtering can be done by conditionally adding the .bob_layers to the virtualRoot and providing the define by the layers-command? But this would result in longer parsing times when switching from bob layers to any other bob command.

@jkloetzke
Copy link
Member

Taking a step back: what is the actual motivation of having Bob managing the layers? What is the benefit of adding this feature compared to git submodules? I think we should have a compelling benefit to add the complexity. Could you shed some more light on this?

If we go this route, the following things come to my mind:

  • Collapsing the layers hierarchy is probably OK. I don't think that anybody has ever used nested layers.
  • Internally treating the layers with the same logic will most probably not fly.
    • Nested layers require some loop to discover and parse the next level. The current logic can handle only exactly one level.
    • Error messages and other output will probably be misleading.
  • I could imagine that one may want to optionally automatically update layers unless building with --build-only.

@rhubert
Copy link
Contributor Author

rhubert commented Apr 7, 2024

Taking a step back: what is the actual motivation of having Bob managing the layers? What is the benefit of adding this feature compared to git submodules? I think we should have a compelling benefit to add the complexity. Could you shed some more light on this?

There are two reasons:

My current layer structure would look like this if it would be supported:

- recipes a
   - layer b
     - layer c
   - layer c
   - layer d 
     - layer b
       - layer c
     - layer c

layer d also contains recipes for a stand alone product and it would be nice to be able to build then separate to recipes a. As of today this is not possible without changing the config.yaml of d and to add c and d to the layers.
The remaining dependencies are just invisible and one would need to know them when reusing one of the other layers.

IMO this can't be solved with the current layers except by splitting the recipes much more. Or do you have any suggestions how this could be solved?

The other reason is "because some developers don't like submodules." We have (at least) one project where the layer content has been copied into the recipes just to avoid the need of using submodules. (I'm not sure if using bob managed layers would help them...)

If we go this route, the following things come to my mind:

* Collapsing the layers hierarchy is probably OK. I don't think that anybody has ever used nested layers.

I guess we can also add a policy for this?

* Internally treating the layers with the same logic will most probably not fly.
  
  * Nested layers require some loop to discover and parse the next level. The current logic can handle only exactly one level.

Yes, the loop is missing.

  * Error messages and other output will probably be misleading.

* I could imagine that one may want to _optionally_ automatically update layers unless building with `--build-only`.

I also thought about this as it would avoid the need of adding a new command. Not sure what would be more convenient.

@jkloetzke
Copy link
Member

Taking a step back: what is the actual motivation of having Bob managing the layers? What is the benefit of adding this feature compared to git submodules? I think we should have a compelling benefit to add the complexity. Could you shed some more light on this?

There are two reasons:
...
layer d also contains recipes for a stand alone product and it would be nice to be able to build then separate to recipes a. As of today this is not possible without changing the config.yaml of d and to add c and d to the layers. The remaining dependencies are just invisible and one would need to know them when reusing one of the other layers.

IMO this can't be solved with the current layers except by splitting the recipes much more. Or do you have any suggestions how this could be solved?

Right. This makes sense and I also cannot see how it could be solved with git submodules either. So let's do it.

The other reason is "because some developers don't like submodules." We have (at least) one project where the layer content has been copied into the recipes just to avoid the need of using submodules. (I'm not sure if using bob managed layers would help them...)

I'm not sure either. But having an "automatic" layers update that behaves like the SCMs in regular recipes might be more attractive.

If we go this route, the following things come to my mind:

* Collapsing the layers hierarchy is probably OK. I don't think that anybody has ever used nested layers.

I guess we can also add a policy for this?

No, I wouldn't unless somebody speaks up. I have never heard of anybody using nested layers. We can still add such a policy if somebody comes around.

  • I could imagine that one may want to optionally automatically update layers unless building with --build-only.

I also thought about this as it would avoid the need of adding a new command. Not sure what would be more convenient.

I think we need both. The layers command makes perfectly sense. I was thinking about the case where somebody clones a project and expects it to "just work". So

git clone ...
cd project
bob dev ...

should just work, even when layers are used. On the other hand, you might not always want the automatic layer pulling. So some overridable option to bob build/dev makes sense.

@sbixl
Copy link

sbixl commented Apr 8, 2024

We have similar use cases as reported by @rhubert. I often get the question why the layers can't be managed better via bob (in analogy to the SCMs in the recipes).

The current situation is that when a layer is updated, the whole team must be notified about this to update their local gitsubmodule(s). Sometimes it is forgotten and the colleague work on an older version. It would be nice if this update could be done via bob dev/build in the background.

However, if the main recipes have a breaking dependency on one or more layers, these would also have to be updated. In this use-case, updating the layer alone would lead to problems and you would have to explicitly make a pull on the recipes (if you know it, which is mostly not the case).

The use of nested layers also makes sense (and we are now at the point where we absolutely need it). I have already experimented with nested layers but encountered similar problems to those reported by @rhubert.

As mentioned:

  • Having an "automatic" layers update that behaves like the SCMs in regular recipes might be more attractive, would be a great improvement.
  • Executing git clone ... without the knowledge of a gitsubmodule (--recurse-submodules) behind would improve the use-ability much more.
  • Updating the main recipes in addition to the layers would be nice too. This could be done via a query that has to be confirmed by the user. This way the user knows that something has changed and an update is being carried out. This query could also be made configurable and set to auto-update so that the system does not wait for input e.g. in case of a CI build.

@rhubert
Copy link
Contributor Author

rhubert commented Apr 8, 2024

The current situation is that when a layer is updated, the whole team must be notified about this to update their local gitsubmodule(s). Sometimes it is forgotten and the colleague work on an older version. It would be nice if this update could be done via bob dev/build in the background.

However, if the main recipes have a breaking dependency on one or more layers, these would also have to be updated. In this use-case, updating the layer alone would lead to problems and you would have to explicitly make a pull on the recipes (if you know it, which is mostly not the case).

I think the last point would become valid if floatinglayers are used, which - with submodules - is not possible(?). And even if layers managed by bob this is probably not a good idea. Otherwise the layers are bound to a version of the recipes and can't become incompatible to the recipes? Or do I miss something?

@sbixl
Copy link

sbixl commented Apr 9, 2024

I had the following scenario in mind (assuming the layers are no longer managed via gitsubmodules):

layers:
  - foo:
     scm: git
     url: git@foo:/foo.git
     branch: master # <-- layer is floating

Changes are made to the layer that also affect the main recipes (e.g. incompatible changes). Developer 1 adjusts the main recipes accordingly and push the changes to the remote. Developer 2 or a whole team knows nothing about this and would automatically update the layer with the next bob dev/build because it is floating. The system then stops work building because nobody knows that the recipes also need to be updated. This quickly leads to frustration, especially among colleagues who are not so familiar with recipes and their interrelationships. So if floating layers are an option there should be an anditional check if the main recipes must be updated too.

But if I think more about it for a while, it makes no sense to work with floating layers. If several layers are nested inside each other, this increases the level of complexity considerably, becomes confusing and is error prone. I think it would be better to always hard link the layers to a tag/commit like the gitsubmodules already do it.

So I agree that this was not the best idea. ;-)

Nevertheless, it would be a nice feature when the main recipes are on a branch and there are new changes on it you can you can get a notification from bob about this. In our team, at least, this is a recurring use case where you always have to make sure that everyone is really synchronized. I think this can be solved more elegant (it should be relatively easy for Bob to provide something like this?)

@rhubert
Copy link
Contributor Author

rhubert commented Apr 10, 2024

Nevertheless, it would be a nice feature when the main recipes are on a branch and there are new changes on it you can you can get a notification from bob about this. In our team, at least, this is a recurring use case where you always have to make sure that everyone is really synchronized. I think this can be solved more elegant (it should be relatively easy for Bob to provide something like this?)

I think this can't be easily done as bob has no knowledge about where the recipes are coming from. Could be git, svn, cvs, ... so it might be hard to fetch them.

As of today you can add a preBuildHook to check if your recipes are up-to-date, otherwise you can output a warning and exit with a non zero exit code.

https://bob-build-tool.readthedocs.io/en/latest/manual/configuration.html#hooks

Note: you can not simply git pull your recipes in the hook and exit with 0 as the hook is executed after the recipes parsing has been done and the build would still use the old recipes.

@rhubert
Copy link
Contributor Author

rhubert commented Apr 22, 2024

I pushed some updates to the code to avoid the internal package.. It's still not ready for review...

Anyway - after some internal discussions we think that merging the (sub-) layers of different layers by their name/url/whatever might not work very well. Instead we want to filter the layers provided by a sublayer when depending on it:

Given a layer foo with

# foo - config.yaml
layers:
 - bar:
   [...]
 - baz:
   [...]

The user of this layer can avoid the use of bar from foo by adding only baz to foo's useSubLayers , e.g. because foo also has a direct dependency on bar

layers:
 - foo:
    checkoutSCM:
      scm: git
      url: ....
      commit: ...
    useSubLayers: [baz]
 - bar:
   [...]

Not adding useSubLayers will simply use all.

@jkloetzke
Copy link
Member

Anyway - after some internal discussions we think that merging the (sub-) layers of different layers by their name/url/whatever might not work very well.

I agree that merging them on the basis of the URL is certainly not viable. But why not by name? Projects/layers that include other layers should know the naming below. Only if two layers refer to some common third layer by a different name things go south. But is this really likely to happen?

One could imagine to encourage people to use reverse polish notation for the layers (e.g. dev.bobbuildtool.basement). That would make the names more predictable.

Instead we want to filter the layers provided by a sublayer when depending on it:

I'm not sure that this is better. The behaviour can get very subtle I guess. Also, how it it supposed to work transitively (control the sub-sub-layers)?

I don't have any better idea at the moment but if feels its more like hack...

@rhubert
Copy link
Contributor Author

rhubert commented Apr 25, 2024

The idea was to have more control about the layers. e.g. if a common sublayer is used by 2 different layers one might need to use the newer layer spec. And if there are two common sublayers by 2 different layers they might cross reference newer versions of their sublayers:

- recipes
 - layer foo
   - layer b (1.0)
   - layer c (1.1)
  - layer bar
   - layer b (1.1)
   - layer c (1.0)

To make this work we'd need to select c from foo and b from bar. Or add b (1.1) and c (1.1) to the recipes?

@jkloetzke
Copy link
Member

My current line of thinking goes like this:

We would always flatten the layer dependency tree by name. This happens even for layers that are not SCM-managed but just referenced in config.yaml. So whenever layers have the same name, only the first one of them is used. Which one it is, is defined by the order.

The project has the highest precedence. So if a layer "b" is defined in the project, it would override any deeper layer dependencies (layer foo and bar in your case). If the dependencies to some deeper layer are on the same level (b and c in your case), the order is important. It's already defined that layers in config.yaml are named from highest to lowest precedence. So in your case layer b (1.0) and layer c (1.1) would be used because layer foo has a higher precedence than bar. If you want b (1.1) and c (1.1), they must be added to the project (at least b (1.1)).

While this can still be very subtle what is actually used, it at least is consistent with the current layer precedence. For SCM managed layers, Bob should certainly not even checkout layers that were already satisfied by higher precedence layers. So it will be more or less obvious what is going to be used.

Does that make sense?

@rhubert rhubert force-pushed the rh-layers branch 2 times, most recently from e17e94b to 1aa7863 Compare June 2, 2024 18:07
Copy link

codecov bot commented Jun 2, 2024

Codecov Report

Attention: Patch coverage is 95.95960% with 12 lines in your changes missing coverage. Please review.

Project coverage is 88.80%. Comparing base (080fc60) to head (73d00e9).
Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
pym/bob/layers.py 95.76% 8 Missing ⚠️
pym/bob/input.py 94.11% 3 Missing ⚠️
pym/bob/state.py 93.75% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #561      +/-   ##
==========================================
+ Coverage   88.66%   88.80%   +0.13%     
==========================================
  Files          46       48       +2     
  Lines       14860    15141     +281     
==========================================
+ Hits        13176    13446     +270     
- Misses       1684     1695      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rhubert rhubert force-pushed the rh-layers branch 3 times, most recently from 43d30d5 to 59043e2 Compare June 3, 2024 19:45
@rhubert rhubert marked this pull request as ready for review June 3, 2024 19:53
@rhubert
Copy link
Contributor Author

rhubert commented Jun 3, 2024

I think this is now in a state where it might make sense to start reviewing process...

One note to the last commit XXXXXX: DevelopDirOracle: make a singleton + fix recalculation. This tries to fix the failing generator blackbox test before I noticed that I missed

   62     recipes.setUpdateLayers(not args.execute_buildonly)

in pym/bob/cmds/build/project.py. With this line it's not needed and I'm unsure if it's a good idea to drop and rebuild the cache if the key changes within a single bob-run? Maybe it would be better to raise a internal exception in that case?

@rhubert rhubert changed the title RFC: bob layers bob managed layers Jun 3, 2024
@jkloetzke
Copy link
Member

Thanks. I'll have a look in the next days.

@jkloetzke
Copy link
Member

I'm not entirely sure if all of the following makes sense but these are my first thoughts after skimming though the stack for an hour or so.

  • We should probably only create one EventLoopWrapper() and use it for both the layers update and the build instead of using two distinct instances.
  • Instead of calling setUpdateLayers() and doing the update magically in parse(), it would be better to call an updateLayers(eventLoop) to do the update explicitly.
  • Maybe we should exclude the import SCM? It doesn't look useful and we could spare some special cases.
  • Using setDirectoryState()/getDirectoryState() is probably not a good idea because these directories are assumed to be workspaces by the clean and status commands. The layers state should be stored in a separate category.

The layer schema looks more complicated that it needs to be. How about the following?

layers:
  - name: foo
    scm: git
    url: ...
  - name: bar
    ...

I'm not sure about the DevelopDirOracle change yet. My gut feeling is that it points to some underlying problem and making it a singleton just papers over it. But I have to admit I did not think this through.

I was hoping that we could integrate the layers update in the regular build TUI. Especially to use multiple jobs in parallel. But that may be something for a later step.

@rhubert
Copy link
Contributor Author

rhubert commented Jun 13, 2024

Thanks for the comments :)

* We should probably only create one `EventLoopWrapper()` and use it for both the layers update and the build instead of using two distinct instances.

The layers update is also needed for non build commands like bob ls where we do not have a EventLoopWrapper() (yet?). It could be created in the RecipeSet and provided to the build-commands using a getter?

* Instead of calling `setUpdateLayers()` and doing the update magically in `parse()`, it would be better to call an `updateLayers(eventLoop)` to do the update explicitly.

👍 will do this.

* Maybe we should exclude the `import` SCM? It doesn't look useful and we could spare some special cases.

👍

* Using `setDirectoryState()`/`getDirectoryState()` is probably not a good idea because these directories are assumed to be workspaces by the `clean` and `status` commands. The layers state should be stored in a separate category.

Thanks - I wasn't aware of this ...

The layer schema looks more complicated that it needs to be. How about the following?

layers:
  - name: foo
    scm: git
    url: ...
  - name: bar
    ...

I think this is a reasonable simplification. We'd loos some flexibility but I can't think of a use-case for having multiple git-scms for a single layer or something like this..

I was hoping that we could integrate the layers update in the regular build TUI. Especially to use multiple jobs in parallel. But that may be something for a later step.

The layers update is usually fast (compared to build + parse times). Not sure if there is a benefit of doing this in parallel, but I'll keep it in mind.

@jkloetzke
Copy link
Member

The layers update is also needed for non build commands like bob ls where we do not have a EventLoopWrapper() (yet?). It could be created in the RecipeSet and provided to the build-commands using a getter?

I'm not sure that this is a good idea. Especially bob ls is not a state-changing command. I would be rather surprised when any layers would be updated in this case. IMHO there is only bob build, dev and the layers command that are expected to update the layers.

@rhubert rhubert force-pushed the rh-layers branch 3 times, most recently from af27517 to 4f3d1d4 Compare June 19, 2024 12:44
@rhubert
Copy link
Contributor Author

rhubert commented Jun 19, 2024

I did some rework and changed the code following your suggestions. Take your time for review I'm N/A the next weeks, so I there will be no more update from my side before mid end of July ;)

Copy link
Member

@jkloetzke jkloetzke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is heading in the right direction. I still miss the loop to handle arbitrarily nested layers. Did I overlook it?

pym/bob/input.py Show resolved Hide resolved
pym/bob/input.py Outdated Show resolved Hide resolved
pym/bob/input.py Outdated Show resolved Hide resolved
pym/bob/input.py Outdated Show resolved Hide resolved
pym/bob/input.py Outdated Show resolved Hide resolved
pym/bob/input.py Outdated Show resolved Hide resolved
pym/bob/input.py Outdated Show resolved Hide resolved
pym/bob/layers.py Outdated Show resolved Hide resolved
pym/bob/state.py Outdated Show resolved Hide resolved
@rhubert
Copy link
Contributor Author

rhubert commented Jul 17, 2024

I think this is heading in the right direction. I still miss the loop to handle arbitrarily nested layers. Did I overlook it?

Layers::__collect recursively calls itself...

@jkloetzke
Copy link
Member

I'm looking into the PR in detail but it will take some time. I'll tinker with it and probably push some fixups if that's ok...

@jkloetzke
Copy link
Member

I'm not really comfortable with the default.yaml handling during the layers update so far. It's also why the tests are failing. We try to parse the user configuration files but paper over any problem. This will certainly lead to surprising results. Maybe we should not even try that? Or do we need variable substitution in the layers SCMs?

@rhubert
Copy link
Contributor Author

rhubert commented Aug 15, 2024

I think having scmOverrides for the layer scm's is somehow important. As user I'd except they behave like normal checkoutSCM. And - in our build environment - we're using mirrors for all upstream repos. If we want to use upstream-layer-foo which has a configured to use upstream-layer-b with a checkout to https://github... we'd need to override this to point to our server. That's why I think parsing at least the default.yaml in the recipesRoot would be nice. Also to get the scmDefaults for the layers checkout,.... I pushed a fixup to do it like this.

If this won't work I think no support for Overrides+defaults in the layer scm's would be also acceptable. In this case one could still name upstream-layer-b in the recipes-root config.yaml ...

@jkloetzke
Copy link
Member

I see. In principle I fully support to have scmOverrides and environment substitution. The problem is that this creates us a circular dependency.

To fetch and/or update the layers we have to naturally go in top-to-bottom order. Even though it may not evident, the parsing of default.yaml of layers is done in bottom-up order. The rationale is so that upper layers can rely on layers beneath it and can also override settings from lower layers. For example, you could have a lower layer providing some string functions through a plugin and a higher layer uses that in it's environment section in default.yaml.

I'm not sure how we can solve this elegantly (or at all)...

@rhubert
Copy link
Contributor Author

rhubert commented Aug 16, 2024

Thanks for the explanation...sometimes it takes longer but now I got the point...I guess....

I removed the parsing of the default.yaml while updating the layers. Should be sufficient for now to have no scmOverrides for the layers - let's see how long this holds...
If needed some light+static parsing of the scm-relevant settings of the root-default.yaml without support for (additional) string functions and all the stuff which could be added by lower layers might be a solution...

pym/bob/input.py Show resolved Hide resolved
pym/bob/input.py Outdated Show resolved Hide resolved
pym/bob/layers.py Outdated Show resolved Hide resolved
pym/bob/layers.py Outdated Show resolved Hide resolved
pym/bob/layers.py Show resolved Hide resolved
Allow to specify scm properties for layers and use them to checkout /
update the layers. Only git, svn, cvs or url scms are supported.

Having layers with a SCMSpec they are only updated during bob dev|build
if --build-only is not set. Otherwise they are not touched by any
existing bob command.

In addition the new `bob-layers` command can be used to update layers
and show their status the same way like `bob status` does.
@jkloetzke jkloetzke merged commit 3277746 into BobBuildTool:master Sep 27, 2024
11 checks passed
@jkloetzke
Copy link
Member

Thanks. I think this still needs some polishing IMHO. But this can be done on top of it. The basic functionality is most probably ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants