[v2.5] batch flash nodes #201

Open
svenrademakers opened this issue May 13, 2024 · 8 comments

Comments

@svenrademakers
Collaborator

svenrademakers commented May 13, 2024

Is your feature request related to a problem? Please describe.
The Turing Pi can flash an OS image to a given (supported) module. The firmware loads a USB plug onto the module, which, in turn, exposes an API used to write the new OS images. On v2.4 boards, only one module can be switched to the BMC at a time. On v2.5, we replaced these muxes with a USB hub, which opens up the possibility of flashing multiple nodes simultaneously.

Describe the solution you'd like
We want to be able to write an image to a selection of nodes. The dropdown in the "flash" tab of the UI gets replaced with checkboxes, which the user can use to select which nodes to flash simultaneously. Flashing different images concurrently to different nodes is out of scope. Keep it simple!

  • If an error occurs with one of the nodes, all other tasks are aborted as well.
  • Error messages need to be altered so it's clear to the user which node caused the error.
  • All flashing features should be extended to the other nodes as well: sha256 checking, the skip-CRC bool, and xz decompression. Be mindful that we are extremely limited on memory. For instance, don't decompress the same OS image multiple times.
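The three constraints above can be pictured with a small sketch. It is written in TypeScript purely for illustration and does not reflect how the firmware is implemented; `flashToAll`, `ChunkSource` and `NodeWriter` are hypothetical names. It assumes the image is decompressed once into a stream of chunks, each chunk is handed to every selected node, and the first failure stops everything with an error that names the node.

```typescript
// Hypothetical sketch, not firmware code.
// A pull-based source: returns the next decompressed chunk, or null at the end.
type ChunkSource = () => Uint8Array | null;

interface NodeWriter {
  node: number; // 0-based node index, as used by the API
  write: (chunk: Uint8Array) => void; // assumed to throw on failure
}

// Feeds every chunk to every node, so the image is only decompressed once.
// The first failing write aborts the whole batch and names the node.
function flashToAll(nextChunk: ChunkSource, writers: NodeWriter[]): number {
  let written = 0;
  for (let chunk = nextChunk(); chunk !== null; chunk = nextChunk()) {
    for (const w of writers) {
      try {
        w.write(chunk);
      } catch (cause) {
        const detail = cause instanceof Error ? cause.message : String(cause);
        // The UI labels nodes 1-4, so report the 1-based number.
        throw new Error(`node ${w.node + 1}: ${detail}`);
      }
    }
    written += chunk.length;
  }
  return written; // bytes written to each node
}
```

Only one chunk is held at a time, which is the point of the memory remark; hashing or CRC checks would hook into the same loop.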

Additional information
We expect changes in the following 2 repos:

@barrenechea
Contributor

I can prepare the UI so we're able to support this use case, I'll be playing with options there 😄

@barrenechea
Contributor

barrenechea commented May 13, 2024

Just to confirm: the UI should behave differently depending on whether the board is <= v2.4 or >= v2.5. Am I right? Could I get the board revision to render options conditionally? I think a good endpoint would be the one currently providing data for the About tab.

I think that a good option would be for v2.4 users only to be able to pick one option (and automatically disable the user from picking more than one choice), and if the board is >=2.5, for it to not have that "disabled after one". That way, the experience would be similar for all users, and v2.5 boards could pick many nodes.

@svenrademakers
Collaborator Author

You brought up a good point. Of course, this behavior should only occur when a 2.5+ board is detected. You're also right that we need an endpoint to detect which of the 2 versions needs to be loaded. I would prefer to have a field encoded in the actual flashing endpoint that specifies something like:

{
 can_do_bulk_flashing: true
 ...
 }

Making the code dependent on the firmware version is a less clean option as we make ourselves dependent on this specific hardware when in theory, it doesn't matter on which hardware it runs.

I think that a good option would be for v2.4 users only to be able to pick one option (and automatically disable the user from picking more than one choice),

That sounds good to me. It will keep things consistent!
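A minimal sketch of the selection rule discussed above, assuming the UI has already read a boolean like the suggested `can_do_bulk_flashing` field (here the hypothetical parameter `canDoBulkFlashing`). The function names are made up for illustration and are not taken from the actual UI code.

```typescript
// Hypothetical helpers, not from the real UI.
// A checkbox is greyed out only when bulk flashing is unavailable and
// a different node is already selected ("disabled after one").
function isNodeDisabled(
  selected: number[],
  node: number,
  canDoBulkFlashing: boolean,
): boolean {
  if (canDoBulkFlashing) return false;
  return selected.length > 0 && selected.indexOf(node) === -1;
}

// Returns the new selection after the user clicks the checkbox for `node`.
function toggleNode(
  selected: number[],
  node: number,
  canDoBulkFlashing: boolean,
): number[] {
  if (selected.indexOf(node) !== -1) {
    return selected.filter((n) => n !== node); // unticking is always allowed
  }
  if (isNodeDisabled(selected, node, canDoBulkFlashing)) {
    return selected; // ignore clicks on a disabled checkbox
  }
  return selected.concat(node);
}
```

Keying the rule on a capability flag rather than on the board revision keeps the UI independent of the hardware, in line with the preference stated above.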

@barrenechea
Contributor

I wonder if there is a chance to make the multiple selection of nodes work on 2.4... It may not be possible to flash them all simultaneously, but if we could flash them in sequence, the UI would work for both boards (just that v2.5 would be up to four times faster).

I could do a workaround on the frontend (to "send" flashing requests in sequence after one finishes), but if the backend could handle it, we could handle all the flashing sequence with a single image upload.

@MPC-GH

MPC-GH commented May 15, 2024

The BMC itself doesn't have a lot of storage or RAM, so if you were flashing sequentially without re-streaming the image over the network each time, you would be reliant on an SD card of sufficient size being in place. Seems complex to do nicely in the web interface.

Would we want to consider caching before flashing anyway if there's a suitably large SD card in place from a reliability perspective? I can certainly see some use cases (remote or hard to physically access setups) where you may not want to risk a network drop mid-flash. For my use cases, I probably wouldn't be using the GUI at that point if I'm honest, but a locally saved image and the command line tooling.

@barrenechea
Contributor

@MPC-GH Yeah you're right, it probably streams the uploaded file directly to the target node(s). Better to keep it simple for now so we don't delay the main feature.

@svenrademakers a question regarding the /api/bmc?opt=set&type=flash call. Currently, it expects something like:
/api/bmc?opt=set&type=flash&file=ubuntu.img&length=55345150&node=0 (node being 0-indexed)

Would it make sense for this to send a comma-separated list in the node field for bulk flashing? Something like:
/api/bmc?opt=set&type=flash&file=ubuntu.img&length=55345150&node=0,1,2,3

@svenrademakers
Collaborator Author

@barrenechea, I would like to keep the API backward compatible as much as possible. Therefore, it would be better if we introduced an additional key (it's not more elegant by any means). Maybe copy or batch is the right word?

e.g.
/api/bmc?opt=set&type=flash&file=ubuntu.img&length=55345150&node=0&batch=1,2,3
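To illustrate why an extra key stays backward compatible, here is a rough sketch of how a receiver could read both forms. The function name `parseFlashTargets`, the validation, and the handling of duplicates are assumptions for illustration; the thread does not say how the firmware actually parses the request.

```typescript
// Hypothetical sketch: turns a flash request into the list of target nodes.
// Old-style requests (no batch key) still yield exactly one node.
function parseFlashTargets(url: string): number[] {
  const start = url.indexOf("?");
  const query = start >= 0 ? url.slice(start + 1) : url;

  // Collect key=value pairs without decoding; node ids are plain digits.
  const params: Record<string, string> = {};
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq > 0) params[pair.slice(0, eq)] = pair.slice(eq + 1);
  }

  const parseIndex = (text: string): number => {
    if (!/^\d+$/.test(text)) throw new Error(`invalid node index: "${text}"`);
    return Number(text);
  };

  const node = params["node"];
  if (node === undefined) throw new Error("missing node parameter");
  const targets = [parseIndex(node)];

  const batch = params["batch"];
  if (batch !== undefined && batch !== "") {
    for (const item of batch.split(",")) {
      const index = parseIndex(item);
      // Assumption: a node listed twice is only flashed once.
      if (targets.indexOf(index) === -1) targets.push(index);
    }
  }
  return targets;
}
```

With the example above, the result would be nodes 0, 1, 2 and 3, while a request without `batch` yields the single node it always did.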

@barrenechea
Contributor

e.g.
/api/bmc?opt=set&type=flash&file=ubuntu.img&length=55345150&node=0&batch=1,2,3

I like batch! I followed it to the letter 😄 my draft PR is currently handling it with the following cases:

For all v2.4 boards (and v2.5 clicking a single node): node=0 (no batch parameter)
[v2.5 only] Nodes 1,2,3,4 clicked: node=0&batch=1,2,3
[v2.5 only] Nodes 1,3,4 clicked: node=0&batch=2,3
[v2.5 only] Nodes 2,4 clicked: node=1&batch=3

Note that I'm ordering the node IDs on the client, meaning:

If the user clicks first on Node 4 and second on Node 1, the payload will be:
node=0&batch=3

And not in the order the user clicked, like:
node=3&batch=0 <- This will not happen

It's just an Array.sort I'm doing before sending the request. If irrelevant, I could clean it up and save some CPU cycles on the front end 🤣

We'll see how it goes, but we have something to play with!
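For reference, the cases listed above could be produced by something like the following sketch. It is not the code from the draft PR; `buildFlashQuery` is a hypothetical name, and it assumes the UI works with 1-based node labels that are converted to the 0-based ids the API expects.

```typescript
// Hypothetical sketch of the client-side request building described above.
function buildFlashQuery(
  file: string,
  length: number,
  clickedNodes: number[], // 1-based labels, in the order the user clicked
): string {
  if (clickedNodes.length === 0) throw new Error("select at least one node");

  // Convert to 0-based ids and sort numerically. A comparator is used because
  // Array.prototype.sort compares elements as strings by default.
  const ids = clickedNodes.map((n) => n - 1).sort((a, b) => a - b);

  let query =
    `opt=set&type=flash&file=${encodeURIComponent(file)}` +
    `&length=${length}&node=${ids[0]}`;
  // The batch key is only added when more than one node is selected,
  // so single-node requests look exactly like they do today.
  if (ids.length > 1) query += `&batch=${ids.slice(1).join(",")}`;
  return `/api/bmc?${query}`;
}
```

Clicking Node 4 and then Node 1 gives `node=0&batch=3` regardless of click order, matching the example above.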
