🌱 Be able to specify multiple installation target disk #391

Open
mudler opened this issue Nov 7, 2022 · 10 comments

@mudler
Member

mudler commented Nov 7, 2022

Is your feature request related to a problem? Please describe.
A system might have different device-name mappings depending on the hardware.

Describe the solution you'd like
A way to give the installer a list of devices, so that it tries each one, finds the one that is available, and installs to it.
For instance:

install:
  device_list:
  - /dev/sda
  - /dev/vda

First match wins: the first device found becomes the install target.
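A minimal Go sketch of the "first match wins" rule, assuming the installer only checks whether each listed device node exists. The `device_list` key comes from the proposal above; the function names `firstExisting` and `pathExists` are hypothetical and not part of the Kairos codebase.

```go
package main

import (
	"errors"
	"os"
)

// pathExists reports whether a device node (or any file) exists at path.
func pathExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// firstExisting returns the first candidate for which exists returns true.
// The list is treated as a priority list: earlier entries win.
func firstExisting(candidates []string, exists func(string) bool) (string, error) {
	for _, dev := range candidates {
		if exists(dev) {
			return dev, nil
		}
	}
	return "", errors.New("none of the listed devices exist")
}
```

Usage would be something like `firstExisting([]string{"/dev/sda", "/dev/vda"}, pathExists)`. Passing the existence check in as a function keeps the selection logic testable without real hardware.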

Describe alternatives you've considered

Additional context

@mudler mudler added the enhancement New feature or request label Nov 7, 2022
@mudler mudler assigned mudler and unassigned mudler Nov 7, 2022
@mudler mudler added the lane/ux label Nov 7, 2022
@mudler mudler moved this to Todo 🖊 in 🧙Issue tracking board Nov 7, 2022
@3pings
Contributor

3pings commented Jan 10, 2023

What is the status on this being picked up?

@jimmykarily
Contributor

I find this feature a bit strange. Apparently, the idea is to use the same config on different hardware; otherwise we could simply use the correct device name directly.

Given there is the "auto" option that selects the largest disk, I assume this feature is needed when "auto" won't do the right thing. For example, if a smaller disk should be used.

And this is what I find strange. We will be using the config on more than one machine with unpredictable device names (first point above), yet we will know a specific list of device names that guarantees the largest disk won't be selected (point 2 above).

Maybe this is true in some rare cases but I don't see this being a generally useful feature. Am I missing something?

@3pings
Contributor

3pings commented Feb 1, 2023

It is common for fleet devices to be identical and ordered in bulk from manufacturers. It is stated that the largest disk is used, which I find interesting, as the OS disk is typically the smaller one and the larger disk is used for storage. Additionally, users may deploy two different types of nodes, one for the control plane (CP) and one for workers, with different disks (CP nodes may have an OS disk of /dev/nvme4n1, whereas workers might use /dev/nvme1n1). These would all be identical across thousands of nodes. I want to create one image for my devices, not manage a bunch of images. The use case for this type of list is pretty straightforward: I want to use a common image across different node types and specify the device I want to install to. I do not want to leave it up to "auto", which is unpredictable.

@jimmykarily
Contributor

jimmykarily commented Feb 6, 2023

We discussed this in the sprint planning and we think the use case is perfectly valid, but the solution should be more generic. E.g. what happens if the user always wants to select the smallest disk, and some machines have a smaller vda while others have a smaller vdb? Or any other logic? One idea was to allow the user to implement a "hook" that returns the device on which kairos should install, pretty much like how kcrypt calls out to kcrypt-challenger (or kairos calls kcrypt).
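One way to read the "more generic" idea is a pluggable selector: the installer hands the candidate disks to a function and installs on whatever it returns. The Go sketch below is hypothetical (`Disk`, `Selector`, `Smallest` and `FromHook` are made-up names). `FromHook` assumes a simple hook contract where an external program prints the target device on stdout; this is only loosely in the spirit of how kcrypt calls out to kcrypt-challenger, not that actual protocol.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"strings"
)

// Disk is a hypothetical description of a candidate install target.
type Disk struct {
	Path      string
	SizeBytes int64
}

// Selector picks the install device from the available disks.
type Selector func(disks []Disk) (string, error)

// Smallest always chooses the smallest disk, whatever it is called on a given machine.
func Smallest(disks []Disk) (string, error) {
	if len(disks) == 0 {
		return "", errors.New("no disks available")
	}
	best := disks[0]
	for _, d := range disks[1:] {
		if d.SizeBytes < best.SizeBytes {
			best = d
		}
	}
	return best.Path, nil
}

// FromHook delegates the decision to an external program that is assumed to
// print the chosen device path on stdout.
func FromHook(name string, args ...string) Selector {
	return func(disks []Disk) (string, error) {
		out, err := exec.Command(name, args...).Output()
		if err != nil {
			return "", fmt.Errorf("hook %s failed: %w", name, err)
		}
		dev := strings.TrimSpace(string(out))
		// Validate the hook's answer: only accept one of the known candidates.
		for _, d := range disks {
			if d.Path == dev {
				return dev, nil
			}
		}
		return "", fmt.Errorf("hook returned unknown device %q", dev)
	}
}
```

Validating the hook's output against the known disks addresses part of the concern that arbitrary user scripts are hard to validate, though it does not make the hook itself safe to run.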

What about other cases with more complex disk schemas? E.g. would we ever want to allow the user to create some partitions on one disk and others on another? We'd better think about it now, before we implement a solution.

@mudler thoughts?

@mudler
Member Author

mudler commented Feb 13, 2023

> We discussed this in the sprint planning and we think the use case is perfectly valid, but the solution should be more generic. E.g. what happens if the user always wants to select the smallest disk, and some machines have a smaller vda while others have a smaller vdb? Or any other logic? One idea was to allow the user to implement a "hook" that returns the device on which kairos should install, pretty much like how kcrypt calls out to kcrypt-challenger (or kairos calls kcrypt).

I think in that case it is perfectly valid to specify the expected device for installation, especially if we are talking about bulk hosts: the HW layout should be the same across machines.

> What about other cases with more complex disk schemas? E.g. would we ever want to allow the user to create some partitions on one disk and others on another? We'd better think about it now, before we implement a solution.

> @mudler thoughts?

Maybe we can just support a regex matching the device name. For instance, if you expect NVMe drives, it's safe to assume the device is /dev/nvm*, and so on. Shelling out to user-provided options might be tricky, especially to validate, and it wouldn't be clear what can or can't be called in order to identify partitions (being too generic can be overkill here).
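The regex idea could look roughly like the sketch below, assuming the user supplies a pattern (the `device_regex` option name and the `firstMatching` function are hypothetical) and the installer checks it against candidate device paths in a stable order. Go's `regexp` package uses RE2 syntax rather than shell globs, so `/dev/nvm*` would be written as something like `^/dev/nvme`.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// firstMatching returns the first device path (in sorted order) that matches
// a user-supplied pattern, e.g. a hypothetical `device_regex: "^/dev/nvme"`.
func firstMatching(pattern string, devices []string) (string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return "", fmt.Errorf("invalid device pattern %q: %w", pattern, err)
	}
	sorted := append([]string(nil), devices...)
	sort.Strings(sorted) // deterministic choice when several devices match
	for _, dev := range sorted {
		if re.MatchString(dev) {
			return dev, nil
		}
	}
	return "", fmt.Errorf("no device matches %q", pattern)
}
```

Two caveats with this design: lexical sorting puts `/dev/nvme10n1` before `/dev/nvme2n1`, and a broad pattern like `^/dev/nvme` cannot tell apart the CP and worker disks from the earlier example, so the pattern has to be as specific as the hardware layout requires.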

@jimmykarily
Contributor

The installer can already skip partitioning altogether. It's also possible to run arbitrary commands using cloud-init. This allows someone to do custom partitioning (and labeling). If that works, we simply have to document how it works. Let's do a spike on this and write the docs. Then we decide if we need something more.

@mudler mudler changed the title Be able to specify multiple installation target disk 🌱 Be able to specify multiple installation target disk Feb 27, 2023
@oz123 oz123 self-assigned this Feb 27, 2023
@mudler mudler unassigned oz123 Mar 2, 2023
@jimmykarily jimmykarily moved this from Todo 🖊 to In Progress 🏃 in 🧙Issue tracking board Mar 2, 2023
@jimmykarily jimmykarily self-assigned this Mar 2, 2023
@jimmykarily
Contributor

Here is my test config.yaml for reference:

#cloud-config

users:
- name: "kairos"
  passwd: "kairos"

# Tell elemental to not create partitions
options:
  no-format: "true"

# User has to copy this file inside /oem
# because elemental will try to find stage steps there. It won't respect them here.
# Then the user has to install with
# `kairos-agent manual-install` with this file.
# Netbooting doesn't work for the same reason (elemental ignores the stages).
stages:
  before-install:
  - name: "Create a file"
    commands:
      - |
        touch /tmp/me-now
  boot:
  # TODO: check which device exists
  - if:  '[ -e /dev/vda ] && (kairos-agent state get boot | grep -q unknown)'
    name: "Create partitions"
    commands:
      - |
        touch /tmp/me-now

    layout:
      device:
        path: /dev/vda
      add_partitions:
        - fsLabel: COS_STATE
          size: 16240 # At least 16gb
          pLabel: state

As described in the comments, the above plan requires jumping through some hoops to make it work, because elemental doesn't respect the stages passed to the kairos-agent. We are thinking about solutions to make this simpler.

@jimmykarily
Contributor

jimmykarily commented Mar 6, 2023

Use cases:

Possible solutions:

  • Yaml based DSL to describe all possible configurations
  • Support an external script and skip partitioning altogether (maybe as a cloud-init script)
  • Different solution per use case above. E.g. implement an easy way to define device names with priorities to solve #391, but that doesn't
    solve any of the other issues. Find solutions for the others.

Although we can implement simpler solutions for some of the issues, the one feature that solves them all is the fully custom partitioning.
Let's do that first and decide if we need simpler solutions for some of the rest.

We'll implement custom partitioning on #209 and keep this open (and blocked) to decide if we prefer a better solution for this use case only.

@jimmykarily jimmykarily moved this from In Progress 🏃 to Todo 🖊 in 🧙Issue tracking board Mar 6, 2023
@mudler
Member Author

mudler commented Mar 6, 2023

After a sync call we agreed:

We were blocked because we were trying to netboot, and the config_url field had no effect on stages defined in that file. Elemental *-install stages need to be executed ALSO from the file provided by the user during installation (currently only files in /oem and /usr/local/cloud-config are respected).

Solution:
#209
In order to read stages that kick in before the installer, like boot, config_url needs to be downloaded and saved into /oem (or another path scanned by elemental) (see

)
- download it and save it, say to /oem/ [LiveCD and netboot]
- chainload with a for loop
if c.HasConfigURL() {
(?)

Trying locally:
First, try the config we had with a datasource ISO. If that works, we have to make it available in /oem, as the datasource does.

Let's keep this issue as a tracker, until we fix all the pieces to get there.

@jimmykarily
Contributor

For reference, skipping formatting of the disk was broken for a while: #2281

Added a test to ensure we don't break it again: #2291
