Allow vddk to be dynamically added without rebuilding container images #734

bdunne · 2021-08-09T22:32:18Z

The current VDDK installation in pods involves rebuilding the container images, but that is not convenient for users. As discussed, we need an easier solution.

~~Redistribute the VDDK~~

~~Need to be a Select level member of the VMware TAP program^[ref]~~

~~ConfigMap~~

~~Limited to 1MiB, file is ~20MiB~~

Keep the current solution of rebuilding the images

High maintenance for upgrades

Single PersistentVolume mounted to all pods

limits the backing storage types that we can run on
More things to worry about for backup and restore
Convenient that we only need to upload to a single place

Multiple PersistentVolumes (one per worker pod)

Orchestrator will have to set them up copy file(s) in
More things to worry about for backup and restore
Don't need to worry about storage types, but requires the cluster to have multiple volumes available

BinaryBlobs

Allows a single place to upload files
Don't need to worry about backup / restore
Don't need to worry about available volumes or their backing storage
May be tricky to coordinate the initial upload and restarting of workers to make use of the new blob
Every pod start would require a download and install

Download on demand

Allow the user to specify a location to download the VDDK, like an FTP server or something
User would have to permanently store the VDDK somwhere.
Adding on the fly would require a worker restart on podified
Every pod start would require a download and install

Thoughts @Fryguy & @agrare ?

volume access modes

Fryguy · 2021-08-10T14:14:10Z

I'd also like @jrafanie and @chessbyte to add their thoughts here.

ConfigMap is right out, because of the size limitations, so I crossed it out in the OP.

I'd prefer whatever solution we use to be consistent across both appliances and podified, so that we can reduce any speciality. That's another upside to BinaryBlobs, because we can document it as the preferred way for both appliances and podified.

Slight downside to BinaryBlobs is we will need to have a specific UI for uploading the file, which hopefully we can do in a pluggable way. We probably should block that to super-admins only or introduce a new RBAC setting or something for that.

Fryguy · 2021-08-10T14:15:24Z

In the private discussion, @agrare mentioned that we really only need the VDDK on the operation worker, which, I think, there can only be one replica of, however there is one per VMware EMS instance.

bdunne · 2021-08-10T14:35:50Z

Another advantage for BinaryBlobs option is that the admin doesn't have to install the package on every appliance that could run a Vmware operations worker, single upload and they're done. This could also simplify updates to the package.

Slight downside to BinaryBlobs is we will need to have a specific UI for uploading the file, which hopefully we can do in a pluggable way. We probably should block that to super-admins only or introduce a new RBAC setting or something for that.

Yeah, I was thinking a simple HTTP PUT to the API, but UI would be nice too.
I agree, it should be super-admin only

Fryguy · 2021-08-10T14:37:03Z

If we do BinaryBlobs, we have to make sure that the blob record is attached to something or it will get purged...the docs would have to tell the user to attach it to the enterprise or something.

Fryguy · 2021-08-10T14:39:43Z

HTTP PUT is a cool idea. I'm concerned they may try to upload multiple and we'd have to either disallow that or figure out the "right" one. Perhaps, instead of directly on a /binary_blobs endpoint, we could have an API on the EMS or perhaps the Enterprise? That API could do the right thing under the hood.

We'll probably need an API for the UI anyway, so that's a good start.

chessbyte · 2021-08-10T14:42:27Z

If we do BinaryBlobs, we have to make sure that the blob record is attached to something or it will get purged...the docs would have to tell the user to attach it to the enterprise or something.

Why not attach it to the VMware Provider EMS?

Fryguy · 2021-08-10T14:43:16Z

Why not attach it to the VMware Provider EMS?

The user would be required to upload it multiple times per VMware EMS instance, if they have multiple VMwares.

agrare · 2021-08-10T14:43:20Z

Why not attach it to the VMware Provider EMS?

Because we only need one instance of it

Fryguy · 2021-08-10T14:46:43Z

Just thought of another option, which I believe is along the lines of what the v2v team did.

Allow the user to specify a location to download the VDDK, like an FTP server or something. This could be set up as an extra tab in the vmware DDF form adding an endpoint/authentication for the vddk location. Then, when the operations worker starts, if there's a vddk defined, it can download and install it in the right place.

agrare · 2021-08-10T14:49:54Z

If we have the tgz file in BinaryBlobs then we will have to pull it down and extract it to the target dir. On an appliance this will only have to be done once but on podified it will have to be done everytime the pod starts up. Not a dealbreaker but not ideal

Fryguy · 2021-08-10T14:50:50Z

@agrare Same would be true for the "download the VDDK" option.

agrare · 2021-08-10T14:51:09Z

Yes I was comparing to the "mount a volume' option (unless that's already been NAK'd?)

Fryguy · 2021-08-10T14:51:31Z

On an appliance this will only have to be done once

On an appliance this will only have to be done once... per server

Fryguy · 2021-08-10T14:52:20Z

No, we haven't NAK'd yet - still trying to capture all the pros/cons and also new ideas.

Fryguy · 2021-08-10T14:55:13Z

Also we should technically add https://code.vmware.com/docs/6635/virtual-disk-development-kit-programming-guide/doc/GUID-282C600E-D986-4592-9206-70BF60DBF684.html to the OP to show we've considered it 😆

Fryguy · 2021-08-10T15:02:09Z

Added the download on demand and the redistribute options to the OP and also added @agrare's concern about the "on every pod startup"

jrafanie · 2021-08-10T15:03:08Z

What's required to tell the underlying code where the vddk is installed if we're not rebuilding images? I understand that uploading the files to the right location is important but I'd imagine most of our usages will be people having pods run with the installed vddk wherever it's located so runtime steps we need to take is pretty important too.

chessbyte · 2021-08-10T15:03:18Z

Allow the user to specify a location to download the VDDK, like an FTP server or something. This could be set up as an extra tab in the vmware DDF form adding an endpoint/authentication for the vddk location. Then, when the operations worker starts, if there's a vddk defined, it can download and install it in the right place.

The issue I see with this approach is that we will eventually have to support every type of file sharing under the sun (FTP, SFTP, Cloud Storage, ...). The BinaryBlob approach offers a single way to do it

chessbyte · 2021-08-10T15:05:35Z

@jrafanie @agrare how challenging is the install on first use of the Operations Worker?

@Fryguy with that approach, are we going to have security issues of what files are writable and where?

agrare · 2021-08-10T15:05:48Z

What's required to tell the underlying code where the vddk is installed if we're not rebuilding images? I understand that uploading the files to the right location is important but I'd imagine most of our usages will be people having pods run with the installed vddk wherever it's located so runtime steps we need to take is pretty important too.

We can set LD_LIBRARY_PATH to the target directory even if the it doesn't exist. That way when we extract the files there the env should already be set up.

I should note that I haven't gotten this working on an appliance yet, LD_LIBRARY_PATH doesn't seem to be honored or something.

Fryguy · 2021-08-10T15:11:53Z

I recall there being hacks that Jerry worked on that might be overriding the LD_LIBRARY_PATH

https://github.com/ManageIQ/vmware_web_service/blob/3273f463d7090f734ea14657d4565807b53e3209/lib/VMwareWebService/VixDiskLib/VixDiskLib.rb#L101

agrare · 2021-08-10T15:16:21Z

That looks like it just tacks on a specific directory to LD_LIBRARY_PATH, in my case I already have the proper directory in LD_LIBRARY_PATH

bdunne · 2021-08-10T15:18:02Z

If we have the tgz file in BinaryBlobs then we will have to pull it down and extract it to the target dir. On an appliance this will only have to be done once but on podified it will have to be done everytime the pod starts up. Not a dealbreaker but not ideal

I think this could be an advantage. Yes, it will slow down boot time slightly, but it's super easy to update the package in the future. Just replace the BinaryBlob with the newer version and the next time a pod starts it gets the latest. I was thinking that appliances could do the same at appliance boot.

bdunne · 2021-08-10T15:20:48Z

Allow the user to specify a location to download the VDDK, like an FTP server or something. This could be set up as an extra tab in the vmware DDF form adding an endpoint/authentication for the vddk location. Then, when the operations worker starts, if there's a vddk defined, it can download and install it in the right place.

The issue I see with this approach is that we will eventually have to support every type of file sharing under the sun (FTP, SFTP, Cloud Storage, ...). The BinaryBlob approach offers a single way to do it

I agree, and we're trying to drop similar things (like backup and restore to/from all of these locations)

agrare · 2021-08-10T15:28:22Z

Ah sorry, use of the VDDK is by the smartproxy not the vmware operations worker. I had originally had the entire smartstate scan performed by the vmware operations worker but changed it prior to merge to maintain the same performance of having multiple smart proxies.

Fryguy · 2021-08-10T15:59:30Z

@agrare I think I'm thinking of this PR (from 2014! ManageIQ/manageiq#277)

Fryguy · 2021-08-10T16:00:13Z

I think this could be an advantage. Yes, it will slow down boot time slightly, but it's super easy to update the package in the future. Just replace the BinaryBlob with the newer version and the next time a pod starts it gets the latest. I was thinking that appliances could do the same at appliance boot.

...on podified. On appliances that will be complicated as we will need to compare what's on disk with what's in the database.

agrare · 2021-08-10T16:15:44Z

Okay I had vddk 7.0.2 which was "too new" so ffi-vix_disk_lib wasn't picking it up. Once I switched to 7.0.1 it worked.

Also need to note: the target directory doesn't have to exist for the worker to start up, but if it doesn't exist at startup then it is created after then it isn't picked up by ld. If the target directory does exist but is empty when the process starts up, then the .so is created after, then ld does pick it up. This just means whatever we want to use as the target directory has to exist on worker initialization even if it is empty if we got the BinaryBlob extract on startup approach.

bdunne · 2021-08-10T16:50:20Z

We could also design it generically enough with BinaryBlobs to allow us to upload code patches as well (solving another podified problem).

bdunne · 2021-08-10T16:52:31Z

I think this could be an advantage. Yes, it will slow down boot time slightly, but it's super easy to update the package in the future. Just replace the BinaryBlob with the newer version and the next time a pod starts it gets the latest. I was thinking that appliances could do the same at appliance boot.

...on podified. On appliances that will be complicated as we will need to compare what's on disk with what's in the database.

I was thinking we would overwrite it on every appliance boot too.

Fryguy · 2021-08-17T18:33:13Z

@jrafanie This is where we are discussing VDDK, but it overlaps a little with the ability to provide hotfixes. That is, whatever we come up with here we might be able to piggy back on for hotfixes and other diagnostics.

Fryguy · 2021-08-17T18:38:31Z

Either that or maybe we should split off a separate discussion for hotfixes / patches, but I was hoping what we come up with for the VDDK was generic enough for both

jrafanie · 2021-08-17T18:45:35Z

We could also design it generically enough with BinaryBlobs to allow us to upload code patches as well (solving another podified problem).

Are you thinking of hooking into the entrypoint like we generate the key pre-run single worker? I don't know if we could make changes to run_single_worker pre-rails to get and extract the binary blob. We'd have to do similar work on the appliance side to pre-run single worker to have it do the same thing ONCE per appliance at boot.

jrafanie · 2021-08-17T18:50:37Z

Note, we don't call run_single_worker in orchestrator or the other non-orchestrator managed pods so if we wanted to deliver a hotfix for the orchestrator, it can't be done through that mechanism. We'd need something called from the entrypoint for each pod and once at boot for appliances.

Fryguy · 2021-08-17T19:17:06Z

@jrafanie Good point. I was actually questioning that part myself. While this might all work for VDDK (because the vmware worker could do this work), if we generalized for any file, we'd need a way to query the database, but doing so before starting the run_single_worker, and thus before we have the Rails environment. I was thinking a stub executable / Ruby method who's only job is to query that table and pull the values.

This was actually why I started gravitating towards the "just give us a ftp/http endpoint and we can curl it" idea, because it would work in this pre-boot environment.

jrafanie · 2021-08-17T19:38:11Z

if we generalized for any file, we'd need a way to query the database, but doing so before starting the run_single_worker, and thus before we have the Rails environment

Right, so we'd have to weigh the cost of loading a bloated rails process to rebuild the blob from parts to dump to a file or make it pure ruby or shell or etc. It's likely the most useful places we'd want to hotfix will be pods where we have rails available to us (normal workers, orchestrator) vs. others such as httpd/memcached/postgresql which can't load rails.

Or we can get it from some other endpoint but then we might want to keep the FileDepot classes. We'd then have to worry about what locations we support and oddities with connectivity problems.

🤷

It feels like binary blobs using straight rails is fairly easy to POC to see if we can hotfix some code or add vddk or something.

We could then decide to try going straight ruby to see if we can make it more performant. I don't see a need to hotfix httpd, memcached, postgresql pods.

We could then do the same thing with appliances at boot.

bdunne · 2021-08-17T21:00:33Z

We could also design it generically enough with BinaryBlobs to allow us to upload code patches as well (solving another podified problem).

Are you thinking of hooking into the entrypoint like we generate the key pre-run single worker? I don't know if we could make changes to run_single_worker pre-rails to get and extract the binary blob. We'd have to do similar work on the appliance side to pre-run single worker to have it do the same thing ONCE per appliance at boot.

Yes, my thought was do do it in the entrypoint before we launch the rails worker.

bdunne · 2023-02-16T21:15:38Z

We can't ldconfig on boot in kubernetes, so we may not be able to do anything more than has already been documented in ManageIQ/manageiq-documentation#1706

sh-4.4$ ldconfig
ldconfig: Can't create temporary cache file /etc/ld.so.cache~: Permission denied

miq-bot · 2023-05-22T00:00:05Z

This issue has been automatically marked as stale because it has not been updated for at least 3 months.

If you can still reproduce this issue on the current release or on master, please reply with all of the information you have about it in order to keep the issue open.

Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation.

bdunne added the enhancement label Aug 9, 2021

bdunne self-assigned this Aug 9, 2021

miq-bot added the stale label May 22, 2023

bdunne added pinned and removed stale labels Jul 12, 2023

Allow vddk to be dynamically added without rebuilding container images #734

Allow vddk to be dynamically added without rebuilding container images #734

Comments

bdunne commented Aug 9, 2021 • edited Loading

Fryguy commented Aug 10, 2021

Fryguy commented Aug 10, 2021

bdunne commented Aug 10, 2021

Fryguy commented Aug 10, 2021

Fryguy commented Aug 10, 2021

chessbyte commented Aug 10, 2021

Fryguy commented Aug 10, 2021

agrare commented Aug 10, 2021

Fryguy commented Aug 10, 2021 • edited Loading

agrare commented Aug 10, 2021

Fryguy commented Aug 10, 2021

agrare commented Aug 10, 2021 • edited Loading

Fryguy commented Aug 10, 2021

Fryguy commented Aug 10, 2021

Fryguy commented Aug 10, 2021

Fryguy commented Aug 10, 2021

jrafanie commented Aug 10, 2021

chessbyte commented Aug 10, 2021

chessbyte commented Aug 10, 2021

agrare commented Aug 10, 2021

Fryguy commented Aug 10, 2021 • edited Loading

agrare commented Aug 10, 2021

bdunne commented Aug 10, 2021

bdunne commented Aug 10, 2021

agrare commented Aug 10, 2021

Fryguy commented Aug 10, 2021

Fryguy commented Aug 10, 2021

agrare commented Aug 10, 2021

bdunne commented Aug 10, 2021

bdunne commented Aug 10, 2021

Fryguy commented Aug 17, 2021

Fryguy commented Aug 17, 2021

jrafanie commented Aug 17, 2021 • edited Loading

jrafanie commented Aug 17, 2021

Fryguy commented Aug 17, 2021

jrafanie commented Aug 17, 2021

bdunne commented Aug 17, 2021

bdunne commented Feb 16, 2023

miq-bot commented May 22, 2023

bdunne commented Aug 9, 2021 •

edited

Loading

Fryguy commented Aug 10, 2021 •

edited

Loading

agrare commented Aug 10, 2021 •

edited

Loading

Fryguy commented Aug 10, 2021 •

edited

Loading

jrafanie commented Aug 17, 2021 •

edited

Loading