
Memballoon - possible cause for slow shutdown. "deflation" issue #1148

Open
DaLiV opened this issue Sep 14, 2024 · 5 comments

@DaLiV

DaLiV commented Sep 14, 2024

Describe the bug
At/after shutdown of the guest OS, when <currentMemory> is set lower than <memory>:

  • CPU usage goes to 100% and stays there for a long time (several minutes, approx. 3 min) before the QEMU process of this VM ends.

The problem does not occur (shutdown of the VM takes a few seconds) when:
1. <currentMemory> is equal to <memory>
2. the memballoon driver is disabled in Windows
3. memballoon is disabled in libvirt with model="none"
4. a "full memory allocation" is done for the VM prior to shutdown:
virsh setmem VMName --live --size 32G
That operation also takes time (approx. 90 s), but a direct shutdown without it takes 180 s, i.e. twice as long. After the setmem, the shutdown itself completes in under 5 seconds.
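Workaround 1 is a persistent configuration change. A minimal sketch, assuming the domain is exported with `virsh dumpxml VMName > vm.xml`, transformed, and loaded back with `virsh define vm.xml` (the new value then applies from the next VM start). The helper name `pin_current_memory_to_max` is hypothetical; it relies on libvirt's documented behaviour that an omitted `<currentMemory>` defaults to the `<memory>` value and that the default unit is KiB.

```python
import xml.etree.ElementTree as ET


def pin_current_memory_to_max(domain_xml: str) -> str:
    """Return domain XML whose <currentMemory> equals <memory> (workaround 1).

    Hypothetical helper for illustration; it only edits the XML text.
    """
    root = ET.fromstring(domain_xml)
    memory = root.find("memory")
    if memory is None:
        raise ValueError("domain XML has no <memory> element")
    current = root.find("currentMemory")
    if current is not None:
        # Copy value and unit so both elements describe the same amount.
        current.text = memory.text
        # libvirt treats a missing unit attribute as KiB.
        current.set("unit", memory.get("unit", "KiB"))
    # If <currentMemory> is absent, libvirt already uses the <memory> value.
    return ET.tostring(root, encoding="unicode")
```

Applied to the XML from the reproduction steps, this turns `<currentMemory unit='KiB'>4194304</currentMemory>` into `33554432` KiB, which is the same setting boennhoff describes later as a workaround (all memory then stays allocated to the guest).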

It seems the memballoon allocates memory at shutdown while the deallocation processes are running.

To Reproduce

  1. Set in vm.xml:
<domain type='kvm'>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <source type='memfd'/>
    <access mode='shared'/>
  </memoryBacking>
  <!-- remaining elements omitted -->
</domain>
  2. Start the VM.
  3. Stop the VM.

Expected behavior
Shutdown should not take a long time.

Host:

  • Distro: Fedora 40
  • Kernel: 6.10.8-200.fc40.x86_64
  • QEMU: qemu-8.2.6-3.fc40.x86_64
  • libvirt: libvirt-10.1.0-4.fc40.x86_64
  • libvirt XML file
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>

VM:

  • Windows 11
  • memballoon
  • 100.95.104.26200 / virtio-win-0.1.262.iso

Additional context
The following can be monitored at shutdown time:
watch -n 1 "virsh dommemstat VMName"
There, "rss" grows towards MaxMem, but very slowly.
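To log the same numbers instead of watching a refreshing screen, the `virsh dommemstat` output (one `name value` pair per line, memory values in KiB) can be parsed and `rss` compared with the domain's maximum memory. A minimal sketch; the function names are hypothetical, and the maximum (e.g. 33554432 KiB from `<memory>`) is passed in by hand rather than queried.

```python
import subprocess
import time


def parse_dommemstat(output: str) -> dict[str, int]:
    """Parse `virsh dommemstat` output ("name value" per line) into a dict."""
    stats: dict[str, int] = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def rss_percent_of_max(stats: dict[str, int], max_kib: int) -> float:
    """Host-resident memory (rss) as a percentage of the domain's maximum."""
    return 100.0 * stats["rss"] / max_kib


def watch_rss(domain: str, max_kib: int, interval: float = 1.0) -> None:
    """Print rss once per interval until virsh returns an error
    (for example, once the domain is no longer running)."""
    while True:
        result = subprocess.run(["virsh", "dommemstat", domain],
                                capture_output=True, text=True)
        if result.returncode != 0:
            break
        stats = parse_dommemstat(result.stdout)
        if "rss" in stats:
            print(f"{time.strftime('%H:%M:%S')} rss={stats['rss']} KiB "
                  f"({rss_percent_of_max(stats, max_kib):.1f}% of max)")
        time.sleep(interval)
```

For example, `watch_rss("VMName", 33554432)` started just before the shutdown prints a timestamped line per second, which makes the slow rss growth and its total duration easy to attach to a report.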

@YanVugenfirer
Collaborator

  4. prior to shutdown do "full memory allocation" to VM
    Do you mean to inflate the balloon to the full memory of the VM? This is not a recommended action in any case, as it might lead to system failure.

@DaLiV
Author

DaLiV commented Sep 15, 2024

Re 4: I showed how that was done:
virsh setmem VMName --live --size 32G
This grows the allocation from the partial, dynamic "4 GB" to the full 32 GB defined for this VM. It is a standard command with standard, if "not recommended", behaviour.
The test was done primarily to understand more clearly where the fault may lie.
All of the ways 1/2/3/4 point in the same direction: dynamic memory and its ballooning.
The long shutdown time is the first symptom.
The timing from (4) suggests that half of the shutdown time is spent on the same "full memory allocation", running in parallel with "deallocation" at shutdown.
Yes, that can prevent OOMs, but at a cost in time, which is multiplied across many VMs...

A simple example, upgrading the host:
you have 10 such VMs running with dynamic allocation and need to shut all of them down.

  • In parallel, with "ballooning ▲/▼", that takes 40 minutes (with CPU usage at 100% on ALL cores).
  • But with a scripted shutdown that does this "reprovisioning" sequentially before each shutdown, it takes only 15 minutes, and the core load is only partial, as every VM is provisioned to use only 3/4 of all host cores... a gain of 25 minutes of uptime.
  • Plus, you can shut them down in your own order of VM importance.
  • But that is currently a "crutch", a dirty hack around the problem.
  • If you have more than enough memory to provision full memory for each VM, a parallel shutdown can be expected to take below 30 seconds overall, with no high CPU load.
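The sequential "reprovisioning" shutdown described above could look roughly like the sketch below. It is not the author's script, which was not posted, and it rests on stated assumptions: domains are persistent (so `virsh domstate` still reports `shut off` afterwards), the balloon's `actual` value in `virsh dommemstat` tracks deflation progress, and `setmem` sizes are given in KiB (virsh's default unit). All function names are hypothetical; the `run` parameter allows a dry run with a fake runner.

```python
import subprocess
import time
from typing import Callable, Sequence

# A runner takes virsh arguments and returns stdout; injectable for dry runs.
Runner = Callable[[list[str]], str]


def run_virsh(args: list[str]) -> str:
    """Run `virsh <args>` and return stdout; raises CalledProcessError on failure."""
    return subprocess.run(["virsh", *args], capture_output=True,
                          text=True, check=True).stdout


def balloon_actual_kib(run: Runner, name: str) -> int:
    """Read the current balloon size ("actual", KiB) from `virsh dommemstat`."""
    for line in run(["dommemstat", name]).splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "actual":
            return int(parts[1])
    raise RuntimeError(f"no 'actual' value in dommemstat output for {name}")


def wait_until(check: Callable[[], bool], what: str,
               poll: float, timeout: float) -> None:
    """Call check() every `poll` seconds until it is true or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while not check():
        if time.monotonic() > deadline:
            raise TimeoutError(f"timed out waiting for {what}")
        time.sleep(poll)


def shutdown_in_order(vms: Sequence[tuple[str, int]], run: Runner = run_virsh,
                      poll: float = 1.0, timeout: float = 600.0) -> None:
    """Shut VMs down one at a time, in the given (importance) order.

    Each entry is (domain name, maximum memory in KiB).
    """
    for name, max_kib in vms:
        # 1. Deflate the balloon: give the guest its full memory first.
        run(["setmem", name, "--live", "--size", str(max_kib)])
        wait_until(lambda: balloon_actual_kib(run, name) >= max_kib,
                   f"{name} balloon to reach {max_kib} KiB", poll, timeout)
        # 2. Request a clean guest shutdown and wait until the domain stops.
        run(["shutdown", name])
        wait_until(lambda: run(["domstate", name]).strip() == "shut off",
                   f"{name} to shut off", poll, timeout)
```

For example, `shutdown_in_order([("VMName", 33554432)])` gives `VMName` its full 32 GiB, shuts it down, and waits for it to stop before the next entry. Running the VMs one after another is what keeps the host from being saturated by all balloons at once, at the price of a longer wall-clock sequence than an unaffected parallel shutdown would need.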

P.S. Swap usage = 0; nothing goes there, in case someone says "you are swapping at this time", which is "not recommended".
P.P.S. Even if the slow dynamic memory allocation could be improved from 90 s to about 5 s, that would also be usable (the shutdown crutch script could then be used as a permanent "solution", as it provides VM-importance ordering).
P.P.P.S. With dynamic underprovisioning to 2 GB, CPU usage is also constantly high; that is an additional point for this subsystem (but possibly related to the same part of the code).

@xiagao

xiagao commented Sep 23, 2024

@DaLiV The Win11 guest indeed has this problem with the balloon device; it took almost twice as long as a WS2022 guest. There is already a Jira issue recorded internally. If there is any update, we'll post it here.

@boennhoff

boennhoff commented Nov 12, 2024

I am still seeing this issue with a Win11 guest and virtio-win-0.1.262.iso. During shutdown of the VM, the CPU is clogged at 100% and the host RAM rises very slowly to the full amount assigned to the VM (22 GiB here). The memory/balloon inflation process alone takes about 4 minutes on average... My workaround is to set the minimum RAM to the assigned maximum: the problem is gone, though all memory is then always in use.

Any news on the internal JIRA ticket @xiagao @YanVugenfirer?

@YanVugenfirer
Collaborator

@boennhoff Sorry, not yet.


4 participants