
Use as SWAP #3

Open · snshn opened this issue Dec 14, 2014 · 33 comments

@snshn commented Dec 14, 2014

Was wondering if it would be possible to host a swap partition within vramfs, or to somehow patch vramfs to make it work as a swap partition?

My drive is encrypted, therefore I don't use swap partitions... but if this thing could give me 3 GB or so of a swap-like fs, we could be onto something...

Do you think it could work without FUSE, natively?

Oh, and great idea behind vramfs, really neat!

@Overv (Owner) commented Dec 14, 2014

It's possible to implement a block device with OpenCL backing it. It could probably be developed pretty quickly with something like BUSE.

@ptman commented Dec 14, 2014

If you can provide a block device, then you can also build RAID-0 on top of the block devices.

@Overv (Owner) commented Dec 14, 2014

@ptman That is a great point. I'm going to look into writing a kernel module to do this tomorrow. I've tried BUSE, but it seems to be bottlenecking because it's based on the network block device interface.

@snshn (Author) commented Dec 14, 2014

A kernel module and some kind of analogue to swapon/swapoff would make this thing look very serious.

Both FUSE and BUSE would definitely only slow things down.

Good luck @Overv, thanks for sharing!

@Overv (Owner) commented Dec 15, 2014

I've done some preliminary testing with BUSE and trivial OpenCL code. The read speed is 1.1 GB/s and the write speed 1.5 GB/s with ext4. Writing my own kernel module is going to take more time, and it'll still require a userspace daemon to interact with OpenCL.
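For anyone who wants to reproduce that kind of number, a rough dd-based throughput test over an ext4 filesystem on such a block device might look like the following. This is a minimal sketch: /dev/nbd0 and the mount point are assumptions, and dd only gives a coarse sequential figure.

    # Assumption: the BUSE-backed device shows up as /dev/nbd0.
    sudo mkfs.ext4 /dev/nbd0
    sudo mount /dev/nbd0 /mnt/vramblk

    # Sequential write throughput, bypassing the page cache.
    dd if=/dev/zero of=/mnt/vramblk/testfile bs=1M count=1024 oflag=direct

    # Drop caches, then measure sequential read throughput.
    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    dd if=/mnt/vramblk/testfile of=/dev/null bs=1M iflag=direct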

@snshn (Author) commented Dec 15, 2014

Wow, very good news, @Overv!

I think the daemon is necessary just to provide the proper RAID support across multiple vramfs-based block devices and to control the amount of memory dedicated per adapter... I believe a package named vramfs-tools containing vramfsd and vramfsctl could fit the purpose...

Wondering what @torvalds will think of this project, maybe it'll end up being included in the tree like tmpfs...
4GB of VRAM on my Linux laptop feels like such a waste... bet I'm not the only one who feels that way.

Thanks for your work, once again!

@agrover commented Dec 15, 2014

If you want a userspace-backed block (SCSI) device, I would encourage you to look at TCMU, which was just added in Linux 3.18. It's part of the LIO kernel target. Using it along with the loopback fabric and https://github.com/agrover/tcmu-runner may fill in some missing pieces. tcmu-runner handles the "you need a daemon" part, so the work would just consist of a VRAM-backed plugin for servicing SCSI commands like READ and WRITE. Then you'd have the basic block device, for swap or a filesystem or whatever.

(tcmu-runner is still alpha, but I think it would save you from writing kernel code and a daemon from scratch. Feedback welcome.)

@bisqwit (Contributor) commented Jan 4, 2020

While it is technically possible to create a file on vramfs and use it as swap, this is risky: what happens if vramfs itself, or one of the GPU libraries, gets swapped out? This can happen in a low-memory situation, i.e. exactly the situation that swap is designed to help with. The kernel cannot possibly know that restoring data from the swap depends on data that is… in the swap.
This is not an issue for kernel-space filesystem/storage drivers, because the kernel's own RAM never gets swapped, but it is a conundrum for user-space stuff.

@j123b567 commented:
For a kernel-space driver, it would be nice to use TTM/GEM directly to allocate video RAM buffers.

@bisqwit (Contributor) commented Jan 21, 2020

What are TTM/GEM?

Note that the slram/phram/mtdblock thing can only access at most like 256 MB of the memory, the size of the memory window (I guess) of the PCI device.

@j123b567 commented:
I don't know much, but they are interfaces for accessing GPU memory inside the kernel, so a driver can see all of the GPU memory, not just the directly accessible mapped part. https://www.kernel.org/doc/html/latest/gpu/drm-mm.html

My situation: an NVIDIA dedicated GPU with 4 GB of RAM and the nouveau driver, which lacks OpenCL support. This memory is not mapped into the memory space, so I can't use it via slram/phram.

@dhalsimax commented Oct 6, 2020

> It's possible to implement a block device with OpenCL backing it. It could probably be developed pretty quickly with something like BUSE.

The easy way to accomplish this is to use vramfs as-is: make a file on the vramfs disk, attach a loop device to that file, format the loop device with mkswap, and then swapon (sketched below). With this method everything seemed to work when I tried it. Anyway, the big issue with FUSE or BUSE is that both run in user space, and user space is swappable. I have not tried it, but suppose the memory of the vramfs process itself gets swapped out by the kernel; how would the kernel recover from a page fault when servicing it requires reloading those very pages in the first place? I am curious what would happen then.

Edit: sorry, I hadn't read the earlier comments; bisqwit already explained this. Anyway, I tried using it as swap, and after a while the system froze and needed a hard reboot (power off and on, sob)...
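For reference, a minimal sketch of that procedure, assuming vramfs is already mounted at /mnt/vram and a 3 GB swap file is wanted (the paths and sizes are illustrative):

    # Create the backing file with dd rather than truncate, so it has no holes.
    dd if=/dev/zero of=/mnt/vram/swapfile bs=1M count=3072

    # Attach a loop device; losetup -f --show prints the device it picked.
    LOOPDEV=$(sudo losetup -f --show /mnt/vram/swapfile)

    # Format and enable it as swap.
    sudo mkswap "$LOOPDEV"
    sudo swapon "$LOOPDEV"

    # Teardown, when done:
    # sudo swapoff "$LOOPDEV" && sudo losetup -d "$LOOPDEV"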

@LHLaurini commented:
> What happens if VRAMFS itself, or one of the GPU libraries, gets swapped?

Couldn't mlockall be used to prevent vramfs from getting swapped?

@montvid commented Dec 15, 2020

Wonderful idea! I am running an old headless server with a 1 GB DDR3 AMD card supporting OpenCL 1.1. I can use all the video RAM, as I only use SSH. Unfortunately vramfs does not let me create a swap-file-based swap; I get "swapon: /mnt/vram/swapfile: swapon failed: Invalid argument". Can it be fixed? I see OpenCL 1.2 has been merged into Mesa 20.3, so good times are ahead for this project.

@wonghang commented:
It doesn't work for me, even though I tried to mlockall() the userspace program's pages. I think the nvidia driver allocates some memory that can still be swapped. At some point, the computer gets into a deadlock when memory is low.

I also tried the BUSE/nbd approach. It doesn't work for me either.

I think we would need to get into the nvidia driver, carefully develop a block device kernel driver, and call these undocumented APIs:

cat /proc/kallsyms | grep rm_gpu_ops | sort -k 3
0000000000000000 t rm_gpu_ops_address_space_create	[nvidia]
0000000000000000 t rm_gpu_ops_address_space_destroy	[nvidia]
0000000000000000 t rm_gpu_ops_bind_channel_resources	[nvidia]
0000000000000000 t rm_gpu_ops_channel_allocate	[nvidia]
0000000000000000 t rm_gpu_ops_channel_destroy	[nvidia]
0000000000000000 t rm_gpu_ops_create_session	[nvidia]
0000000000000000 t rm_gpu_ops_destroy_access_cntr_info	[nvidia]
0000000000000000 t rm_gpu_ops_destroy_fault_info	[nvidia]
0000000000000000 t rm_gpu_ops_destroy_session	[nvidia]
0000000000000000 t rm_gpu_ops_device_create	[nvidia]
0000000000000000 t rm_gpu_ops_device_destroy	[nvidia]
0000000000000000 t rm_gpu_ops_disable_access_cntr	[nvidia]
0000000000000000 t rm_gpu_ops_dup_address_space	[nvidia]
0000000000000000 t rm_gpu_ops_dup_allocation	[nvidia]
0000000000000000 t rm_gpu_ops_dup_memory	[nvidia]
0000000000000000 t rm_gpu_ops_enable_access_cntr	[nvidia]
0000000000000000 t rm_gpu_ops_free_duped_handle	[nvidia]
0000000000000000 t rm_gpu_ops_get_channel_resource_ptes	[nvidia]
0000000000000000 t rm_gpu_ops_get_ecc_info	[nvidia]
0000000000000000 t rm_gpu_ops_get_external_alloc_ptes	[nvidia]
0000000000000000 t rm_gpu_ops_get_fb_info	[nvidia]
0000000000000000 t rm_gpu_ops_get_gpu_info	[nvidia]
0000000000000000 t rm_gpu_ops_get_non_replayable_faults	[nvidia]
0000000000000000 t rm_gpu_ops_get_p2p_caps	[nvidia]
0000000000000000 t rm_gpu_ops_get_pma_object	[nvidia]
0000000000000000 t rm_gpu_ops_has_pending_non_replayable_faults	[nvidia]
0000000000000000 t rm_gpu_ops_init_access_cntr_info	[nvidia]
0000000000000000 t rm_gpu_ops_init_fault_info	[nvidia]
0000000000000000 t rm_gpu_ops_memory_alloc_fb	[nvidia]
0000000000000000 t rm_gpu_ops_memory_alloc_sys	[nvidia]
0000000000000000 t rm_gpu_ops_memory_cpu_map	[nvidia]
0000000000000000 t rm_gpu_ops_memory_cpu_ummap	[nvidia]
0000000000000000 t rm_gpu_ops_memory_free	[nvidia]
0000000000000000 t rm_gpu_ops_own_page_fault_intr	[nvidia]
0000000000000000 t rm_gpu_ops_p2p_object_create	[nvidia]
0000000000000000 t rm_gpu_ops_p2p_object_destroy	[nvidia]
0000000000000000 t rm_gpu_ops_pma_alloc_pages	[nvidia]
0000000000000000 t rm_gpu_ops_pma_free_pages	[nvidia]
0000000000000000 t rm_gpu_ops_pma_pin_pages	[nvidia]
0000000000000000 t rm_gpu_ops_pma_register_callbacks	[nvidia]
0000000000000000 t rm_gpu_ops_pma_unpin_pages	[nvidia]
0000000000000000 t rm_gpu_ops_pma_unregister_callbacks	[nvidia]
0000000000000000 t rm_gpu_ops_query_caps	[nvidia]
0000000000000000 t rm_gpu_ops_query_ces_caps	[nvidia]
0000000000000000 t rm_gpu_ops_release_channel	[nvidia]
0000000000000000 t rm_gpu_ops_release_channel_resources	[nvidia]
0000000000000000 t rm_gpu_ops_report_non_replayable_fault	[nvidia]
0000000000000000 t rm_gpu_ops_retain_channel	[nvidia]
0000000000000000 t rm_gpu_ops_retain_channel_resources	[nvidia]
0000000000000000 t rm_gpu_ops_service_device_interrupts_rm	[nvidia]
0000000000000000 t rm_gpu_ops_set_page_directory	[nvidia]
0000000000000000 t rm_gpu_ops_stop_channel	[nvidia]
0000000000000000 t rm_gpu_ops_unset_page_directory	[nvidia]

to create a GPU session and allocate GPU memory in order to make a GPU swap truly possible.

@azureblue commented:
Hi guys, any update on this? Has anyone been able to reliably use VRAM as swap?

@bisqwit (Contributor) commented Dec 3, 2021

It only works if the following two conditions are met:

  1. The GPU driver code/data is never put in swap.
  2. The vramfs driver code/data is never put in swap.

If you can somehow guarantee these two aspects, then using VRAM as swap will work. (A quick check for the second condition is sketched below.)
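One non-authoritative way to sanity-check the second condition, assuming a single vramfs process and the standard /proc status fields:

    # VmLck should be close to VmSize if mlockall() succeeded,
    # and VmSwap should stay at 0 kB under memory pressure.
    grep -E 'VmSize|VmLck|VmSwap' "/proc/$(pidof vramfs)/status"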

@montvid commented Dec 3, 2021

Did not work for me the one time I tried it. Seems the project is abandoned...

@wonghang commented Dec 3, 2021

FUSE itself should be able to avoid being swapped, but when I attempted to add mlockall() to the vramfs code, it didn't work either. It appears that the GPU driver (nvidia) and the CUDA libraries were swapped out.

In the nvidia driver there are some undocumented functions (prefixed with rm_; run cat /proc/kallsyms | grep nvidia to see them) for accessing GPU memory. I think they are part of GPUDirect RDMA (https://docs.nvidia.com/cuda/gpudirect-rdma/index.html).
If we can somehow hook into them and write a kernel driver to handle the paging, it may be possible to use the GPU as swap.

@Atrate (Contributor) commented May 29, 2022

It is possible to achieve this, see https://wiki.archlinux.org/title/Swap_on_video_RAM , section FUSE.

> The vramfs driver code/data is never put in swap.

This can be achieved with https://wiki.archlinux.org/title/Swap_on_video_RAM#Complete_system_freeze_under_high_memory_pressure

I tested it under high memory pressure (stress -m 10 --vm-bytes 3G --vm-hang 10 on a 32 GB system) and it didn't fall over, but only after applying the aforementioned fix.
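For anyone repeating this test, one way to watch the VRAM-backed swap actually being exercised while stress runs (a sketch; the exact swap device name depends on your setup):

    # Confirm the swap device is active.
    swapon --show

    # Watch swap usage change under load.
    watch -n1 'grep -E "SwapTotal|SwapFree" /proc/meminfo'

    # In another terminal, apply memory pressure as above.
    stress -m 10 --vm-bytes 3G --vm-hang 10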

@bisqwit (Contributor) commented May 29, 2022

This looks like a proper solution indeed.

@Atrate (Contributor) commented Nov 27, 2022

I've tried implementing mlockall. If you want to, you can test whether it works for you and fixes deadlocks without needing to use a systemd service.

#32

@twobombs commented:
I would like to add to this discussion that the addition of vramfs as a block device would help with using vramfs as a dedicated L2ARC ZFS buffer.

We are using very big dedicated NVMe swap RAID arrays for quantum computing and need something faster than 8-16 NVMe sticks in RAID to collect the I/O in a buffer that is not in main memory.

We make use of a lot of (virtual) memory, so an L2ARC buffer in VRAM would be awesome; the GPUs would get a new lease on life, because we moved to CPU-only calculation due to the huge memory requirements of storing the eigenvector (think 8/16 TB).

@Atrate (Contributor) commented Jan 15, 2023

> I would like to add to this discussion that the addition of vramfs as a block device would help using vramfs as a dedicated L2ARC ZFS buffer. […]

@twobombs You can make a loop device with losetup, but NVMe RAID will probably be faster than VRAM swap; the performance is still somewhat lacking in certain areas.

@twobombs commented:
@Atrate thank you very much for the loop solution. I will look into this, and into whether ZFS will accept a loop device as cache. The swap I/O usage pattern is random read/write, not streaming; a PCIe VRAM device might offer better speeds while at the same time making the workload on the NVMe RAID devices more 'stream'-lined when changes are committed to the array.
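A minimal sketch of that idea, assuming vramfs is mounted at /mnt/vram and the pool is named tank (both names are illustrative, and whether a loop-backed L2ARC helps a given workload is a separate question):

    # Back a loop device with a file on vramfs.
    dd if=/dev/zero of=/mnt/vram/l2arc.img bs=1M count=3072
    LOOPDEV=$(sudo losetup -f --show /mnt/vram/l2arc.img)

    # Attach it to the pool as an L2ARC cache device.
    sudo zpool add tank cache "$LOOPDEV"

    # Verify; a cache device can later be detached with zpool remove.
    zpool status tank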

@twobombs commented Feb 3, 2023

I went a step further and added VRAM cache files for ZFS-based swap.
It is fairly hilarious to see the I/O come through in NVTOP.

[Screenshot: NVTOP showing GPU I/O, 2023-02-01]

@aedalzotto commented Aug 3, 2023

> It is possible to achieve this, see https://wiki.archlinux.org/title/Swap_on_video_RAM , section FUSE. […] I tested it under high memory pressure (stress -m 10 --vm-bytes 3G --vm-hang 10 on a 32 GB system) and it didn't fall over, but only after applying the aforementioned fix.

The solution seems to work for me, but when I increase swappiness from 10 to 180, the system simply freezes.
The same happens without increasing swappiness when running mprime.

I am running vramfs as a service, as the workaround cited above suggests. The only thing I think I am doing differently is using a loop device, since my swapfile was being created with holes.

Does anyone have an idea of what is happening?

UPDATE:
I checked the journal from the last boot, and it reported the following error:
kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] ERROR [CRTC:82:crtc-0] hw_done or flip_done timed out

@Atrate (Contributor) commented Aug 12, 2023

In reply to: #3 (comment)

As suggested by fanzhuyifan and others above, I think that may be due to other GPU-management processes/libraries getting swapped out. Maybe a fix is possible with a lot of systemd unit editing, but that would require tracking down every single library and process required for the operation of a dGPU, and that seems like a chore.

@fanzhuyifan commented Aug 13, 2023

According to the documentation of mlockall:

> mlockall() locks all pages mapped into the address space of the
> calling process. This includes the pages of the code, data, and
> stack segment, as well as shared libraries, user space kernel
> data, shared memory, and memory-mapped files. All mapped pages
> are guaranteed to be resident in RAM when the call returns
> successfully; the pages are guaranteed to stay in RAM until later
> unlocked.

So shared libraries directly used by vramfs being swapped out should not be the reason for the system freezes.

Edit: Examining the resident size and virtual memory size of the vramfs process, I think the issue is that vramfs asks for additional memory to serve reads/writes.

@Atrate (Contributor) commented Aug 13, 2023

> Edit: Examining the resident size and virtual memory size of the vramfs process, I think the issue is that vramfs asks for additional memory to serve reads/writes.

Is it? mlockall is called with the MCL_CURRENT | MCL_FUTURE flags, so it should also prevent all future allocations of memory from being swapped, unless I misunderstood the documentation.

Code in vramfs:

if (mlockall(MCL_CURRENT | MCL_FUTURE)) {

Documentation:

       MCL_CURRENT
              Lock all pages which are currently mapped into the address
              space of the process.

       MCL_FUTURE
              Lock all pages which will become mapped into the address
              space of the process in the future.  These could be, for
              instance, new pages required by a growing heap and stack
              as well as new memory-mapped files or shared memory
              regions.

@fanzhuyifan commented Aug 13, 2023

> Edit: Examining the resident size and virtual memory size of the vramfs process, I think the issue is that vramfs asks for additional memory to serve reads/writes.

> Is it? mlockall is called with the MCL_CURRENT | MCL_FUTURE flags, so it should also prevent all future allocations of memory from being swapped, unless I misunderstood the documentation.

Here are the steps to prove my point (on Linux):

  1. Start vramfs, say creating a filesystem of size 2000 MB, and find the PID of the process.
  2. Run cat /proc/PID/status | grep Vm to find the memory information. On a particular run on my computer I got:

VmPeak: 7060808 kB
VmSize: 7060808 kB
VmLck: 6990308 kB
VmPin: 0 kB
VmHWM: 275588 kB
VmRSS: 275588 kB
VmData: 144976 kB
VmStk: 164 kB
VmExe: 132 kB
VmLib: 14156 kB
VmPTE: 628 kB
VmSwap: 0 kB

  3. Write random data to a file on the vramfs, and check memory usage again.
     First run dd if=/dev/random of=/tmp/vram/swapfile bs=1M count=1000, and then I got:

VmPeak: 7585096 kB
VmSize: 7388488 kB
VmLck: 7317988 kB
VmPin: 0 kB
VmHWM: 286148 kB
VmRSS: 286148 kB
VmData: 156092 kB
VmStk: 164 kB
VmExe: 132 kB
VmLib: 14156 kB
VmPTE: 668 kB
VmSwap: 0 kB

Note that VmPeak, VmSize, VmLck, VmHWM, VmRSS, VmData, and VmPTE (bolded in the original) all increased.

  4. Let's read that file and check memory usage again.
     First run sha256sum /tmp/vram/swapfile, and then I got:

VmPeak: 7585096 kB
VmSize: 7462220 kB
VmLck: 7391720 kB
VmPin: 0 kB
VmHWM: 296072 kB
VmRSS: 296072 kB
VmData: 165960 kB
VmStk: 164 kB
VmExe: 132 kB
VmLib: 14156 kB
VmPTE: 692 kB
VmSwap: 0 kB

The same entries, except VmPeak, increased again.

I believe this proves that vramfs asks for more memory when serving read and write requests. I am not saying the extra memory is swapped; I am just saying that vramfs sometimes asks for extra memory to serve reads and writes.

I suspect that this is the reason the computer freezes when using vramfs as swap, even with the mlockall call: in a system under high memory pressure, the OS tries to swap some memory pages out to vramfs. To serve this request, vramfs needs to perform writes to the VRAM, and in the process asks for more memory. Since there is no available memory left, the system freezes.

@jnturton commented Oct 6, 2023

> Since there is already no available memory, the system freezes.

Wouldn't we see OOM Killer entries in the kernel logs in this case?
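For what it's worth, a quick way to check for OOM-killer activity after such a freeze (standard Linux tooling, nothing vramfs-specific):

    # Kernel messages from the OOM killer, if it ran before the freeze.
    sudo dmesg | grep -iE 'out of memory|oom'

    # Or, on a systemd machine, search the previous boot's journal.
    journalctl -k -b -1 | grep -i oom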
