support for arm64 #308

BlairSadewitz · 2024-03-11T22:00:22Z

BlairSadewitz
Mar 11, 2024

Hi,

I was just wondering if you could build wheels for aarch64 (to support the GH200). BTW, I'm impressed with all the new features that you've recently added.

Thanks,

Blair

AlpinDale · 2024-03-12T00:15:20Z

AlpinDale
Mar 12, 2024
Maintainer

Thanks for your interest! Unfortunately I don't have an aarch64 CPU to build on. Are GH200s not compatible with x86_64?

I wonder if setuptools allows emulating different CPU arches. We currently use GH actions to build the wheels.

0 replies

BlairSadewitz · 2024-03-12T19:48:03Z

BlairSadewitz
Mar 12, 2024
Author

Yeah, that's what I was wondering, too. I'm not totally clear on it, but I think what I'm using (it's on vast.ai, and if I rent it as interruptible, it costs pennies, and I've gotten hours out of it before someone interrupts me lol).

https://resources.nvidia.com/en-us-grace-cpu/grace-hopper-superchip

The thing is some sort of GPU+CPU on one die (or something like that). I love it. It absolutely TROUNCES the A100, and you get 96G of vram (it may use some sort of unified memory, I'm not sure). The nvlink equivalent for it is coherent with the system RAM. Look at that absolute unit. It's glorious.

0 replies

BlairSadewitz · 2024-03-12T19:54:48Z

BlairSadewitz
Mar 12, 2024
Author

But yeah, it's aarch64, definitely, which makes it kinda annoying for finding packages. I've been using abacusai/gh200-llm/llm-train-serve as the docker image, or one of nvidia's own. I've mostly used koboldcpp with it, as I don't have to deal with finding any python packages. I haven't gotten this working with it, because, well, dephell, and I'm not a python guy, even though I should be. llama.cpp/koboldcpp are straightforward to get to work (I build it with cmake/ninja and it flies through it).

0 replies

AlpinDale · 2024-03-12T20:23:43Z

AlpinDale
Mar 12, 2024
Maintainer

I didn't know they had GH200s on vast! I'll look into this and see if I can set up a build pipeline for arm CPUs. Will let you know. Thanks for reporting!

0 replies

AlpinDale · 2024-03-12T20:26:51Z

AlpinDale
Mar 12, 2024
Maintainer

On that note, @BlairSadewitz , when you have the chance, can you confirm if all the packages in requirements.txt are available for that CPU arch? It would be pointless if aphrodite is built for it, but not its dependencies.

0 replies

BlairSadewitz · 2024-03-13T00:26:58Z

BlairSadewitz
Mar 13, 2024
Author

Yeah. I got it to work again, but easily. I started with this docker image:

ghcr.io/abacusai/gh200-llm/llm-train-serve:latest

Then,

requirements.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index 5cf906f..5847777 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -3,7 +3,7 @@ psutil
 ray >= 2.9
 sentencepiece
 numpy
-torch == 2.2.0
+torch >= 2.2.0
 transformers >= 4.36.0 # for mixtral
 uvicorn
 openai # for fastapi's openai proxy emulation
@@ -24,4 +24,4 @@ rich

I then installed the requirements, and it recognized everything from the system. Then I did:

pip install openai 'fschat>=0.2.23' colorlog lark==1.1.8 gguf 'outlines>=0.0.27' loguru

then:
export TORCH_CUDA_ARCH_LIST="6.1 7.0 7.5 8.0 8.6 8.9 9.0+PTX" python setup.py bdist_wheel
https://filebin.net/gib79tl05k95mtnr/aphrodite_engine-0.5.0_cu123-cp310-cp310-linux_aarch64.whl

0 replies

AlpinDale · 2024-03-13T00:31:08Z

AlpinDale
Mar 13, 2024
Maintainer

Nice, thanks for confirming.

On a related note, I'm not seeing any GH200's on vast.ai - are they so rare, the GPU type isn't listed yet?

11 replies

AlpinDale Mar 13, 2024
Maintainer

Hmm, it seems like it's not recognizing the CUDA_HOME variable, says it's unset.

BlairSadewitz Mar 13, 2024
Author

Oh, yeah, I had a problem with that--I forgot what I did.

BlairSadewitz Mar 13, 2024
Author

Ooh, I just got on it as an interruptible instance, lemme see, lol.

BlairSadewitz Mar 13, 2024
Author

I spent HOURS tracking down a docker container that had an environment it liked, haha. That is what most of the labor was. There's no pytorch release or any nightlies on whl.pytorch.org or anywhere else that I could find that satisfied its demands/didn't drag me into dephell. And it wasn't just pytorch, either--triton takes FOREVER to build.

$CUDA_HOME should be set in the environment by default (?)

It just built successfully "out of the box" with only those two changes to the requirements.

After I actually try actually USING it a little more, I'll create a new instance and see if it builds the wheel with solely those changes.

UPDATE: err, nope, I broke something, hold on.

BlairSadewitz Mar 13, 2024
Author

Hahah, wait, WTF is with that export command? I clearly did not type that right! I hope u weren't just pasting that, u know, trusting me to have a frontal lobe and all. Obviously, there should be a carriage return in there haha. Actually, it's set in the environment of the container, so it's unnecessary.

BlairSadewitz · 2024-03-13T01:10:17Z

BlairSadewitz
Mar 13, 2024
Author

At least half the time when I do that, no one else rents it for at least like an hour or so, often way more.

0 replies

BlairSadewitz · 2024-03-13T03:33:54Z

BlairSadewitz
Mar 13, 2024
Author

I think that wheel I built somehow picked up the wrong pytorch dependency, but I'm not on it now to check. Hmm. I think it's about time to like maybe eat or something. It sure would be handy if conda or pip or SOMEWHERE had this stuff prebuilt. They don't even have the wheels for CUDA on pytorch.org (that I've found, anyway). Ugh.

0 replies

BlairSadewitz · 2024-04-13T17:05:52Z

BlairSadewitz
Apr 13, 2024
Author

Hi, so I found some docker image with CUDA 12.1 and pytorch 2.2.0 built for aarch64. Granted, I don't know if it actually WORKS (well, I mean, the machine runs it, and llama.cpp works on it, but I haven't tested pytorch yet). Saves me a lot of effort, because really to get this to work, I was gonna otherwise have to build pytorch myself. As exciting a rite of passage as it seems, I'd rather not pay for GPU time to do it, haha.

I've either tracked down or built the various other dependencies myself, and the build seems to be fine. However, I ABSOLUTELY CANNOT GET THE DAMN THING TO INSTALL hadamard.safetensors and the objects. I've tried ripping out the logic in setup.py and FORCING it to do it, and it just won't.
Here's what it installs:

install.txt

I am absolutely at a loss as to why the thing will just not install it. Do you have any idea WTF is going on here?

Also, I already have pytorch 2.2.0 in my conda environment, yet it always wants to reinstall it itself. I do not want it to do that, because that pytorch has no GPU support, lol. Do you have any idea why it insists on doing that? I mean, it's a dependency, yeah, OK, but it is clearly already present. The conda environment I'm using is a clone of the base environment, but it also happens if I don't use conda at all.

Thanks.

0 replies

BlairSadewitz · 2024-04-25T16:01:19Z

BlairSadewitz
Apr 25, 2024
Author

Hey, there are pytorch 2.3.0 + CUDA (11.8/12.1) images available on Docker Hub as of today for aarch64, so I'm stoked lol. Wasn't expecting that to happen until the next (minor version?) release. Oddly enough, there aren't any for amd64, but I presume they're building them, as the aarch64 images are only like 14 hours old. Unfortunately, nothing new seems to be available in the anaconda repos.

I really like using the GH200, because I can rent it pretty cheap as an interruptible instance--hard to go wrong with 96GB of VRAM on one device. I think there may be less demand for them because a lot of docker images simply don't support the architecture.

I'm going to see how this works out with it.

Also, notably 2.3.0 includes (not that I am capable of doing anything with it, but the first one, especially, seems like it might be useful):

[Beta] Support for User-defined Triton kernels in torch.compile
Allows for PyTorch code that contains triton kernels to be executed natively using torch.compile. This enables users to migrate code containing triton kernels from eager PyTorch to torch.compile without running into performance regressions or graph breaks. Native support also creates an opportunity for Torch Inductor to precompile the user-defined Triton kernel as well as better organize code around the Triton kernel allowing for further optimizations.

And:

[PROTOTYPE] Weight-Only-Quantization introduced into Inductor CPU backend
PyTorch 2.3 enhances LLM inference performance on torch inductor CPU backend. The project gpt-fast offers a simple and efficient PyTorch native acceleration for transformer text generation with torch.compile. Prior to 2.3 only CUDA devices were supported and this feature enables the CPU counterpart by providing highly optimized kernels for the int4 and int8 weight only quantization Linear.

0 replies

BlairSadewitz · 2024-04-28T18:41:39Z

BlairSadewitz
Apr 28, 2024
Author

I built pytorch 2.3.0 on the GH200 yesterday:

-rw-r--r-- 1 blair staff 188M Apr 27 23:09 torch-2.3.0-cp310-cp310-cu121-linux_aarch64.whl

There are plenty of operating systems that are way easier to build than that thing, lol. NetBSD takes one command. Happily, that machine is such a beastly unit that it absolutely blazed through it. I mean, to be fair, I wouldn't rate my competence as particularly high; on the other hand, I can't remember the last time I had any difficulty whatsoever following instructions to build something [that should build in the first place]. I ended up downloading the source for the conda package, ripping out the stuff that didn't need to run, setting the variables that it needed, etc. (took like 15 minutes, tops), then just turned it loose on the root of the filesystem. Worked right out the the gate, haha. I didn't do 2.2.x because all of the NVIDIA pytorch containers use snapshots, so I figured it wasn't quite ripe yet.

I then built aphrodite-engine with it, and it seemed to work fine. That was the last dependency I needed to build a wheel for (the others are xformers, flash_attn, and triton). Having 96GB of VRAM on one device is pretty sweet. Next, I'm gonna try to set up proper builds for these packages, as they just don't seem to be forthcoming. I assume that's probably because most people that actually use a GH200 have their own development environments that they build this stuff in for their own purposes as a matter of course, or use the containers.

Are you planning on switching to pytorch 2.3.0?

0 replies

BlairSadewitz · 2024-05-06T23:22:30Z

BlairSadewitz
May 6, 2024
Author

Have you ever taken a look at spack, the package manager? The thing will build pytorch to order with a simple command. It's been making my life a lot easier on aarch64.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support for arm64 #308

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 13 comments 11 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

Select a reply

support for arm64 #308

BlairSadewitz Mar 11, 2024

Replies: 13 comments · 11 replies

AlpinDale Mar 12, 2024 Maintainer

BlairSadewitz Mar 12, 2024 Author

BlairSadewitz Mar 12, 2024 Author

AlpinDale Mar 12, 2024 Maintainer

AlpinDale Mar 12, 2024 Maintainer

BlairSadewitz Mar 13, 2024 Author

AlpinDale Mar 13, 2024 Maintainer

AlpinDale Mar 13, 2024 Maintainer

BlairSadewitz Mar 13, 2024 Author

BlairSadewitz Mar 13, 2024 Author

BlairSadewitz Mar 13, 2024 Author

BlairSadewitz Mar 13, 2024 Author

BlairSadewitz Mar 13, 2024 Author

BlairSadewitz Mar 13, 2024 Author

BlairSadewitz Apr 13, 2024 Author

BlairSadewitz Apr 25, 2024 Author

BlairSadewitz Apr 28, 2024 Author

BlairSadewitz May 6, 2024 Author

BlairSadewitz
Mar 11, 2024

Replies: 13 comments 11 replies

AlpinDale
Mar 12, 2024
Maintainer

BlairSadewitz
Mar 12, 2024
Author

BlairSadewitz
Mar 12, 2024
Author

AlpinDale
Mar 12, 2024
Maintainer

AlpinDale
Mar 12, 2024
Maintainer

BlairSadewitz
Mar 13, 2024
Author

AlpinDale
Mar 13, 2024
Maintainer

AlpinDale Mar 13, 2024
Maintainer

BlairSadewitz Mar 13, 2024
Author

BlairSadewitz Mar 13, 2024
Author

BlairSadewitz Mar 13, 2024
Author

BlairSadewitz Mar 13, 2024
Author

BlairSadewitz
Mar 13, 2024
Author

BlairSadewitz
Mar 13, 2024
Author

BlairSadewitz
Apr 13, 2024
Author

BlairSadewitz
Apr 25, 2024
Author

BlairSadewitz
Apr 28, 2024
Author

BlairSadewitz
May 6, 2024
Author