Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arm port #1009

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from
Draft

Arm port #1009

wants to merge 5 commits into from

Conversation

XapaJIaMnu
Copy link
Contributor

@XapaJIaMnu XapaJIaMnu commented Sep 15, 2023

Description

RFC on the initial ARM port work
List of changes:

  • Added ARM cmake definition
  • Added simd_utils to translate x86_64 simd intrinsics to ARM
  • Added RUY for gemm implementation

Added dependencies: Ruy, simd_utils

How to test

Download a model and run it:

nik@nikserver2:~/enes$ ~/marian-dev/build/marian-decoder -m model1.npz --mini-batch 1 --maxi-batch 1 -v vocab.esen.spm vocab.esen.spm --quiet --quiet-translation
Hi there, I am running on an ARM PC.
Hola, estoy corriendo en un PC ARM.

TODO

  • Use RUY to make int8 inference on ARM
  • Check how to make it work on ARMv7 (Currently hardcoded to v8, might need to use a different simd-arm library)
  • Maybe get rid of OpenBlas? Ruy doesn't have full sgemm support but we fake it. We need Ruy for doing 8bits and FAISS requires a lot of BLAS routines (hence openblas)
  • More proper guards?
  • Currently we are not vectorising as much as we can because we are only using ARM SIMD for the SSE codepath, but we could technically do AVX. Also proper ARM intrinsics might be faster.

Checklist

  • I have tested the code manually
  • I have run regression tests
  • I have read and followed CONTRIBUTING.md
  • I have updated CHANGELOG.md

@hieuhoang
Copy link
Collaborator

Looks good. Checked that it still works with some of my existing models on (Intel) CPU and GPUs.

#endif

#if USE_RUY_SGEMM
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not particularly happy about this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rome wasn't built in a day

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an M1 to test on ? We should test RUY vs OpenBLas vs AppleAccelerate to determine what sgemm implementation to use here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the case with M1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ruy path was intended for Android. The usual BLAS was painful to compile. Since not available via package manager or anything we'd have to go compile-route with a submodule addition if going for OpenBLAS.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you guys ever get the int8 stuff to work? Did you use RUY?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't ported it from our fork yet. Int8 with ruy works in our port.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah right. Where's your code? We'll take a look. May give you a hand with it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -0,0 +1,142 @@
# Modified from https://github.com/axr/solar-cmake/blob/73cfea0db0284c5e2010aca23989046e5bda95c9/Solar.cmake
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO. This file needs to be removed and detecting ARM needs to be done in a different way. The variables that this file sets break the double M1/x86_64 compilation that is supposed to be done on modern OSX.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the offending variables in this file?

@vrnmthr vrnmthr mentioned this pull request Dec 1, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants