FastConv runs on Android mobile phones with Snapdragon 835, Snapdragon 855, and Snapdragon 888, on an ARM server with Kunpeng 920, and on a MacBook with Apple M1. This description outlines how to obtain the required software packages and the source code, and how to compile the source code to reproduce the reported performance.
This program requires hardware that supports the ARMv8 AArch64 architecture.
- MacBook with Apple M1 processor
- ARM Server with Kunpeng 920
- x86 server connected via USB to mobile phones with Snapdragon 835, 855, or 888
- Operating system: 64-bit Linux, macOS and Android
- Compiler: Apple Clang++/GNU G++, and Android Clang++
- Other development software: When targeting Android, the Android NDK and ADB are required. The Android NDK is used to cross-compile the program on the server, and ADB is used to upload the binary executable to the mobile device and to debug the program remotely from the server.
Please refer to the Android NDK documentation (https://developer.android.com/ndk/guides) for installation.
Please refer to the Android ADB documentation (https://developer.android.com/studio/command-line/adb) for installation.
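The requirements above can be checked before building with a short pre-flight script. This is a minimal sketch, not part of the FastConv repository: the helper names `is_aarch64` and `have_tool` are hypothetical, and the list of tools checked (git, make, adb) is an assumption based on the commands used in this description.

```shell
#!/bin/sh
# Sketch: pre-flight checks before building FastConv.
# Helper names are illustrative and not part of the repository.

# Succeeds if the machine string denotes a 64-bit ARM host.
# `uname -m` reports "aarch64" on Linux and "arm64" on macOS.
is_aarch64() {
    case "$1" in
        aarch64|arm64) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the named tool is found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

machine=$(uname -m)
if is_aarch64 "$machine"; then
    echo "host is $machine: AArch64 host detected"
else
    echo "host is $machine: cross-compile with the Android NDK and deploy with adb"
fi

# Report which of the assumed tools are available; adb is only needed for Android.
for tool in git make adb; do
    if have_tool "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

The script only reports what it finds and never aborts, so it is safe to run on any of the listed hosts.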
git clone
cd FastConv/
make
-a selects the algorithm (auto, winograd, or im2col)
-tn selects whether to autotune (tuning or no_tuning)
-i sets the number of iterations
-t sets the number of threads
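To make the option format concrete, the following is a hypothetical sketch of how a wrapper script could parse and validate these four options. It is not the code of run_vgg16.sh or run_resnet50.sh: the function name `parse_args`, the variable names, and the default values are all assumptions made for illustration. Because `-tn` is a multi-character option, the sketch uses a plain `while`/`case` loop rather than `getopts`.

```shell
#!/bin/sh
# Hypothetical option parser; not the repository's actual script.
parse_args() {
    # Assumed defaults, chosen to mirror the examples in this description.
    algo=auto; tuning=no_tuning; iters=10; threads=8
    while [ $# -gt 0 ]; do
        # Every option takes exactly one value.
        if [ $# -lt 2 ]; then
            echo "missing value for $1" >&2
            return 1
        fi
        case "$1" in
            -a)  algo=$2 ;;
            -tn) tuning=$2 ;;
            -i)  iters=$2 ;;
            -t)  threads=$2 ;;
            *)   echo "unknown option: $1" >&2; return 1 ;;
        esac
        shift 2
    done
    # Reject values that do not appear in the documented examples.
    case "$algo" in
        auto|winograd|im2col) ;;
        *) echo "unknown algorithm: $algo" >&2; return 1 ;;
    esac
    case "$tuning" in
        tuning|no_tuning) ;;
        *) echo "unknown tuning mode: $tuning" >&2; return 1 ;;
    esac
}

parse_args -a winograd -tn tuning -i 10 -t 8 &&
    echo "algo=$algo tuning=$tuning iters=$iters threads=$threads"
```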
# Algorithm is automatically selected, no tuning, use default parameters, No. iterations is 10, No. threads is 8
./run_vgg16.sh -a auto -tn no_tuning -i 10 -t 8
# Algorithm is winograd, no tuning, use default parameters, No. iterations is 10, No. threads is 8
./run_vgg16.sh -a winograd -tn no_tuning -i 10 -t 8
# Algorithm is Im2col, no tuning, use default parameters, No. iterations is 10, No. threads is 8
./run_vgg16.sh -a im2col -tn no_tuning -i 10 -t 8
# Algorithm is automatically selected, tuning, No. iterations is 10, No. threads is 8
./run_vgg16.sh -a auto -tn tuning -i 10 -t 8
# Algorithm is winograd, tuning, No. iterations is 10, No. threads is 8
./run_vgg16.sh -a winograd -tn tuning -i 10 -t 8
# Algorithm is Im2col, tuning, No. iterations is 10, No. threads is 8
./run_vgg16.sh -a im2col -tn tuning -i 10 -t 8
# Algorithm is automatically selected, no tuning, use default parameters, No. iterations is 10, No. threads is 8
./run_resnet50.sh -a auto -tn no_tuning -i 10 -t 8
# Algorithm is winograd, no tuning, use default parameters, No. iterations is 10, No. threads is 8
./run_resnet50.sh -a winograd -tn no_tuning -i 10 -t 8
# Algorithm is Im2col, no tuning, use default parameters, No. iterations is 10, No. threads is 8
./run_resnet50.sh -a im2col -tn no_tuning -i 10 -t 8
# Algorithm is automatically selected, tuning, No. iterations is 10, No. threads is 8
./run_resnet50.sh -a auto -tn tuning -i 10 -t 8
# Algorithm is winograd, tuning, No. iterations is 10, No. threads is 8
./run_resnet50.sh -a winograd -tn tuning -i 10 -t 8
# Algorithm is Im2col, tuning, No. iterations is 10, No. threads is 8
./run_resnet50.sh -a im2col -tn tuning -i 10 -t 8
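The twelve invocations above follow a regular pattern: three algorithms times two tuning modes for each of the two networks. A minimal dry-run sketch that generates the same command lines is shown here; `list_runs` is a hypothetical helper, not part of FastConv, and it only prints the commands rather than executing them.

```shell
#!/bin/sh
# Dry-run sketch: print one command per algorithm/tuning combination.
# list_runs is a hypothetical helper, not part of FastConv.
list_runs() {
    # $1 = script name, $2 = number of iterations, $3 = number of threads
    script=$1; iters=$2; threads=$3
    for tuning in no_tuning tuning; do
        for algo in auto winograd im2col; do
            echo "./$script -a $algo -tn $tuning -i $iters -t $threads"
        done
    done
}

for net in run_vgg16.sh run_resnet50.sh; do
    list_runs "$net" 10 8
done
```

Piping the output into `sh` from the FastConv directory would execute the runs in sequence; changing the last two arguments varies the iteration and thread counts.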
First, upload the binary executable file and running scripts to the Android device.
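# (Device_id) and $(PathinAndroiddevice) are placeholders: substitute the device serial listed by `adb devices` and a target directory on the device
# Upload the binary executable file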
adb -s (Device_id) push ./winograd_dev $(PathinAndroiddevice)
# Upload the running scripts
adb -s (Device_id) push run_vgg16.sh $(PathinAndroiddevice)
adb -s (Device_id) push run_resnet50.sh $(PathinAndroiddevice)
# Connect to the Android device
adb -s (Device_id) shell
Then, run the software on the device in the same way as on Linux.
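As an alternative to an interactive shell, `adb shell` also accepts a command string, so a run can be launched from the server in one step. The sketch below only composes and prints such a command. `remote_run`, `DEVICE_ID`, and `DEVICE_DIR` are hypothetical names, and `/data/local/tmp` is assumed as the target directory because it is commonly writable on Android devices; substitute the path used when pushing the files.

```shell
#!/bin/sh
# Dry-run sketch: compose an adb invocation that runs a script on the device.
# DEVICE_ID and DEVICE_DIR are placeholders to be replaced with real values.
DEVICE_ID=${DEVICE_ID:-YOUR_DEVICE_ID}
DEVICE_DIR=${DEVICE_DIR:-/data/local/tmp}

remote_run() {
    # $1 = script name; the remaining arguments are passed to the script.
    script=$1
    shift
    echo "adb -s $DEVICE_ID shell \"cd $DEVICE_DIR && ./$script $*\""
}

remote_run run_vgg16.sh -a auto -tn no_tuning -i 10 -t 8
```

Printing the command first makes it easy to confirm the device serial and directory before anything is executed on the phone.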