Unable to run an example file (within examples/python/pytorch) - cifar10_cnn.py #1193

Closed
Rashad-CSU opened this issue Oct 17, 2023 · 4 comments

Comments

@Rashad-CSU

Hi,

New to FlexFlow. Trying to understand how it works. I believe I have installed and built it based on the provided instructions. However, I am unable to run some example scripts. Thanks in advance for your help!

I am getting the following issue when I try to run "cifar10_cnn.py", within examples/python/pytorch.

Using flexflow backend.
Traceback (most recent call last):
File "/home/kabir/PycharmProjects/test_AI_worokload/FlexFlow/examples/python/pytorch/cifar10_cnn.py", line 66, in
top_level_task()
File "/home/kabir/PycharmProjects/test_AI_worokload/FlexFlow/examples/python/pytorch/cifar10_cnn.py", line 8, in top_level_task
ffconfig = FFConfig()
^^^^^^^^^^
File "/home/kabir/anaconda3/envs/flexflow/lib/python3.11/site-packages/flexflow/core/flexflow_cffi.py", line 735, in init
self.handle = ffc().flexflow_config_create()
^^^^^
File "/home/kabir/anaconda3/envs/flexflow/lib/python3.11/site-packages/flexflow/core/flexflow_cffi.py", line 43, in ffc
raise RuntimeError("Cannot use FlexFlow library before initializing FlexFlow")
RuntimeError: Cannot use FlexFlow library before initializing FlexFlow

Also, I am sharing the output of running config.linux in case it helps.
CUDA_PATH=/usr/local/cuda/lib64/stubs cmake -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF -DLegion_HIJACK_CUDART=OFF -DINFERENCE_TESTS=OFF -DLIBTORCH_PATH=/home/kabir/PycharmProjects/test_AI_worokload/libtorch -DCMAKE_BUILD_TYPE=Release -DFF_CUDA_ARCH=autodetect -DCUDA_PATH=/usr/local/cuda -DCUDNN_PATH=/usr/local/cuda -DFF_HIP_ARCH=all -DFF_USE_PYTHON=ON -DFF_USE_NCCL=ON -DNCCL_PATH=/usr/local/cuda -DFF_BUILD_ALL_EXAMPLES=OFF -DFF_BUILD_ALL_INFERENCE_EXAMPLES=ON -DFF_USE_PREBUILT_LEGION=OFF -DFF_USE_PREBUILT_NCCL=OFF -DFF_USE_ALL_PREBUILT_LIBRARIES=OFF -DFF_BUILD_UNIT_TESTS=OFF -DFF_USE_AVX2=OFF -DFF_MAX_DIM=5 -DLEGION_MAX_RETURN_SIZE=262144 -DROCM_PATH=/opt/rocm -DFF_GPU_BACKEND=cuda ./../
-- FF_GASNET_CONDUIT: mpi
-- Linux Version: 20.2
-- CPU architecture: x86_64
CMake Warning (dev) at cmake/cuda.cmake:6 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.

Call Stack (most recent call first):
CMakeLists.txt:225 (include)
This warning is for project developers. Use -Wno-dev to suppress it.

-- CUDA Detected CUDA_ARCH : 70
-- CUDA_VERSION: 11.0
-- CUDA root path : /usr/local/cuda
-- CUDA include path : /usr/local/cuda/include
-- CUDA runtime libraries :
-- CUDA driver libraries : /usr/local/cuda/lib64/stubs/libcuda.so
-- CUBLAS libraries : /usr/local/cuda/lib64/libcublas.so
-- CURAND libraries : /usr/local/cuda/lib64/libcurand.so
-- CUDA Arch : 70
-- CUDA_GENCODE: -gencode arch=compute_70,code=sm_70
-- CUDNN include : /usr/include
-- CUDNN libraries : /usr/lib/x86_64-linux-gnu/libcudnn.so
-- NCCL include : /usr/local/include
-- NCCL libraries : /usr/local/lib/libnccl.so
-- Building Legion from source
-- GASNET ROOT:
-- Version string from git: legion-23.06.0-4593-g626b55689
CMake Warning (dev) at deps/legion/CMakeLists.txt:278 (find_package):
Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
are removed. Run "cmake --help-policy CMP0148" for policy details. Use
the cmake_policy command to set the policy and suppress this warning.

This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning (dev) at deps/legion/CMakeLists.txt:279 (find_package):
Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
are removed. Run "cmake --help-policy CMP0148" for policy details. Use
the cmake_policy command to set the policy and suppress this warning.

This warning is for project developers. Use -Wno-dev to suppress it.

CMake Deprecation Warning at config/_deps/json-src/CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.

-- Using the single-header code from /home/kabir/PycharmProjects/test_AI_worokload/FlexFlow/config/_deps/json-src/single_include/
-- FlexFlow MAX_DIM: 5
-- LEGION_MAX_RETURN_SIZE: 262144
-- system-nameLinux
CMake Deprecation Warning at deps/tokenizers-cpp/sentencepiece/CMakeLists.txt:15 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.

-- VERSION: 0.2.00
-- Configuring done (1.1s)
-- Generating done (0.2s)
-- Build files have been written to: /home/kabir/PycharmProjects/test_AI_worokload/FlexFlow/config

@OneMoreProblem

OneMoreProblem commented Oct 17, 2023

Hi, Rashad.

I suspect that you launched it with the plain python interpreter. If you want to use plain python, you need to provide a config file that contains the test parameters. You can find an example of this in tests/multi_gpu_tests.sh:

$EXE "$FF_HOME"/examples/python/native/cifar10_cnn.py -config-file /tmp/flexflow/multi_gpu_tests/test_params_40_epochs_no_batch_size.json

You can find an example of how such a config file is generated in tests/python_interface_test.sh:

# Generate configs JSON files
test_params=$(jq -n \
    --arg num_gpus "$GPUS" \
    --arg memory_per_gpu "$FSIZE" \
    --arg zero_copy_memory_per_node "$ZSIZE" \
    --arg batch_size "$BATCHSIZE" \
    --arg only_data_parallel "$ONLY_DATA_PARALLEL" \
    '{"num_gpus":$num_gpus,"memory_per_gpu":$memory_per_gpu,"zero_copy_memory_per_node":$zero_copy_memory_per_node,"batch_size":$batch_size,"only_data_parallel":$only_data_parallel}')
mkdir -p /tmp/flexflow/multi_gpu_tests
echo "$test_params" > /tmp/flexflow/multi_gpu_tests/test_params.json

and then launch an example:

python "$FF_HOME"/examples/python/native/mnist_cnn.py -config-file /tmp/flexflow/multi_gpu_tests/*.json

I believe it should work)

Another way to launch it is to build FlexFlow, use flexflow_python, and pass the runtime parameters on the command line (-ll:gpu is the number of GPUs per node, -ll:fsize the GPU framebuffer memory in MB, and -ll:zsize the zero-copy memory in MB):

flexflow_python "$FF_HOME"/examples/python/native/cifar10_cnn.py -ll:gpu 1 -ll:fsize 30000 -ll:zsize 3000

It works for me.
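
The same flags should also work for the PyTorch example from your traceback (assuming the torch dependency for that example is installed), for instance:

flexflow_python "$FF_HOME"/examples/python/pytorch/cifar10_cnn.py -ll:gpu 1 -ll:fsize 30000 -ll:zsize 3000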

Hope it was helpful and clear)

Also, you can call flexflow.serve.init() inside your Python code:


import flexflow.serve as ff

# FlexFlow initialisation
ff.init(
    num_gpus=2,
    memory_per_gpu=30000,
    zero_copy_memory_per_node=30000,
    tensor_parallelism_degree=2,
    pipeline_parallelism_degree=1,
    num_cpus=10,
    profiling=True,
)
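
A single-GPU variant of the same call, as a minimal sketch (the memory sizes mirror the -ll:fsize 30000 / -ll:zsize 3000 flags above, and the remaining ff.init parameters are assumed to keep their defaults):

import flexflow.serve as ff

# Minimal single-GPU initialisation; memory sizes are in MB and mirror the
# -ll:fsize 30000 / -ll:zsize 3000 values used with flexflow_python above
ff.init(
    num_gpus=1,
    memory_per_gpu=30000,
    zero_copy_memory_per_node=3000,
)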

@Rashad-CSU
Author

Hi,

Thanks. The following command worked:

flexflow_python "$FF_HOME"/examples/python/native/cifar10_cnn.py -ll:gpu 1 -ll:fsize 30000 -ll:zsize 3000

However, I was hoping to build a tree structure for the model across multiple GPUs. Is it possible to do this on a machine with only one GPU?

Thanks,
Rashad

@OneMoreProblem

@Rashad-CSU You're literally reading my mind!

I asked a similar question in #1172.

@Rashad-CSU
Author

Thank you for directing me to issue #1172.
