Unable to run an example file (within examples/python/pytorch) - cifar10_cnn.py #1193

Closed
Rashad-CSU opened this issue Oct 17, 2023 · 4 comments

Comments

@Rashad-CSU

Hi,

New to FlexFlow. Trying to understand how it works. I believe I have installed and built it based on the provided instructions. However, I am unable to run some example scripts. Thanks in advance for your help!

I am getting the following issue when I try to run "cifar10_cnn.py", within examples/python/pytorch.

Using flexflow backend.
Traceback (most recent call last):
File "/home/kabir/PycharmProjects/test_AI_worokload/FlexFlow/examples/python/pytorch/cifar10_cnn.py", line 66, in
top_level_task()
File "/home/kabir/PycharmProjects/test_AI_worokload/FlexFlow/examples/python/pytorch/cifar10_cnn.py", line 8, in top_level_task
ffconfig = FFConfig()
^^^^^^^^^^
File "/home/kabir/anaconda3/envs/flexflow/lib/python3.11/site-packages/flexflow/core/flexflow_cffi.py", line 735, in init
self.handle = ffc().flexflow_config_create()
^^^^^
File "/home/kabir/anaconda3/envs/flexflow/lib/python3.11/site-packages/flexflow/core/flexflow_cffi.py", line 43, in ffc
raise RuntimeError("Cannot use FlexFlow library before initializing FlexFlow")
RuntimeError: Cannot use FlexFlow library before initializing FlexFlow

Also, I am sharing the output of running config.linux in case it helps.
CUDA_PATH=/usr/local/cuda/lib64/stubs cmake -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF -DLegion_HIJACK_CUDART=OFF -DINFERENCE_TESTS=OFF -DLIBTORCH_PATH=/home/kabir/PycharmProjects/test_AI_worokload/libtorch -DCMAKE_BUILD_TYPE=Release -DFF_CUDA_ARCH=autodetect -DCUDA_PATH=/usr/local/cuda -DCUDNN_PATH=/usr/local/cuda -DFF_HIP_ARCH=all -DFF_USE_PYTHON=ON -DFF_USE_NCCL=ON -DNCCL_PATH=/usr/local/cuda -DFF_BUILD_ALL_EXAMPLES=OFF -DFF_BUILD_ALL_INFERENCE_EXAMPLES=ON -DFF_USE_PREBUILT_LEGION=OFF -DFF_USE_PREBUILT_NCCL=OFF -DFF_USE_ALL_PREBUILT_LIBRARIES=OFF -DFF_BUILD_UNIT_TESTS=OFF -DFF_USE_AVX2=OFF -DFF_MAX_DIM=5 -DLEGION_MAX_RETURN_SIZE=262144 -DROCM_PATH=/opt/rocm -DFF_GPU_BACKEND=cuda ./../
-- FF_GASNET_CONDUIT: mpi
-- Linux Version: 20.2
-- CPU architecture: x86_64
CMake Warning (dev) at cmake/cuda.cmake:6 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.

Call Stack (most recent call first):
CMakeLists.txt:225 (include)
This warning is for project developers. Use -Wno-dev to suppress it.

-- CUDA Detected CUDA_ARCH : 70
-- CUDA_VERSION: 11.0
-- CUDA root path : /usr/local/cuda
-- CUDA include path : /usr/local/cuda/include
-- CUDA runtime libraries :
-- CUDA driver libraries : /usr/local/cuda/lib64/stubs/libcuda.so
-- CUBLAS libraries : /usr/local/cuda/lib64/libcublas.so
-- CURAND libraries : /usr/local/cuda/lib64/libcurand.so
-- CUDA Arch : 70
-- CUDA_GENCODE: -gencode arch=compute_70,code=sm_70
-- CUDNN include : /usr/include
-- CUDNN libraries : /usr/lib/x86_64-linux-gnu/libcudnn.so
-- NCCL include : /usr/local/include
-- NCCL libraries : /usr/local/lib/libnccl.so
-- Building Legion from source
-- GASNET ROOT:
-- Version string from git: legion-23.06.0-4593-g626b55689
CMake Warning (dev) at deps/legion/CMakeLists.txt:278 (find_package):
Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
are removed. Run "cmake --help-policy CMP0148" for policy details. Use
the cmake_policy command to set the policy and suppress this warning.

This warning is for project developers. Use -Wno-dev to suppress it.

CMake Warning (dev) at deps/legion/CMakeLists.txt:279 (find_package):
Policy CMP0148 is not set: The FindPythonInterp and FindPythonLibs modules
are removed. Run "cmake --help-policy CMP0148" for policy details. Use
the cmake_policy command to set the policy and suppress this warning.

This warning is for project developers. Use -Wno-dev to suppress it.

CMake Deprecation Warning at config/_deps/json-src/CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.

-- Using the single-header code from /home/kabir/PycharmProjects/test_AI_worokload/FlexFlow/config/_deps/json-src/single_include/
-- FlexFlow MAX_DIM: 5
-- LEGION_MAX_RETURN_SIZE: 262144
-- system-nameLinux
CMake Deprecation Warning at deps/tokenizers-cpp/sentencepiece/CMakeLists.txt:15 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.

Update the VERSION argument value or use a ... suffix to tell
CMake that the project does not need compatibility with older versions.

-- VERSION: 0.2.00
-- Configuring done (1.1s)
-- Generating done (0.2s)
-- Build files have been written to: /home/kabir/PycharmProjects/test_AI_worokload/FlexFlow/config

@OneMoreProblem

OneMoreProblem commented Oct 17, 2023

Hi, Rashad.

I suspect that you launched it with the plain python interpreter. If you want to use plain python, you need to provide a config file that contains the test parameters. You can find an example of this in tests/multi_gpu_tests.sh:

$EXE "$FF_HOME"/examples/python/native/cifar10_cnn.py -config-file /tmp/flexflow/multi_gpu_tests/test_params_40_epochs_no_batch_size.json

You can find an example of how such a config file is generated in tests/python_interface_test.sh:

# Generate configs JSON files
test_params=$(jq -n \
    --arg num_gpus "$GPUS" \
    --arg memory_per_gpu "$FSIZE" \
    --arg zero_copy_memory_per_node "$ZSIZE" \
    --arg batch_size "$BATCHSIZE" \
    --arg only_data_parallel "$ONLY_DATA_PARALLEL" \
    '{"num_gpus":$num_gpus,"memory_per_gpu":$memory_per_gpu,"zero_copy_memory_per_node":$zero_copy_memory_per_node,"batch_size":$batch_size,"only_data_parallel":$only_data_parallel}')
mkdir -p /tmp/flexflow/multi_gpu_tests
echo "$test_params" > /tmp/flexflow/multi_gpu_tests/test_params.json

and then launch an example:

python "$FF_HOME"/examples/python/native/mnist_cnn.py -config-file /tmp/flexflow/multi_gpu_tests/*.json

I believe it should work)

Another way to launch it is to build FlexFlow, use flexflow_python, and pass the runtime parameters on the command line (-ll:gpu is the number of GPUs per node, -ll:fsize the GPU framebuffer memory in MB, and -ll:zsize the zero-copy memory in MB):

flexflow_python "$FF_HOME"/examples/python/native/cifar10_cnn.py -ll:gpu 1 -ll:fsize 30000 -ll:zsize 3000

It works for me.
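
The same flags should also work for the PyTorch example from your traceback (assuming the torch dependency for that example is installed), for instance:

flexflow_python "$FF_HOME"/examples/python/pytorch/cifar10_cnn.py -ll:gpu 1 -ll:fsize 30000 -ll:zsize 3000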

Hope it was helpful and clear)

Also, you can call flexflow.serve.init() inside your Python code:


import flexflow.serve as ff

# FlexFlow initialisation
ff.init(
    num_gpus=2,
    memory_per_gpu=30000,
    zero_copy_memory_per_node=30000,
    tensor_parallelism_degree=2,
    pipeline_parallelism_degree=1,
    num_cpus=10,
    profiling=True,
)
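
A single-GPU variant of the same call, as a minimal sketch (the memory sizes mirror the -ll:fsize 30000 / -ll:zsize 3000 flags above, and the remaining ff.init parameters are assumed to keep their defaults):

import flexflow.serve as ff

# Minimal single-GPU initialisation; memory sizes are in MB and mirror the
# -ll:fsize 30000 / -ll:zsize 3000 values used with flexflow_python above
ff.init(
    num_gpus=1,
    memory_per_gpu=30000,
    zero_copy_memory_per_node=3000,
)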

@Rashad-CSU
Author

Hi,

Thanks. The following command worked:

flexflow_python "$FF_HOME"/examples/python/native/cifar10_cnn.py -ll:gpu 1 -ll:fsize 30000 -ll:zsize 3000

However, I was hoping to build a tree structure for the model across multiple GPUs. Is it possible to do this on a machine with only one GPU?

Thanks,
Rashad

@OneMoreProblem

@Rashad-CSU You're literally reading my mind!

I asked a similar question in #1172.

@Rashad-CSU
Author

Thank you for directing me to issue #1172.
