[Issue]: Getting errors with TensorFlow sample #78

bsctl · 2024-11-04T16:27:24Z

Problem Description

Hello team, I'm not able to run this example workload on Kubernetes.

Kubernetes version: v1.30.2 installed with kubeadm

Container Runtime: containerd

sudo containerd version
revision=83031836b2cf55637d7abf847b17134c51b38e53 version=v1.7.16

AMD GPU Plugin installed with Helm

helm list -A
NAME                    NAMESPACE               REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
amd-gpu-plugin          kube-system             3               2024-10-28 14:22:16.138491735 +0000 UTC deployed        amd-gpu-0.14.0                  1.31.0.0   
calico                  kube-system             1               2024-10-24 13:06:16.670494216 +0000 UTC deployed        calico-cni-3.28.1               v3.28.1

I used these values for the Helm chart:

  values:
    nfd:
      enabled: true
    labeller:
      enabled: true
    node_selector_enabled: true
    node_selector:
      feature.node.kubernetes.io/pci-0300_1002.present: "true"
      kubernetes.io/arch: amd64

Apply this manifest:

apiVersion: v1
kind: Pod
metadata:
  name: alexnet-tf-gpu-pod
  labels:
    purpose: demo-tf-amdgpu
spec:
  containers:
    - name: alexnet-tf-gpu-container
      image: rocm/tensorflow:latest
      workingDir: /root
      env:
      - name: HIP_VISIBLE_DEVICES
        value: "0" # 0,1,2,...,n to run on GPU and select the GPUs; -1 to run on CPU
      command: ["/bin/bash", "-c", "--"]
      args: ["python3 /benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=alexnet; trap : TERM INT; sleep infinity & wait"]
      resources:
        limits:
          amd.com/gpu: 1 # requesting a GPU

The pod starts, but the logs show errors:

kubectl logs alexnet-tf-gpu-pod
2024-11-04 15:49:31.826063: E external/local_xla/xla/stream_executor/plugin_registry.cc:91] Invalid plugin kind specified: FFT
2024-11-04 15:49:31.864338: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-04 15:49:32.195859: E external/local_xla/xla/stream_executor/plugin_registry.cc:91] Invalid plugin kind specified: DNN
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.20) or chardet (3.0.4) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
/usr/local/lib/python3.9/dist-packages/scipy/__init__.py:146: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Traceback (most recent call last):
  File "/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py", line 25, in <module>
    import benchmark_cnn
  File "/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py", line 44, in <module>
    from models import model_config
  File "/benchmarks/scripts/tf_cnn_benchmarks/models/model_config.py", line 31, in <module>
    from models.experimental import deepspeech
  File "/benchmarks/scripts/tf_cnn_benchmarks/models/experimental/deepspeech.py", line 121, in <module>
    class DeepSpeech2Model(model_lib.Model):
  File "/benchmarks/scripts/tf_cnn_benchmarks/models/experimental/deepspeech.py", line 126, in DeepSpeech2Model
    'lstm': tf.nn.rnn_cell.BasicLSTMCell,
  File "/usr/local/lib/python3.9/dist-packages/tensorflow/python/util/lazy_loader.py", line 207, in __getattr__
    raise AttributeError(
AttributeError: `BasicLSTMCell` is not available with Keras 3.
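
In the log above, the script keeps running past the two `Invalid plugin kind specified` lines and the Python warnings; it stops at the `AttributeError` that ends the traceback. As a minimal sketch for pulling those two pieces out of a saved log, assuming the log text is available as a string (`TF_LOG` and `summarize` are hypothetical names used only for illustration, not part of any tool):

```python
import re

# Matches TensorFlow-style log lines:
#   "<date> <time>: <LEVEL> <source file:line>] <message>"
TF_LOG = re.compile(
    r"^\d{4}-\d{2}-\d{2} [\d:.]+: (?P<level>[IWEF]) (?P<src>\S+)\] (?P<msg>.*)$"
)


def summarize(log_text):
    """Return (tf_error_messages, python_exception_line) found in log_text."""
    tf_errors = []
    exception = None
    in_traceback = False
    for line in log_text.splitlines():
        match = TF_LOG.match(line)
        if match and match.group("level") in ("E", "F"):
            tf_errors.append(match.group("msg"))
        elif line.startswith("Traceback (most recent call last):"):
            in_traceback = True
        elif in_traceback and line and not line[0].isspace():
            # The first unindented line after the frames is the exception itself.
            exception = line
            in_traceback = False
    return tf_errors, exception


# A shortened excerpt of the pod log quoted above.
sample = """\
2024-11-04 15:49:31.826063: E external/local_xla/xla/stream_executor/plugin_registry.cc:91] Invalid plugin kind specified: FFT
2024-11-04 15:49:31.864338: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
2024-11-04 15:49:32.195859: E external/local_xla/xla/stream_executor/plugin_registry.cc:91] Invalid plugin kind specified: DNN
Traceback (most recent call last):
  File "/benchmarks/scripts/tf_cnn_benchmarks/models/experimental/deepspeech.py", line 126, in DeepSpeech2Model
    'lstm': tf.nn.rnn_cell.BasicLSTMCell,
AttributeError: `BasicLSTMCell` is not available with Keras 3.
"""

errors, exc = summarize(sample)
print(errors)  # ['Invalid plugin kind specified: FFT', 'Invalid plugin kind specified: DNN']
print(exc)     # AttributeError: `BasicLSTMCell` is not available with Keras 3.
```

The same function could be fed the full output of `kubectl logs alexnet-tf-gpu-pod` after saving it to a file and reading it into a string.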

Operating System

Ubuntu 22.04.5 LTS

CPU

Intel(R) Xeon(R) Platinum 8462Y+

GPU

AMD Instinct MI300X (gfx942)

ROCm Version

ROCm 6.2.0

ls -l /opt/rocm
rocm/       rocm-6.2.0/

ROCm Component

ROCm

Steps to Reproduce

See Problem Description

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

From the running pod:

kubectl exec -it alexnet-tf-gpu-pod -- bash

tf-docker ~ > rocminfo --support
ROCk module version 6.8.5 is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.14
Runtime Ext Version:     1.6
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Xeon(R) Platinum 8462Y+   
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Xeon(R) Platinum 8462Y+   
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      49152(0xc000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4100                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            64                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    1056378596(0x3ef70ee4) KB          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    1056378596(0x3ef70ee4) KB          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    1056378596(0x3ef70ee4) KB          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    Intel(R) Xeon(R) Platinum 8462Y+   
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Xeon(R) Platinum 8462Y+   
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      49152(0xc000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4100                               
  BDFID:                   0                                  
  Internal Node ID:        1                                  
  Compute Unit:            64                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Memory Properties:       
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    1056890548(0x3efedeb4) KB          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    1056890548(0x3efedeb4) KB          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    1056890548(0x3efedeb4) KB          
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 3                  
*******                  
  Name:                    gfx942                             
  Uuid:                    GPU-e438db49e30f0da9               
  Marketing Name:          AMD Instinct MI300X                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    2                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      4096(0x1000) KB                    
    L3:                      262144(0x40000) KB                 
  Chip ID:                 29857(0x74a1)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2100                               
  BDFID:                   40448                              
  Internal Node ID:        2                                  
  Compute Unit:            304                                
  SIMDs per CU:            4                                  
  Shader Engines:          32                                 
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Memory Properties:       
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    2048(0x800)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 150                                
  SDMA engine uCode::      19                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    201310208(0xbffc000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    201310208(0xbffc000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    201310208(0xbffc000) KB            
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 4                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx942:sramecc+:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             
tf-docker ~ >

Additional Information

On the host machine:

sudo lshw -c cpu | grep product
       product: Intel(R) Xeon(R) Platinum 8462Y+
       product: Intel(R) Xeon(R) Platinum 8462Y+
       product: 401xx Series QAT
       product: Intel Corporation
       product: 401xx Series QAT
       product: Intel Corporation

lspci -nn | grep "VGA\|Display"
04:00.0 VGA compatible controller [0300]: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller [102b:0536] (rev 04)

sudo dmidecode -t BIOS | grep Version
        Version: 2.0.4

lsb_release -sd
Ubuntu 22.04.5 LTS

uname -a
Linux ENC1-CLS01-SVR05 6.8.0-40-generic #40~22.04.3-Ubuntu SMP PREEMPT_DYNAMIC Tue Jul 30 17:30:19 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

dkms status | grep "amdgpu\|radeon"
amdgpu/6.8.5-2009582.22.04, 6.8.0-40-generic, x86_64: installed

lsmod | grep amdgpu
amdgpu              19615744  0
amddrm_ttm_helper      12288  1 amdgpu
amdttm                110592  2 amdgpu,amddrm_ttm_helper
amddrm_buddy           20480  1 amdgpu
amdxcp                 12288  1 amdgpu
amd_sched              61440  1 amdgpu
amdkcl                 36864  3 amd_sched,amdttm,amdgpu
drm_exec               12288  1 amdgpu
drm_suballoc_helper    20480  1 amdgpu
drm_display_helper    237568  1 amdgpu
video                  73728  2 dell_wmi,amdgpu
i2c_algo_bit           16384  2 mgag200,amdgpu

dkms status
amdgpu/6.8.5-2009582.22.04, 6.8.0-40-generic, x86_64: installed


y2kenny · 2024-11-04T20:10:17Z

The example workload is pretty old. Updating it has been on my todo list.

bsctl · 2024-11-04T21:38:06Z

@y2kenny please let me know if we’re missing something in the configuration.

AlexHe99 · 2024-11-28T02:28:34Z

Both the CPU and GPU examples also failed for me. The examples need to be updated.
