failed to decode the batch #5

Open
hoverflow opened this issue Oct 11, 2023 · 1 comment
@hoverflow

Hi, when I use server-parallel I get this error: updateSlots : failed to decode the batch, n_batch = 1, ret = 1

This is the complete log before the error:
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 107.54 MB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 8694.21 MB
...................................................................................................
llama_new_context_with_model: n_ctx = 1024
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 800.00 MB
llama_new_context_with_model: kv self size = 800.00 MB
llama_new_context_with_model: compute buffer total size = 118.13 MB
llama_new_context_with_model: VRAM scratch buffer: 112.00 MB
llama_new_context_with_model: total VRAM used: 9606.21 MB (model: 8694.21 MB, context: 912.00 MB)
Available slots:

  • slot 0
  • slot 1

llama server listening at http://0.0.0.0:8080

system prompt updated
slot 0 is processing
slot 0 released
slot 0 is processing
slot 0 released
slot 0 is processing
slot 0 released
slot 0 is processing
slot 0 released
slot 0 is processing
slot 1 is processing
updateSlots : failed to decode the batch, n_batch = 1, ret = 1

I run server-parallel with the following command:
./server-parallel -m models/xyz.gguf --ctx_size 2048 -t 4 -ngl 40 --host 0.0.0.0 --batch-size 512 --parallel 2

Of course, this only happens when both slots are performing inference at the same time. Could you please help me resolve this issue?

@FSSRepo
Owner

FSSRepo commented Oct 11, 2023

That can only happen if the input prompt is too long. Could you provide a video demonstrating the behavior? Please remember that the 2048-token context is shared between the two sequences.
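
For reference, here is a minimal sketch of how the mainline llama.cpp server divides a shared context among parallel slots; this fork's server-parallel may differ in the details, so treat it as an assumption rather than this repository's exact code. With --ctx_size 2048 and --parallel 2, each slot only gets about 1024 tokens, and a prompt (plus generated tokens) longer than that per-slot budget cannot be decoded.

// Sketch, not this repository's exact code: splitting a shared context
// of n_ctx tokens evenly among n_parallel slots.
#include <cstdio>

int main() {
    const int n_ctx      = 2048; // --ctx_size
    const int n_parallel = 2;    // --parallel

    const int n_ctx_slot = n_ctx / n_parallel; // per-slot context budget

    for (int i = 0; i < n_parallel; ++i) {
        std::printf("slot %d: %d tokens of context\n", i, n_ctx_slot);
    }
    return 0;
}

In practice this means either shortening the prompts sent to each slot or raising --ctx_size so that the per-slot share covers the longest expected prompt.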

Edit:

Use the fixes branch to apply the latest changes; master is outdated.
