Came across an issue where the transfer size is larger than 16 MB (the size of the target symbol). So far this only happens when the DPU code is run on upmemcloud4.
To recreate:
Compile with 1 tasklet on the upmemcloud4 machine and run the decoder with the -d option on any image:
./host-1 -d ./data/fox0.jpg
Error received:
Got 2546 dpus across 40 ranks (63 dpus per rank)
Warning: fewer input files than DPUs (1 < 2546)
api:W:2022-02-17|18:03:39: dpu_push_xfer_symbol: invalid symbol access (offset:0 + length:231996736 > size:16777216)
src/jpeg-host.c:85(scale_rank): DPU Error (invalid memory symbol access)
Reasoning:
In the following loop, the last DPU has a garbage value stored in its in_length variable, so the final length is set to this value, which causes the transfer to fail. (Copied from )
This is because the loop iterates over DPUs [0,63] inclusive, whereas in_length is only set for DPUs [0,62]. In other words, only 63 DPUs are allocated per rank, but the DPU_FOREACH loop goes through all 64 DPU slots in the rank. Here is the code where the length is set.
This might be an SDK bug: I don't observe the same behavior on upmemcloud5, even though not all 64 DPUs per rank are allocated there either. To test this, I printed dpu_id inside the DPU_FOREACH(dpu_rank, dpu, dpu_id) loop. The value ranged over [0,63] on upmemcloud4 and [0,60] on upmemcloud5.