
Bulk transfer size too large #9

Open
niloo9876 opened this issue Feb 17, 2022 · 2 comments


@niloo9876
Collaborator

Came across an issue where the bulk transfer size is larger than the 16 MB symbol size. So far this only happens when the DPU code is run on upmemcloud4.

To recreate:
Compile with 1 tasklet on the upmemcloud4 machine and run the decoder with the -d option on any image.
./host-1 -d ./data/fox0.jpg

Error received:

Got 2546 dpus across 40 ranks (63 dpus per rank)
Warning: fewer input files than DPUs (1 < 2546)
api:W:2022-02-17|18:03:39: dpu_push_xfer_symbol: invalid symbol access (offset:0 + length:231996736 > size:16777216)
src/jpeg-host.c:85(scale_rank): DPU Error (invalid memory symbol access)

Reasoning:
In the following loop, the last DPU has a garbage value stored in its in_length variable, so the final transfer length is set to this value, which causes the transfer to fail. (Copied from )

	DPU_FOREACH(dpu_rank, dpu, dpu_id)
	{
		DPU_ASSERT(dpu_prepare_xfer(dpu, (void *) input[dpu_id].in_buffer));

		if (input[dpu_id].in_length > longest_length)
			longest_length = input[dpu_id].in_length;
	}
	DPU_ASSERT(dpu_push_xfer(dpu_rank, DPU_XFER_TO_DPU, "file_buffer", 0, ALIGN(longest_length, 8), DPU_XFER_DEFAULT));

This is because this for loop goes through DPUs [0,63] inclusive, whereas in_length is only set for DPUs [0,62]. In other words, only 63 DPUs are allocated per rank, but the DPU_FOREACH loop iterates over all 64 DPUs in the rank. Here is the code where the length is set.

@niloo9876
Collaborator Author

This might be an SDK error, as I don't observe the same behavior on upmemcloud5 even though not all 64 DPUs/rank are allocated there either. The way I tested this is by printing dpu_id in the DPU_FOREACH(dpu_rank, dpu, dpu_id) loop. This value ranged over [0,63] on upmemcloud4 and [0,60] on upmemcloud5.

@niloo9876
Collaborator Author

clang version 12.0.0 (https://github.com/upmem/llvm-project.git d36425841d9a4d1420b7aa155675f6ae8bcf9f08)
Target: dpu-upmem-dpurte
Thread model: posix
