Check CUDA out of memory #213
Conversation
Awesome! Great that you got into tiktorch so quickly!
I left some minor comments and a question about how we want to see the interface for this functionality (basically ints vs namedints)...
hey @k-dominik :) I have attempted to refactor the design of the ModelInfo. With this interface clients can also transform the data from the grpc channel back into a feature-rich object. Do you think that the design is on the right track?
tiktorch/server/session/process.py
Outdated
self._min_shape = min_shape
self._steps = steps
assert self._min_shape.is_same_axes(self._steps)
assert all(step == 0 for axis, step in steps if axis not in AxisWithValue.SPATIAL_AXES)  # todo: ?
I am not sure if this is valid. Does it make sense to allow non-zero step values for non-spatial axes?
For the refactor of ModelInfo: in general I like the idea of having something more usable directly :), good idea!
With the implementation details for the "rich", nicer-to-use objects I left comments that mostly leaned towards subclassing dict, which are probably outdated as I'm thinking... actually, what do you think about investigating whether bioimageio.spec classes can be reused for this purpose? I think, at least for me, any removed layer of transformation would reduce cognitive load. On the other hand this would bind us to bioimageio.spec, at least for the metadata. But since that's the format we're supporting (and the only one), I don't see much of a problem.
Of course then I immediately ask myself why even do the dance and not parse the model on the client as well, and be done with it (and we already do parse the spec on the client (ilastik) side)...
Also the ModelInfo (and related convenience classes) would need a very heavy refactor when updating to 0.5 of the spec.
idk, what do you think @thodkatz?
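For illustration only, reusing the spec's raw nodes could look roughly like this; this assumes the bioimageio.spec 0.4-series API, so the loader name and the fields accessed here may differ from what ends up being used:

from bioimageio.spec import load_raw_resource_description  # assumed 0.4-series entry point

# load the model RDF into the same raw nodes (e.g. nodes.InputTensor) that the
# validator code in this PR already works with, skipping a custom ModelInfo layer
raw_model = load_raw_resource_description("path/to/rdf.yaml")
for input_tensor in raw_model.inputs:
    print(input_tensor.name, input_tensor.axes, input_tensor.shape)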
Since we removed the ModelInfo interface, the protobuf messages for InputShape and OutputShape, and their conversions, are redundant.
Two procedures have been added:
- Get the maximum tensor shape
- Check if a tensor's shape fits into memory
The current interface supports multiple device ids. To check whether a CUDA memory request is valid, meaning that a GPU is actually detected, a device id is needed to run the check against the available devices, if any.
Since the PR removing the ModelInfo was merged, I have rebased this PR.
def get_axes_with_size(axes: Tuple[str, ...], shape: Tuple[int, ...]) -> NamedShape:
    if len(axes) != len(shape):
        raise ValueError(f"{axes} and {shape} incompatible length. It should be equal")
    InputTensorValidator.is_shape(shape)
oops, this check should actually do something
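A minimal sketch of what "doing something" could look like here, assuming is_shape returns a bool and that the function builds the NamedShape by zipping axes and sizes (both assumptions on my side):

def get_axes_with_size(axes: Tuple[str, ...], shape: Tuple[int, ...]) -> NamedShape:
    if len(axes) != len(shape):
        raise ValueError(f"{axes} and {shape} incompatible length. It should be equal")
    # act on the validation result instead of discarding it
    if not InputTensorValidator.is_shape(shape):
        raise ValueError(f"{shape} is not a valid shape")
    return {name: size for name, size in zip(axes, shape)}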
def _check_shape_explicit(self, spec: nodes.InputTensor, tensor_shape: NamedShape):
    assert self.is_shape_explicit(spec)
    reference_shape = {name: size for name, size in zip(spec.axes, spec.shape)}
we could use the get_axes_with_size helper here
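i.e., roughly (just a sketch of the reuse, assuming get_axes_with_size is importable here):

def _check_shape_explicit(self, spec: nodes.InputTensor, tensor_shape: NamedShape):
    assert self.is_shape_explicit(spec)
    # build the reference mapping via the shared helper instead of an inline dict comprehension
    reference_shape = get_axes_with_size(spec.axes, spec.shape)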
def _check_shape_parameterized(self, spec: nodes.InputTensor, tensor_shape: NamedShape):
    assert isinstance(spec.shape, ParametrizedInputShape)
    if not self.is_shape(tensor_shape.values()):
if this is part of the get_axes_with_size that returns a NamedShape, where we assume a valid named shape, meaning a map of the axis names along with their size (a natural number), then we can remove this check from here and integrate it into get_axes_with_size. Maybe the Dict[str, int] can be an actual class, not a type alias, to enforce this. I should maybe have kept the AxesWithValue class.
Now that I think about it, we can improve the design: what we actually need is just a validated xr.DataArray, i.e. tensors, and not really plain shapes. The xr.DataArray is convenient since it already carries that logic (named axes with their sizes). It will also handle all the logic for potentially negative values in the shape, and some checks can be skipped. The reasoning for needing valid tensors can be understood by checking the implementations of the procedures.
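A rough illustration of that idea (the validate_tensor helper below is hypothetical, not part of tiktorch):

from typing import Tuple

import numpy as np
import xarray as xr

def validate_tensor(data: np.ndarray, axes: Tuple[str, ...]) -> xr.DataArray:
    # xarray keeps the axis names and sizes together, and a real array can never
    # have a negative size, so the separate plain-shape checks become unnecessary
    if len(axes) != data.ndim:
        raise ValueError(f"axes {axes} do not match array with {data.ndim} dimensions")
    return xr.DataArray(data, dims=axes)

tensor = validate_tensor(np.zeros((1, 1, 10, 10), dtype="float32"), ("b", "c", "y", "x"))
print(dict(tensor.sizes))  # {'b': 1, 'c': 1, 'y': 10, 'x': 10}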
Having this functionality is really great! I can already see us using gpus more efficiently!
In general it would be great to add docstrings to methods, especially if they are some sort of API. Take IsCudaOutOfMemory as an example: given the name I'd assume that one can query whether the current device is out of memory, but it looks like this is intended to check whether a given tensor with a given shape would fit.
Also, at least somewhere the scope of what tiktorch supports in terms of models should be noted. When I read your code I think you assume single input tensors (the out-of-memory functionality should also add any additional tensors the model expects, with default sizes I guess, otherwise errors will be thrown). Do you think that limitation would be hard to lift?
Currently the out of memory tests will also not run on mac due to the model having pytorch state-dict weights (there is a bug currently that prevents this from working). An easy workaround would be switching to torchscript. I think in current bioimageio.spec/core this bug was fixed, but we're not there yet :). (Maybe we should add CI on a different platform, too... not in this PR ;))
I also noticed that there aren't any tests for InputTensorValidator - which would be great to have for completeness.
return client.api.forward(sample)

def _check_gpu_exists(self, client: BioModelClient, device_id: str):
    gpu_device_ids = [device.id for device in self.__device_pool.list_devices() if device.id.startswith("cuda")]
devices could also be "mps" (on apple silicon).
    if len(gpu_device_ids) == 0:
        raise ValueError("No available gpus found")
    if device_id not in client.devices:
        raise ValueError(f"{device_id} not found for model {client.name}")
I guess one additional check would be if device_id in gpu_device_ids?
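Folding that in, _check_gpu_exists could look roughly like this (a sketch based on the diff above, not the final code):

def _check_gpu_exists(self, client: BioModelClient, device_id: str):
    gpu_device_ids = [device.id for device in self.__device_pool.list_devices() if device.id.startswith("cuda")]
    if len(gpu_device_ids) == 0:
        raise ValueError("No available gpus found")
    # additional check suggested above: the requested id must be one of the detected gpus
    if device_id not in gpu_device_ids:
        raise ValueError(f"{device_id} is not among the available gpu devices {gpu_device_ids}")
    if device_id not in client.devices:
        raise ValueError(f"{device_id} not found for model {client.name}")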
max_shape_arr = np.array(list(max_shape.values()))
min_shape_arr = np.array(list(param_shape.min_shape.values()))
step_arr = np.array(list(param_shape.step_shape.values()))
I'm always afraid to rely on two dicts having the same order... The prior check_same_axes is invariant to the order of the keys. I'd probably do something like this, with a fixed reference for the order:
max_shape_arr = np.array([max_shape[k] for k in max_shape])
min_shape_arr = np.array([param_shape.min_shape[k] for k in max_shape])
step_arr = np.array([param_shape.step_shape[k] for k in max_shape])
but maybe that's overdoing it - spec probably guarantees the order?
return max_shape
return None

def _is_cuda_out_of_memory(self, client: BioModelClient, tensor_id: str, shape: NamedShape) -> bool:
This method seems to assume that there will be only a single tensor in the sample... While ilastik can only deal with one, tiktorch probably could deal with multiple inputs.
MAX_SHAPE = (1, 1, 10, 10)
AXES = ("b", "c", "y", "x")
maybe note that these values relate to the model being used - otherwise it's a bit hard to follow how MAX_SHAPE is enforced (I was first looking for some monkeypatching somewhere).
Great! Thank you very much @k-dominik for the review once more :) Totally agree with the statements, I will attempt to resolve them. Could you please also have a look at this comment #213 (comment)? I think that a little bit of refactoring will remove some redundant checks, and it will be more readable as well :)
Regarding this: actually, without knowing it, I have attempted to fix the weight conversion functionality of the bioimageio core in this PR! But I think that this should indeed be handled by the core, and tiktorch shouldn't know about the weights format. But yeah, let's keep the CI for another PR :)
Yep you are right! I think it is time for the concept of
It's a bit of a pity to reason about this still with the old spec in mind. In the new one, as I understood Fynn, one would go by reference axes... And I suppose yes, a (hyper-)grid would be the correct solution. But for some reason that sounds too complicated as a result. Poof, this needs more thinking...
In conjunction with ilastik/ilastik#2891, eventually our goal would be to find the maximum possible tensor shape that can fit into GPU memory for a simple forward pass.
To accomplish this, tiktorch should provide clients with the functionality to probe the GPU memory.
Two procedures have been added:
- IsCudaOutOfMemory(shape) -> bool: checks, for a given shape, whether it fits into GPU memory
- MaxCudaMemoryShape(minShape, maxShape, step) -> shape: for a range of [minShape, maxShape] and valid increments of step, returns the maximum shape that fits into GPU memory

Exposing the first one could still be valuable if we have a good guess of a shape, since the second one can be very time-consuming, depending on the range.
The implementation of probing the GPU has also been inspired by plant-seg.
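For a rough idea of what such probing looks like in PyTorch (an illustration of the general approach only; fits_in_gpu_memory and max_fitting_shape below are hypothetical helpers, not the tiktorch implementation):

import torch

def fits_in_gpu_memory(model: torch.nn.Module, shape, device: str = "cuda:0") -> bool:
    # try a forward pass with a dummy tensor of the requested shape and
    # report whether CUDA ran out of memory
    try:
        with torch.no_grad():
            dummy = torch.zeros(*shape, device=device)
            model.to(device)(dummy)
        return True
    except RuntimeError as error:
        if "out of memory" in str(error):
            return False
        raise
    finally:
        torch.cuda.empty_cache()

def max_fitting_shape(model, min_shape, max_shape, step, device="cuda:0"):
    # simplified linear search: shrink all axes together by their step until a shape
    # fits or the minimum is passed (the real procedure may walk the grid differently)
    shape = list(max_shape)
    while all(s >= m for s, m in zip(shape, min_shape)):
        if fits_in_gpu_memory(model, tuple(shape), device):
            return tuple(shape)
        if all(st == 0 for st in step):
            break  # nothing left to shrink
        shape = [s - st for s, st in zip(shape, step)]
    return None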
Note for the review: the inference_pb2_grpc.py and inference_pb2.py are auto-generated files by make protos.