
Python wrappers that return protobuf serialized strings sometimes fail when decoded as utf-8 #62

Open
Aposhian opened this issue Jan 16, 2025 · 0 comments

Comments

@Aposhian

When a C++ function wrapped with pybind11 returns a std::string that holds a serialized protobuf (as callService does), pybind11 assumes the string is UTF-8 and tries to decode it into a Python str. The pybind11 documentation describes this behavior:

> When a C++ function returns a std::string or char* to a Python caller, pybind11 will assume that the string is valid UTF-8 and will decode it to a native Python str, using the same API as Python uses to perform bytes.decode('utf-8'). If this implicit conversion fails, pybind11 will raise a UnicodeDecodeError.

https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html#returning-c-strings-to-python

This assumption does not hold for a serialized protobuf, which is arbitrary binary data rather than text. Whether the decode succeeds depends on the bytes in the message, so returning the service response only sometimes raises an exception like:

'utf-8' codec can't decode byte in position 14 invalid start byte '0x3'
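
To illustrate why the failure is intermittent, here is a minimal pure-Python sketch. It does not use the wrappers from this project; the two payloads are hand-encoded protobuf wire bytes for a hypothetical message with a single varint field (field number 1), and `decodes_as_utf8` is a helper name made up for this example.

```python
def decodes_as_utf8(payload: bytes) -> bool:
    """Return True if the payload happens to be valid UTF-8."""
    try:
        payload.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical message with one varint field (field number 1).
# The tag byte 0x08 encodes field number 1 with wire type 0 (varint).
small_value = b"\x08\x01"      # field 1 = 1: every byte is below 0x80
large_value = b"\x08\x96\x01"  # field 1 = 150: 0x96 is not a valid UTF-8 start byte

print(decodes_as_utf8(small_value))  # True
print(decodes_as_utf8(large_value))  # False
```

Small values serialize to bytes that all fall in the ASCII range, so the implicit decode happens to succeed. Larger values produce bytes at or above 0x80, which are often not valid UTF-8 sequences.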

The fix is to have the Python wrappers return py::bytes instead, which is passed to Python as a bytes object without any attempt to decode it.
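
The difference between the two return types can be sketched from the Python side. The functions below are pure-Python stand-ins, not the actual wrappers: `wrapper_returning_str` imitates the strict UTF-8 decode that pybind11 applies to a std::string, `wrapper_returning_bytes` imitates a wrapper that returns py::bytes, and `read_varint_field` is a hypothetical helper that parses a single varint field so the example does not depend on the protobuf library.

```python
def wrapper_returning_str(raw: bytes) -> str:
    # Stand-in for the default std::string conversion: strict UTF-8 decode.
    return raw.decode("utf-8")


def wrapper_returning_bytes(raw: bytes) -> bytes:
    # Stand-in for a wrapper returning py::bytes: the data is passed through as-is.
    return bytes(raw)


def read_varint_field(payload: bytes) -> tuple:
    """Parse one varint field and return (field_number, value)."""
    tag = payload[0]
    if tag & 0x07 != 0:
        raise ValueError("expected wire type 0 (varint)")
    field_number = tag >> 3
    value = 0
    shift = 0
    for byte in payload[1:]:
        value |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:              # high bit clear marks the last byte
            break
        shift += 7
    return field_number, value


# Hand-encoded wire bytes: field 1 = 150 (assumed example payload).
serialized = b"\x08\x96\x01"

try:
    wrapper_returning_str(serialized)
    str_path_failed = False
except UnicodeDecodeError:
    str_path_failed = True

response = wrapper_returning_bytes(serialized)
print(str_path_failed)              # True
print(read_varint_field(response))  # (1, 150)
```

With a bytes return value, the caller receives the serialized message unchanged and can hand it to its protobuf parser directly.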
