[Bug] Conflict with matplotlib #47
Comments
Wow. Thanks for reporting. I can imagine that was super frustrating to debug. Does this problem only occur in Docker? When I run it on my Mac, I don't have a problem.
I looked through my code where I've used matplotlib and helics in the same process and found that I've imported matplotlib both before and after helics, and never had a problem with this. I've never tried this in a Docker environment though. I'm currently not able to run it (because of some weird permissions issues on my end, I'll have to try on my other computer). I'll experiment and report back. FYI @phlptp @nightlark
That exception comes from argument validation when processing the federate config. One thing to check is whether the JSON that gets created differs between those two cases, specifically in the broker_port field.
I wonder if |
Thanks for the quick responses! The JSON strings appear to be identical in both cases:
Imported before: {"name": "test_federate", "core_type": "zmq", "federates": 1, "broker_port": 23456, "broker_address": "172.22.0.2"}
Imported after: {"name": "test_federate", "core_type": "zmq", "federates": 1, "broker_port": 23456, "broker_address": "172.22.0.2"}
And shuffling the order around to get the socket and create the JSON first before the other imports still results in the exception if I've also just checked this with the python I know
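One way to rule out invisible differences between the two strings (whitespace, key order, a number versus a string for the port) is to compare both the raw text and the parsed objects. The following is a minimal sketch using the two strings quoted above; the variable names are illustrative only.

```python
import json

# The two strings quoted above, copied verbatim.
imported_before = '{"name": "test_federate", "core_type": "zmq", "federates": 1, "broker_port": 23456, "broker_address": "172.22.0.2"}'
imported_after = '{"name": "test_federate", "core_type": "zmq", "federates": 1, "broker_port": 23456, "broker_address": "172.22.0.2"}'

before = json.loads(imported_before)
after = json.loads(imported_after)

# Compare the raw text and the parsed structure.
print("same text:", imported_before == imported_after)
print("same structure:", before == after)

# Report how broker_port is typed once parsed (JSON number vs. JSON string).
print("broker_port types:",
      type(before["broker_port"]).__name__,
      type(after["broker_port"]).__name__)
```

If both comparisons report a match, the difference in behaviour is unlikely to come from the config text itself and more likely from how it is parsed inside the library.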
Running the |
Well here's a bit of a work-around at least:
"broker_port": "23456",
Setting the
I still have no idea what could possibly be interfering with the gridlabd code which parses the port number.
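The work-around above amounts to emitting broker_port as a JSON string rather than a JSON number. A minimal sketch of that, assuming the same config fields as the strings quoted earlier in this thread; build_federate_config is a hypothetical helper, not part of pyhelics.

```python
import json


def build_federate_config(broker_address, broker_port, port_as_string=True):
    """Return the federate config as a JSON string.

    Hypothetical helper for illustration. With port_as_string=True the
    work-around is applied: broker_port is written as "23456" (a JSON
    string) instead of 23456 (a JSON number).
    """
    config = {
        "name": "test_federate",
        "core_type": "zmq",
        "federates": 1,
        "broker_port": str(broker_port) if port_as_string else int(broker_port),
        "broker_address": broker_address,
    }
    return json.dumps(config)


config_json = build_federate_config("172.22.0.2", 23456)
print(config_json)
# The resulting string is what would be handed to
# helics.helicsCreateCombinationFederateFromConfig(), the call named in this issue.
```

Keeping the switch in one place makes it easy to flip back to the numeric form once the underlying cause is fixed.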
We are able to reproduce the segfault with matplotlib and helics, and @nightlark and @phlptp have ideas for how to resolve it.
So what we've found is that on certain systems (Quartz, which is RHEL 7, and uses Python executables compiled by gcc 4.9.3) the order of loading shared libraries matters; it doesn't seem to be particularly dependent on the Python version. Minimal test case that results in a segfault:
In addition to the shared library from the kiwisolver dependency, the shared libraries included in matplotlib
All instances result in an error message along the lines of
After testing with a copy of the HELICS shared library that had all the extra symbols hidden, the same segfault occurs -- so the underlying cause is still a mystery; as is why the same thing doesn't happen with more of the libraries included in matplotlib.
Narrowed down the problem to
When loading matplotlib first, the system copies of libstdc++ and libgcc_s get loaded, so the functions they import get used and nothing breaks; when helics is loaded first, the matplotlib libraries try to use parts of the statically linked copies of those libraries, which conflict with the system libraries. This also explains why the crash doesn't happen on all systems -- some systems have a copy of libstdc++ and libgcc_s that is compatible with the statically linked version in HELICS (the system copy is the same or a newer version than the one included in HELICS).
I have some ideas for how to fix this, but I'm not sure yet if it should be done as part of building the pyhelics wheels or the HELICS release binaries (or both) -- it might depend on whether the segfault in matHELICS on Linux has the same underlying cause.
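To see when the system copies of libstdc++ and libgcc_s enter the process, one option on Linux is to read the interpreter's own memory mappings before and after each import. The sketch below is an assumption about how one might check this, not the method used in this thread; both function names are hypothetical. A statically linked copy inside another shared library does not appear as a separate entry, so this only shows whether the system copies have been loaded.

```python
import os


def parse_loaded_libraries(maps_text, needles=("libstdc++", "libgcc_s")):
    """Return the sorted, de-duplicated file paths in a /proc/<pid>/maps
    dump whose file name contains one of the needles."""
    found = set()
    for line in maps_text.splitlines():
        # Columns: address perms offset dev inode pathname
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        if any(needle in os.path.basename(path) for needle in needles):
            found.add(path)
    return sorted(found)


def loaded_libraries_in_this_process():
    """Linux only: inspect the mappings of the current interpreter."""
    try:
        with open("/proc/self/maps") as maps_file:
            return parse_loaded_libraries(maps_file.read())
    except OSError:
        return []  # /proc is not available on this platform


print(loaded_libraries_in_this_process())
```

Calling loaded_libraries_in_this_process() once before any import, once after the first import, and once after the second shows which import pulled in the system libraries.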
I've discovered a weird inexplicable conflict between helics and matplotlib.
The bug is that if matplotlib is imported before helics, then helics.helicsCreateCombinationFederateFromConfig() raises an exception when trying to connect to a remote broker (in a separate docker container).
The exception I get with the latest versions of helics is:
Here's a gist with a minimal docker-compose.yml, Dockerfile image and a simple reproduction.py script which triggers this.
https://gist.github.com/devanubis/ec452950c317d3684016dd3b609ebca3
This can be executed with just docker-compose up.
I've found the following:
- Moving import matplotlib to after import helics does not result in the same exception being triggered
- broker_address but with the broker_port
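Because the behaviour depends on import order, it helps to test each order in a fresh interpreter so that one run cannot affect the next. The sketch below is an illustrative harness only: run_import_order is a hypothetical helper, and it checks the imports alone. Reproducing the exception described here would also require the federate creation step to run in the child process.

```python
import subprocess
import sys


def run_import_order(first, second):
    """Import two modules in the given order in a fresh interpreter.

    Returns the child's exit code: 0 on success, 1 if an import raised,
    and a negative number on POSIX if the child was killed by a signal
    (for example by a segfault).
    """
    code = f"import {first}\nimport {second}\n"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )
    return result.returncode


# Stand-in modules so the sketch runs anywhere; substitute
# ("matplotlib", "helics") and ("helics", "matplotlib") where both are installed.
for first, second in [("json", "os"), ("os", "json")]:
    print(first, "->", second, "exit code:", run_import_order(first, second))
```

Running each order in its own process also means a crash in one order does not take down the test script itself.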
I don't know enough about either library to dig much deeper.
I can also report this to https://github.com/matplotlib/matplotlib if you want, but thought I'd start with pyhelics since the exception is from here.
Fortunately I've been able to remove matplotlib from our affected code, so this isn't urgent for us, but it was painful to track down what was causing this weirdness.