Adjust for JPype1 v1.4.1 #463
There was no intended change of behavior in JPype, but we did have to change mechanisms. If the new mechanism is triggering a referencing problem, please try to replicate it. |
Totally understood, and thank you for the comment. As the description says, we can try to condense an MWE from our somewhat complicated code base; we simply have no bandwidth to do this right now. This issue is just to put a pin in the matter until we manage to find time. |
Can you give me a rough description of your test? For example, something like this:
(which is incidentally one of the patterns that I check). Most important details....
We have built in leak checkers for objects, strings, and primitives (brute force create a lot and make sure they go away), but that can miss structural defects in which the order, timing, or connections of an object causes an issue. |
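The brute-force pattern described above (create many objects, drop them, verify they go away) can be sketched in plain Python. This is a generic illustration, not JPype's actual leak checker; `Payload` is a hypothetical stand-in class:

```python
import gc
import weakref

class Payload:
    """Hypothetical stand-in for the objects (wrappers, strings, etc.) under test."""

def leak_check(factory, n=1000):
    """Brute-force leak check: create many objects, drop them, count survivors."""
    refs = [weakref.ref(factory()) for _ in range(n)]
    gc.collect()  # also collect anything kept alive only by reference cycles
    return sum(1 for r in refs if r() is not None)

survivors = leak_check(Payload)
print(survivors)  # 0 when nothing else holds a reference
```

As the comment notes, a check like this can still miss structural defects where the order, timing, or connections of objects matter.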
With the release of JPype 1.5.0, this has become relevant again, since all our workflows on Python < 3.11 are breaking. This is because the temporary workaround specifies |
TL;DR: this has taken much longer than anticipated. For my money, the issue can likely be explained by understanding the changes you made for Python 3.11. JPype1 1.5.0 (and 1.4.1, for that matter) works as expected on Python 3.11; it only fails on older versions on our side. The first failing test is located in ixmp/tests/backend/test_base.py:

def test_del_ts(self, test_mp):
    """Test CachingBackend.del_ts()."""
    # Since CachingBackend is an abstract class, test it via JDBCBackend
    backend = test_mp._backend
    cache_size_pre = len(backend._cache)

    # Load data, thereby adding to the cache
    s = make_dantzig(test_mp)
    s.par("d")

    # Cache size has increased
    assert cache_size_pre + 1 == len(backend._cache)

    # Delete the object; associated cache is freed
    del s

    # Objects were invalidated/removed from cache
    assert cache_size_pre == len(backend._cache)
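For context on why the final assertion can fail: `del s` only removes one reference to the object, so any lingering reference (for instance, a frame or traceback still holding it) keeps the object, and the associated cache entries, alive. A minimal sketch, with a hypothetical `Scenario` stand-in rather than the real ixmp class:

```python
import gc
import weakref

class Scenario:
    """Hypothetical stand-in for the real object."""

s = Scenario()
lingering = s          # e.g. a cache entry or a frame still holding the object
ref = weakref.ref(s)

del s                  # removes one reference, not the object itself
gc.collect()
print(ref() is not None)  # True: `lingering` keeps the object alive

del lingering
gc.collect()
print(ref() is None)      # True: now the object can be collected
```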
i          j         value  unit
seattle    new-york    2.5  km
seattle    chicago     1.7  km
seattle    topeka      1.8  km
san-diego  new-york    2.5  km
san-diego  chicago     1.8  km
san-diego  topeka      1.4  km

(Just noticing: these distances are supposed to represent thousands of miles rather than km, so our units are wrong here, but that hardly matters for the error, I think.)

def gc(cls):
    if _GC_AGGRESSIVE:
        # log.debug('Collect garbage')
        try:
            java.System.gc()
        except jpype.JVMNotRunning:
            pass
        gc.collect()

Lastly, the overwriting of the … On the Java side, I can't tell you much about what's happening, I'm afraid. When the …

try:
    self.jobj = java.Platform("Python", properties)
except java.NoClassDefFoundError as e:  # pragma: no cover
    raise NameError(
        f"{e}\nCheck that dependencies of ixmp.jar are "
        f"included in {Path(__file__).parents[2] / 'lib'}"
    )
except jpype.JException as e:  # pragma: no cover
    # Handle Java exceptions
    jclass = e.__class__.__name__
    if jclass.endswith("HikariPool.PoolInitializationException"):
        redacted = copy(kwargs)
        redacted.update(user="(HIDDEN)", password="(HIDDEN)")
        msg = f"unable to connect to database:\n{repr(redacted)}"
    elif jclass.endswith("FlywayException"):
        msg = "when initializing database:"
        if "applied migration" in e.args[0]:
            msg += (
                "\n\nThe schema of the database does not match the schema of "
                "this version of ixmp. To resolve, either install the version "
                "of ixmp used to create the database, or delete it and retry."
            )
    else:
        _raise_jexception(e)
    raise RuntimeError(f"{msg}\n(Java: {jclass})")

…, so it is checking for some exceptions. The next step, then, finally gets deep into Java territory:

# type="par", name="d"
def item_get_elements(self, s, type, name, filters=None):  # noqa: C901
    if filters:
        # Convert filter elements to strings
        filters = {dim: as_str_list(ele) for dim, ele in filters.items()}

    try:
        # Retrieve the cached value with this exact set of filters
        return self.cache_get(s, type, name, filters)
    except KeyError:
        pass  # Cache miss

    try:
        # Retrieve a cached, unfiltered value of the same item
        unfiltered = self.cache_get(s, type, name, None)
    except KeyError:
        pass  # Cache miss
    else:
        # Success; filter and return
        return filtered(unfiltered, filters)

    # Failed to load item from cache

    # Retrieve the item
    item = self._get_item(s, type, name, load=True)
    idx_names = list(item.getIdxNames())
    idx_sets = list(item.getIdxSets())

    # Get list of elements, using filters if provided
    if filters is not None:
        jFilter = java.HashMap()

        for idx_name, values in filters.items():
            # Retrieve the elements of the index set as a list
            idx_set = idx_sets[idx_names.index(idx_name)]
            elements = self.item_get_elements(s, "set", idx_set).tolist()

            # Filter for only included values and store
            filtered_elements = filter(lambda e: e in values, elements)
            jFilter.put(idx_name, to_jlist(filtered_elements))

        jList = item.getElements(jFilter)
    else:
        jList = item.getElements()

    if item.getDim() > 0:
        # Mapping set or multi-dimensional equation, parameter, or variable
        columns = copy(idx_names)

        # Prepare dtypes for index columns
        dtypes = {}
        for idx_name, idx_set in zip(columns, idx_sets):
            # NB using categoricals could be more memory-efficient, but requires
            # adjustment of tests/documentation. See
            # https://github.com/iiasa/ixmp/issues/228
            # dtypes[idx_name] = CategoricalDtype(
            #     self.item_get_elements(s, 'set', idx_set))
            dtypes[idx_name] = str

        # Prepare dtypes for additional columns
        if type == "par":
            columns.extend(["value", "unit"])
            dtypes.update(value=float, unit=str)
            # Same as above
            # dtypes['unit'] = CategoricalDtype(self.jobj.getUnitList())
        elif type in ("equ", "var"):
            columns.extend(["lvl", "mrg"])
            dtypes.update(lvl=float, mrg=float)

        # Copy vectors from Java into pd.Series to form DataFrame columns
        columns = []

        def _get(method, name, *args):
            columns.append(
                pd.Series(
                    # NB [:] causes JPype to use a faster code path
                    getattr(item, f"get{method}")(*args, jList)[:],
                    dtype=dtypes[name],
                    name=name,
                )
            )

        # Index columns
        for i, idx_name in enumerate(idx_names):
            _get("Col", idx_name, i)

        # Data columns
        if type == "par":
            _get("Values", "value")
            _get("Units", "unit")
        elif type in ("equ", "var"):
            _get("Levels", "lvl")
            _get("Marginals", "mrg")

        result = pd.concat(columns, axis=1, copy=False)
    elif type == "set":
        # Index sets
        # dtype=object is to silence a warning in pandas 1.0
        result = pd.Series(item.getCol(0, jList)[:], dtype=object)
    elif type == "par":
        # Scalar parameter
        result = dict(
            value=float(item.getScalarValue().floatValue()),
            unit=str(item.getScalarUnit()),
        )
    elif type in ("equ", "var"):
        # Scalar equation or variable
        result = dict(
            lvl=float(item.getScalarLevel().floatValue()),
            mrg=float(item.getScalarMarginal().floatValue()),
        )

    # Store cache
    self.cache(s, type, name, filters, result)

    return result

From the ixmp Java source code (which is private, for some reason, I believe), I would say that … |
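The two-level cache lookup in item_get_elements above (exact-filter hit, then unfiltered hit plus local filtering, then a full load and store) can be sketched generically. All names below (`cache_key`, `get_elements`, `load`, `apply_filters`) are hypothetical, not the actual ixmp implementation:

```python
# Generic sketch of a two-level cache lookup (hypothetical names).
_cache = {}

def cache_key(ts_id, type, name, filters):
    # Filters must be hashable to serve as part of a dict key
    f = tuple(sorted((k, tuple(v)) for k, v in (filters or {}).items()))
    return (ts_id, type, name, f)

def get_elements(ts_id, type, name, filters, load, apply_filters):
    try:
        # 1. Exact hit: cached value for this precise set of filters
        return _cache[cache_key(ts_id, type, name, filters)]
    except KeyError:
        pass
    try:
        # 2. Unfiltered hit: filter the cached full value locally
        unfiltered = _cache[cache_key(ts_id, type, name, None)]
    except KeyError:
        result = load()  # 3. Full (expensive) load, e.g. from Java
    else:
        result = apply_filters(unfiltered, filters)
    _cache[cache_key(ts_id, type, name, filters)] = result
    return result

# Demonstration
calls = []
def load():
    calls.append(1)
    return [1, 2, 3, 4]

def apply_filters(data, filters):
    keep = set(filters["i"])
    return [x for x in data if x in keep]

print(get_elements("s1", "par", "d", None, load, apply_filters))            # [1, 2, 3, 4]
print(get_elements("s1", "par", "d", {"i": [2, 3]}, load, apply_filters))   # [2, 3]
print(len(calls))  # 1: the second call was served from the unfiltered cache
```

Note that this pattern only frees memory if cache entries are eventually invalidated when the owning object goes away, which is exactly what the failing test checks.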
@glatterf42 thanks for digging in—I hope that was instructive! The fault probably lies in code that I originally added in #213, several years ago. The basic purpose of this code is to aggressively free up references to Python objects (like Scenario, TimeSeries, etc.) so that they and the associated ixmp_source Java objects (via JPype) can be garbage-collected, freeing memory. This was deemed necessary for larger applications, like running many instances of the MESSAGEix-GLOBIOM global model in older workflows (using the scenario_runner/runscript_main pattern). I will open a PR to remove the workaround for the current issue and adjust/update the tests and/or caching/memory-management code. Then we can discuss whatever the solution turns out to be. |
I spent some further time investigating this, including using refcycle. After ixmp/tests/backend/test_base.py, lines 148 to 149 (at commit d92cfcd), I inserted code like:

import gc

gc.collect()
gc.collect()
gc.collect()

import sys
import traceback

import refcycle

import ixmp

snapshot = refcycle.snapshot()
scenarios = snapshot.find_by(lambda obj: isinstance(obj, ixmp.Scenario))
s2 = scenarios[0]
print(f"{s2 = }")
snapshot.ancestors(s2).export_image("refcycle-main.svg")

tbs = list(
    snapshot.ancestors(s2).find_by(
        lambda obj: type(obj).__name__ == "traceback"
    )
)
print(f"{tbs = }")

del snapshot
gc.collect()

for i, obj in enumerate(tbs):
    print(f"{i} {obj}")
    print(f"{obj.tb_frame = }")
    traceback.print_tb(obj, file=sys.stdout)

This produced output like the following: …

In short:
|
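The mechanism suggested by the refcycle output (a stored traceback keeping frame locals, and hence the Scenario, alive) can be reproduced in isolation. This is a generic sketch with a hypothetical `Scenario` stand-in, not ixmp code:

```python
import gc
import sys
import weakref

class Scenario:
    """Hypothetical stand-in for the real object."""

saved_tb = None

def fail_and_save_traceback():
    s = Scenario()              # local variable, referenced by this frame
    ref = weakref.ref(s)
    try:
        raise ValueError("boom")
    except ValueError:
        global saved_tb
        # The traceback references the frame; the frame references its
        # locals; the locals include `s`.
        saved_tb = sys.exc_info()[2]
    return ref

ref = fail_and_save_traceback()
gc.collect()
print(ref() is not None)  # True: the saved traceback keeps `s` alive

saved_tb = None           # drop the traceback
gc.collect()
print(ref() is None)      # True: `s` is now collectable
```

This is why aggressive `gc.collect()` calls alone do not help: the collector cannot free an object that is still strongly reachable through a retained traceback.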
With the release of JPype1 version 1.4.1, CI jobs have started to fail, e.g. here, with messages like:
del s
line is reached, these persisting references / non-zero refcount prevent the object itself, and associated cached item values, from being garbage-collected.

To mitigate
To fix