Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[IDEA] Generation of problem files with mixed-representation index sets #602

Closed
wants to merge 33 commits into from

Conversation

whart222
Copy link
Member

@whart222 whart222 commented Jul 4, 2018

Fixes #567 (partial).

Summary/Motivation:

We recently made a small change that makes the default representation of Sets in Pyomo rely on insertion order. This PR completes that activity by resolving the simple test failures outlined in #567. Specifically, these changes allow mixed-representation sets to be used as index sets. The result is that problem files can be generated with determinism=0, which does no sorting of index values.

However, these changes only work with CPython (3.6, 3.7) and PyPy. Starting with Python3.6, the CPython and PyPy Python implementations have deterministic key orderings in their dictionary representations. And as of Python 3.7 this property is a part of the Python language specification. This feature is exploited to provide deterministic file generation without sorting index values.

NOTE: This is a partial fix of #567, since it only applies to more recent versions of Python. However, there is not a clear motivation for extending this fix for older versions of Python. However, this PR is motivated by the fact that sorting is not done during file generation, and hence files are generated more quickly (especially for models with constraints that have large index sets).

NOTE: Sorted ordering of mixed-representation sets often works for Python 2.7 (using determinism=1), since that version allows for comparison of more data types.

NOTE: This PR does not simply use the sorted_robust() function to sort when generating problem files. The sorted_robust() function is significantly slower than sorted() with mixed-representation data. Thus using sorted_robust() with Python 3.x would make it appear that Python3 is slower than Python2, when in fact there are faster alternatives.

Changes proposed in this PR:

  • Changing iteration in Set objects to use the insertion order by default.
  • Adding tests that confirm that mixed-representation sorts can be solved.

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

  1. I agree my contributions are submitted under the BSD license.
  2. I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

whart222 added 9 commits July 4, 2018 10:57
These tests verify that implicitly ordered sets work as expected,
including when the set members are not comparable.
Only sort data when the sorted order is set.
Also updating a baseline test where ordering
changed with the default set order
Instead of randomizing, this simple inserts
components in order
1. Updating files for small16 tests

2. Disabling GAMS/Baron tests by changing their testname

3. Updating small16 GAMS/Baron baselines
@whart222
Copy link
Member Author

whart222 commented Jul 4, 2018

@jsiirola @ghackebeil I'm marking this as WIP because there are known test failures in PySP and GDP. The tests pass for pyomo.repn and pyomo.solvers, so I'm cautiously optimistic that we can safely disable the sorting logic in the writers. However, I suspect that PySP and GDP are generating models nondeterministically (e.g. through reformulations), and hence we have baseline test failures.

NOTE: I've currently disabled the sorting logic. However, an alternative is to leave that logic in place, but make the default writer behavior to not sort. This would allow testing of components where models are generated nondeterministically. However, I think that we do ultimately want determinism throughout Pyomo.

@whart222
Copy link
Member Author

whart222 commented Jul 4, 2018

@jsiirola @ghackebeil Hold off on reviewing this and resolving issues. I did some more careful testing, and there's an issue that I haven't accounted for yet. While we can easily make sets ordered, it's harder to make sparse indexing of components ordered the same way without using OrderedDict (which has performance impliactions, I think).

@jsiirola
Copy link
Member

jsiirola commented Jul 4, 2018

FWIW, I would rather work this issue on the set-rewrite branch instead of continuing to backpatch the current implementation, because in my experience working there, we need some rather significant changes to the design of sets to cleanly support this.

whart222 added 3 commits July 4, 2018 13:56
The iterator now depends on the set data iterator, which
reflects the fixed ordering of the underlying set.
Updated the small16 problem to ensure that it could generate
differences when nondeterminism exists in the model.
Has an indexed constaint that is sparsely populated.
@whart222
Copy link
Member Author

whart222 commented Jul 4, 2018

@jsiirola @ghackebeil OK. I think I resolved the issue.

@whart222
Copy link
Member Author

whart222 commented Jul 4, 2018

@jsiirola I don't quite see that this is a backpatch. I think these changes here are self-sufficient. I understand that we're working on the same code, but I'm expecting that the changes in that branch will simply replace the logic in sets.py. The changes to other files in this branch are orthogonal to that change. Right?

@whart222
Copy link
Member Author

whart222 commented Jul 4, 2018

BTW, my last change didn't require the use of OrderedDict. It was merely a change to the iterator for Set objects.

@whart222
Copy link
Member Author

whart222 commented Jul 5, 2018

@jsiirola This last change may resolved the GDP tests. I'll try to confirm tomorrow.

@whart222
Copy link
Member Author

whart222 commented Jul 5, 2018

@jsiirola It looks like the GDP test failures are gone in Python 3.6, but there are some test failures in Python 3.5. This is likely due to nondeterminism in the use of dict objects. I'll look into this.

@whart222
Copy link
Member Author

whart222 commented Jul 5, 2018

@ghackebeil On Python 3.6, there remain some test failures for PySP under tests/convert. I'm having trouble replicating these on old-sisu. Let me know if you have any thoughts about what might be going on here.

@ghackebeil
Copy link
Member

ghackebeil commented Jul 5, 2018

They failed locally for me, and the baseline updates I applied made sense to me. If they were not failing on another 3.6 build, that might indicate that there is another source of non-determinism. Would you mind testing on that machine again with the new baselines and let me know if it fails?

whart222 added 2 commits July 5, 2018 21:20
Cannot tests small16 and small17 with file_determinism=3
@whart222
Copy link
Member Author

whart222 commented Jul 6, 2018

@ghackebeil I just reran the Travis tests, and there are 4 failures for PySP in Python 3.7/3.6, and 13 failures for Python 3.5/2.7. I can't replicate these tests on old-sisu. Do you know if I need to have a specific package installed to do that?

I could imagine that there are additional sources of nondeterminism in PySP, though pyomo.core/pyomo.repn tests seem stable. That's where the components are defined and the problem writers are defined.

@blnicho blnicho changed the title (WIP) Deterministic generation of problem files [WIP] Deterministic generation of problem files Jul 6, 2018
whart222 added 2 commits July 6, 2018 21:19
This should obviate testing baseline changes outside of
pyomo.repn.

The small16 and small17 tests are tested with determinism=0,
since that is the only mode suitable for models with mixed
representations.
@whart222
Copy link
Member Author

whart222 commented Jul 7, 2018

@ghackebeil I changed the logic to default to determinism=1 (sorting of indices). Hence, I reverted your baseline changes.

@ghackebeil
Copy link
Member

So are the changes in this PR now solely related to Set pprint baselines?

@whart222
Copy link
Member Author

whart222 commented Jul 8, 2018

@ghackebeil I do have tests that confirm that writers work in Python 3.6, 3.7 with determinism=0.

@whart222
Copy link
Member Author

whart222 commented Jul 8, 2018

@jsiirola @ghackebeil I think my latest commits will resolve the outstanding test failures in this PR. In recent changes, I've made determinism=1 again the default for writers. Thus, the current code will fail if a user uses an incomparable index set in Python3.x and tries to write a problem file.

As John has noted, we could use the sorted_robust() function in the code (in the Block class?) to resolve this issue. However, I did some simple testing that shows that sorted_robust() is 5-10x slower for sets that contain string and integer values:

from pyomo.core.base.misc import sorted_robust
import time
import random

random.seed(12938470)

core = []
for i in range(100000):
    core.append(random.random())

start = time.time()
ans = list(sorted(core))
print("Baseline %f" % (time.time()-start))

new1 = core + [1]
start = time.time()
ans = list(sorted(new1))
print("Baseline %f" % (time.time()-start))

new1 = core + [1]
start = time.time()
ans = list(sorted_robust(new1))
print("Baseline %f" % (time.time()-start))

new2 = core + ['1']
start = time.time()
ans = list(sorted_robust(new2))
print("Baseline %f" % (time.time()-start))

On my Mac, I get statistics like:

Baseline 0.031980
Baseline 0.032028
Baseline 0.034850
Baseline 0.289933

So, this shouldn't impact our test performance, and I would expect it wouldn't impact most users. But using sorted_robust() would create a situation where a user with an incomparable set sees a nontrivial slowdown when moving from Python 2.7 to 3.x. And that is despite the fact that sorting isn't necessary to generate a reproducible problem representation (which I think is really the main point of the determinism option).

So, I'm reluctant to make this change. Thoughts?

@codecov-io
Copy link

codecov-io commented Jul 8, 2018

Codecov Report

Merging #602 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #602      +/-   ##
=========================================
- Coverage   67.21%   67.2%   -0.01%     
=========================================
  Files         392     392              
  Lines       62208   62215       +7     
=========================================
+ Hits        41812   41814       +2     
- Misses      20396   20401       +5
Impacted Files Coverage Δ
pyomo/core/base/sets.py 85.82% <100%> (-0.12%) ⬇️
pyomo/repn/plugins/ampl/ampl_.py 87.71% <0%> (-0.1%) ⬇️
pyomo/pysp/ph.py 61.14% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8be17fd...8ab6e17. Read the comment docs.

@whart222
Copy link
Member Author

@jsiirola @ghackebeil Can we finalize this PR?

@jsiirola
Copy link
Member

I assumed you were still working on it (it is marked [WIP]).

I also do not understand the point of this PR. You are adding a bunch of tests, but they seem to only apply to one version of python (IMHO, not good). You are also changing the behavior of Set's pprint() (apparently is only sorted for sorted sets, otherwise is insertion order, but you have two ways of handling insertion order?). Is there a reason you are not using the sorted_robust utility instead of sorted in that change?

Using sorted_robust() to sort all sets.  This method is
slow for mixed representations, but that doesn't matter when
writing diagnostic output.
This should never have been committed.
@whart222 whart222 changed the title [WIP] Deterministic generation of problem files Generation of problem files with mixed-representation index sets Sep 1, 2018
@whart222
Copy link
Member Author

whart222 commented Sep 1, 2018

@jsiirola @carldlaird I reworked the PR description to clarify the scope/intent of this change. Note that this PR does not change the default behavior in Pyomo, as was originally intended. That proved problematic (see #677).

@jsiirola
Copy link
Member

jsiirola commented Sep 5, 2018

I am not in favor of this PR, as my understanding is that it will potentially lead to Pyomo generating different solver input files on different versions of Python (before/after Python 3.6). Please correct me if I am mistaken.

FWIW, the design I am experimenting with in the set-rewrite branch could address sorting through a different approach: Sets would support two methods: .sorted() and .ordered(). The difference is the latter is guaranteed to be ordered (i.e., deterministic) but not necessarily sorted. This allows for the Set to implement efficient deterministic iteration, without requiring that iteration to be strictly sorted. This will allow things like the writers to avoid sorting in the bulk of cases. That in turn, lessens the sort penalty -- which I already don't think is very significant, as I have not seen it show up on profiles...

@jsiirola jsiirola changed the title Generation of problem files with mixed-representation index sets [IDEA] Generation of problem files with mixed-representation index sets May 7, 2019
@whart222
Copy link
Member Author

I updated #567 to document that we are closing this PR because of inactivity.

@whart222 whart222 closed this Mar 18, 2020
@whart222 whart222 deleted the issue_567 branch March 18, 2020 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should Pyomo allow indexing sets whose members are not comparable?
6 participants