Feature/realtime graphs v3 #379

Open: wants to merge 34 commits into main from feature/realtime-graphs-v3


@szvsw (Collaborator) commented Aug 14, 2022

No description provided.

@szvsw szvsw force-pushed the feature/realtime-graphs-v3 branch 2 times, most recently from eb00c57 to 4cbf8ba Compare August 16, 2022 01:48
@szvsw (Collaborator, Author) commented Aug 16, 2022

Okay @samuelduchesne, it has been a long road, BUT I think this is ready!

The only failing test is, once again, tests/test_schedules.py::TestSchedule::test_plot, but I'm not sure why, since the test passes locally both when running it in isolation and when running the whole TestSchedule class...

Here's a summary of some of the major changes:

  • UmiBase._GRAPH: tracks a complete graph of all UmiBase objects ever created and their relationships to each other. Several other functions use it for performance gains.
  • UmiBaseHelper and UmiBaseList: these objects provide hooks so that non-UmiBase objects can update the realtime graphs when values are set.
  • umibase_property is a new decorator for fields that store a UmiBase object; it automatically sets up type checking and graph relinking.
  • _CREATED_OBJECTS moves back to UmiBase and is renamed to _CREATED_OBJECTS_BY_CLASS, and stores a dict with one entry for each class;
  • any class which inherits UmiBase is automatically given an entry in UmiBase._CREATED_OBJECTS_BY_CLASS via the UmiBase.__init_subclass__ magic method;
  • _CREATED_OBJECTS becomes a UmiBase instance method which looks up the list of objects corresponding to the calling object's class;
  • UmiBase.all_objects_of_type is a similar class method which allows you to provide a string, object, or class and fetch all objects matching.
  • UmiBase.id is made immutable, because it is used as part of __hash__ for almost everything, and changing it messes with the graph structure.
  • UmiBase.get_unique no longer sorts self._CREATED_OBJECTS by unit_number, because by definition self._CREATED_OBJECTS should already be sorted that way: objects are appended in the order they are created, and the list order never changes (it only grows). Correct me if you think I'm wrong here, but this should be a major source of performance gains.
  • each individual object can be replaced with self.replace_me_with(other) with significant performance improvements, since it no longer requires traversing the entire Graph.
  • unique_components performance improvements are much more pronounced on large libraries with lots of duplicated components
  • UmiTemplateLibrary now tracks a list of _CREATED_LIBRARIES to try to use more performant versions of functions like to_graph when only a single library exists.
  • UmiTemplateLibrary and UmiBase now have _clear_class_memory classmethods, which are used to wipe class memory.
  • UmiTemplateLibrary.open now automatically changes incoming ids that already exist in _CREATED_OBJECTS, unless you explicitly pass preserve_ids=True. This forces different objects to have different hashes; if multiple objects with the same hash exist, errors can arise pretty easily in the graph.
  • more thorough tests for test_umi.py, including proper cleanup after each test.
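To make the class-level bookkeeping in the list above a bit more concrete, here is a minimal, stdlib-only sketch of three of the ideas: a per-class registry filled in by __init_subclass__, a read-only id that feeds __hash__, and a descriptor in the spirit of umibase_property that type-checks and relinks parent edges on assignment. MiniUmiBase, umi_property, Material, Layer, and the _PARENTS dict (a stand-in for the networkx graph behind UmiBase._GRAPH) are all hypothetical names; this is an illustration under those assumptions, not archetypal's actual implementation.

```python
import itertools
from collections import defaultdict


class MiniUmiBase:
    """Hypothetical stand-in for UmiBase; not archetypal's real class."""

    _CREATED_OBJECTS_BY_CLASS = {}  # class name -> append-only list of instances
    _PARENTS = defaultdict(set)  # hypothetical graph stand-in: child id -> parent ids
    _ids = itertools.count(1)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass automatically gets its own (initially empty) registry entry
        cls._CREATED_OBJECTS_BY_CLASS[cls.__name__] = []

    def __init__(self, name):
        self.name = name
        self._id = next(MiniUmiBase._ids)
        # Append-only, so list order always equals creation order
        self._CREATED_OBJECTS.append(self)

    @property
    def id(self):
        # No setter: id is part of __hash__, so it must never change
        return self._id

    def __hash__(self):
        return hash((type(self).__name__, self._id))

    @property
    def _CREATED_OBJECTS(self):
        return self._CREATED_OBJECTS_BY_CLASS[type(self).__name__]

    @classmethod
    def all_objects_of_type(cls, kind):
        # Accept a class name, a class, or an instance
        if isinstance(kind, str):
            name = kind
        elif isinstance(kind, type):
            name = kind.__name__
        else:
            name = type(kind).__name__
        return cls._CREATED_OBJECTS_BY_CLASS[name]


class umi_property:
    """Hypothetical descriptor: type-checks the value and relinks parent edges."""

    def __init__(self, expected_type):
        self.expected_type = expected_type

    def __set_name__(self, owner, name):
        self.attr = "_" + name

    def __get__(self, obj, objtype=None):
        return self if obj is None else getattr(obj, self.attr, None)

    def __set__(self, obj, value):
        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"expected {self.expected_type.__name__}, got {type(value).__name__}"
            )
        old = getattr(obj, self.attr, None)
        if old is not None:
            MiniUmiBase._PARENTS[old.id].discard(obj.id)  # unlink the old child
        MiniUmiBase._PARENTS[value.id].add(obj.id)  # link the new child
        setattr(obj, self.attr, value)


class Material(MiniUmiBase):
    pass


class Layer(MiniUmiBase):
    material = umi_property(Material)

    def __init__(self, name, material):
        super().__init__(name)
        self.material = material


concrete, brick = Material("concrete"), Material("brick")
layer = Layer("layer-1", concrete)
layer.material = brick  # relinks: concrete loses its parent, brick gains one
```

Because the relinking happens inside the setter, callers never have to remember to update the graph themselves, which is the main appeal of the decorator approach.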

I think there is still performance to be gained in a few places, particularly in determining when to use nx.dfs_preorder_nodes(G, bt) vs. nx.has_path(G, bt, target_node) vs. parent_child_traversal.
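The dfs_preorder_nodes vs. has_path question mostly comes down to how many graph searches each approach runs. The sketch below uses a plain dict adjacency list and two stdlib-only helpers that mimic the semantics of nx.dfs_preorder_nodes(G, source) and nx.has_path(G, source, target), so it runs without networkx; the graph and node names are made up. Collecting a template's children with one preorder traversal costs a single O(V + E) search, while testing each candidate with has_path can cost up to one search per candidate.

```python
def dfs_preorder_nodes(graph, source):
    """Yield nodes reachable from source in depth-first preorder (source first)."""
    visited = set()
    stack = [source]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        yield node
        # Push children reversed so they are visited in insertion order
        stack.extend(reversed(graph.get(node, [])))


def has_path(graph, source, target):
    """True if target is reachable from source (runs one search per call)."""
    return any(node == target for node in dfs_preorder_nodes(graph, source))


# Hypothetical toy graph: parent -> children
graph = {
    "bt-1": ["zone-core", "zone-perim"],
    "zone-core": ["loads-a", "constructions-a"],
    "zone-perim": ["loads-a", "constructions-b"],
    "constructions-a": ["wall-1"],
    "constructions-b": ["wall-1"],
    "wall-1": ["concrete"],
    "bt-2": ["zone-other"],
    "zone-other": ["loads-b"],
}
all_nodes = set(graph) | {child for children in graph.values() for child in children}

# Option 1: a single traversal from the building template
children_dfs = [n for n in dfs_preorder_nodes(graph, "bt-1") if n != "bt-1"]

# Option 2: one reachability query per candidate node
children_has_path = {
    n for n in all_nodes if n != "bt-1" and has_path(graph, "bt-1", n)
}
```

Both options find the same seven children, but option 2 runs ten separate searches (one per other node) to get there, so per-candidate has_path only makes sense when the candidate set is much smaller than the subgraph.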

Todos / Further Exploration

  • Consider 100% requiring ids to be unique and/or hashing on unit_number.
  • Currently, swapping UmiBaseHelpers (e.g. replacing one MaterialLayer with another) correctly updates the graph, but replacing the UmiBase object inside a UmiBaseHelper wrapper (e.g. using the Material setter directly on a MaterialLayer) does not.
  • Investigate how BuildingTemplates should be handled in UniqueComponents
  • Make tests for Graph Mutation, and in particular with UmiBaseHelper and UmiBaseList mutation
  • ParentTemplates can now use UmiBase.all_objects_of_type("BuildingTemplates") to use the global graph to find parent templates rather than traversing the local graphs.
  • Figure out why test_schedules.py::TestSchedule::test_plot is failing
  • create custom decorators for UmiBaseHelper/Lists
  • consider custom decorator for primitive properties
  • if all user properties are custom decorated, functions like mapping, to_dict, etc. can probably be abstracted to live in UmiBase, since the classes will define their own schemas. This would also make UmiTemplateLibrary self-documenting
  • finish aliases for list mutating actions in UmiBaseList
  • decide if all UmiBase subclasses should auto add nodes to global graph at the end of init, or just those without children (those with children will naturally be added via the link action which happens when a child is set during init)
  • addition/multilib optimization
  • figure out if nx.dfs_preorder_nodes or nx.has_path is quicker for assembling the children of a building template
  • consider dropping local graph and just looking it up from the global graph (would save some overhead on relinking)
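The UmiBaseList idea, and the todo about finishing aliases for list-mutating actions, can be illustrated with a list subclass that calls a link/unlink callback whenever an element is added, replaced, or removed. HookedList, link, unlink, and the Counter-based edge store are hypothetical; the real UmiBaseList presumably relinks the global graph instead. The catch this sketch highlights is that list's built-in mutators do not route through each other (list.extend never calls append), so every mutating method needs its own alias.

```python
from collections import Counter


class HookedList(list):
    """Hypothetical UmiBaseList-style container that reports every link/unlink."""

    def __init__(self, owner, items=(), on_link=None, on_unlink=None):
        super().__init__()
        self.owner = owner
        self._on_link = on_link or (lambda owner, item: None)
        self._on_unlink = on_unlink or (lambda owner, item: None)
        self.extend(items)

    def append(self, item):
        super().append(item)
        self._on_link(self.owner, item)

    def extend(self, items):
        # list.extend does not call append, so it needs its own alias
        for item in items:
            self.append(item)

    def insert(self, index, item):
        super().insert(index, item)
        self._on_link(self.owner, item)

    def __setitem__(self, index, item):
        if isinstance(index, slice):
            raise NotImplementedError("slice assignment is not hooked in this sketch")
        old = self[index]
        super().__setitem__(index, item)
        self._on_unlink(self.owner, old)
        self._on_link(self.owner, item)

    def pop(self, index=-1):
        item = super().pop(index)
        self._on_unlink(self.owner, item)
        return item

    def remove(self, item):
        super().remove(item)
        self._on_unlink(self.owner, item)

    # Not hooked here: __delitem__, clear, __iadd__, ... would need the same treatment


# Multiset of (parent, child) edges: the same child may appear more than once
edges = Counter()


def link(owner, item):
    edges[(owner, item)] += 1


def unlink(owner, item):
    edges[(owner, item)] -= 1


layers = HookedList("wall", ["concrete", "insulation"], link, unlink)
layers[0] = "brick"
layers.append("brick")
layers.pop(0)
```

Counting edges (rather than storing a set) matters because one component can fill several slots of the same parent; removing one occurrence must not drop the link entirely.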

@szvsw (Collaborator, Author) commented Aug 16, 2022

Here are benchmark results for the following two benchmarks, run on both main and this branch:

Component Replacement Benchmarking

The idea is to point a bunch of slots at a single component, then time how long it takes to replace that component with another. Because the original algorithm has to traverse every template completely, performance degrades as the number of templates increases. The graph-based algorithm explicitly knows where each component is used, so this traversal is avoided altogether.

import time

import numpy as np
import pytest

from archetypal import UmiTemplateLibrary
from archetypal.utils import log, timeit  # assumed home of the log/timeit helpers


class TestUmiTemplate:
    # benchmark_results is a shared fixture (not shown); benchmark_results["replace_component"]
    # is assumed to already exist as a dict
    @pytest.mark.parametrize("execution_count", range(10))
    @pytest.mark.parametrize("additional_building_templates_count", [0, 10, 50])
    def test_benchmark_replace_component(
        self, execution_count, additional_building_templates_count, benchmark_results
    ):
        lib = UmiTemplateLibrary.open(
            "tests/input_data/umi_samples/BostonTemplateLibrary_2.json"
        )
        # Extend the list of building templates with an arbitrary building template, even if it's a dupe
        for _ in range(additional_building_templates_count):
            lib.BuildingTemplates.append(lib.BuildingTemplates[-1])
        # Set all first layers to the first opaque material
        for construction in lib.OpaqueConstructions:
            construction.Layers[0] = lib.OpaqueMaterials[0]

        @timeit
        def run():
            # Replace the first opaque material with the second opaque material
            lib.replace_component(lib.OpaqueMaterials[0], lib.OpaqueMaterials[1])

        start = time.time()
        run()
        end = time.time()
        # One timing per trial, grouped by the number of extra templates
        benchmark_results["replace_component"].setdefault(
            additional_building_templates_count, []
        ).append(end - start)
        for count, array in benchmark_results["replace_component"].items():
            log(
                f"Average component replacement time for {count} extra templates: "
                f"{np.average(array)} (stddev: {np.std(array)})"
            )

| Trials | Extra Building Templates in Lib (baseline: 4 BTs) | Original Replacement Time: Avg (StdDev) (s) | Graph Replacement Time: Avg (StdDev) (s) |
| --- | --- | --- | --- |
| 10 | 0 | 0.052 (0.005) | 0.014 (0.002) |
| 10 | 10 | 0.180 (0.019) | 0.025 (0.002) |
| 10 | 50 | 0.592 (0.026) | 0.062 (0.004) |

As you can see, knowing the parents of each component significantly improves performance, and the advantage grows with the number of templates.
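The gap in this table comes from how the affected objects are found. The toy model below contrasts the two strategies: the original-style replacement walks every template, while the graph-style one looks up the old component's parents in a child-to-parent index and touches only those slots. Node, naive_replace, build_parent_index, and graph_replace are hypothetical names, and a plain dict stands in for the networkx graph. It compares visit counts rather than timings, so it does not reproduce the measured numbers.

```python
from collections import defaultdict


class Node:
    """Hypothetical minimal component: a name plus named child slots."""

    def __init__(self, name, **children):
        self.name = name
        self.children = dict(children)


def iter_nodes(roots):
    """Yield every node reachable from roots exactly once."""
    seen, stack = set(), list(roots)
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        yield node
        stack.extend(node.children.values())


def naive_replace(roots, old, new):
    """Original-style replacement: walk every template completely."""
    visited = 0
    for node in iter_nodes(roots):
        visited += 1
        for slot, child in node.children.items():
            if child is old:
                node.children[slot] = new  # same key, so safe during iteration
    return visited


def build_parent_index(roots):
    """child -> {(parent, slot)}; a real graph would keep this updated incrementally."""
    parents = defaultdict(set)
    for node in iter_nodes(roots):
        for slot, child in node.children.items():
            parents[child].add((node, slot))
    return parents


def graph_replace(parents, old, new):
    """Graph-style replacement: only touch the known parents of old."""
    touched = 0
    for parent, slot in parents.pop(old, set()):
        parent.children[slot] = new
        parents[new].add((parent, slot))
        touched += 1
    return touched


def make_library(n_templates):
    old, new = Node("material-0"), Node("material-1")
    constructions = [Node(f"construction-{i}", layer0=old) for i in range(3)]
    templates = [
        Node(f"bt-{i}", wall=constructions[i % 3], zone=Node(f"zone-{i}"))
        for i in range(n_templates)
    ]
    return templates, constructions, old, new


templates, constructions, old, new = make_library(50)
naive_visits = naive_replace(templates, old, new)

templates2, constructions2, old2, new2 = make_library(50)
parents = build_parent_index(templates2)
graph_touched = graph_replace(parents, old2, new2)
```

With 50 templates, the naive walk visits 104 nodes to fix 3 slots, whereas the indexed version touches exactly those 3 parents; adding templates grows the first number but not the second.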

Unique Components Benchmarking

For unique components, the original method should also get worse as the number of templates increases. However, I believe it also gets significantly worse as the number of objects in the _CREATED_OBJECTS store increases, primarily because get_unique re-sorts the store over and over again. This test demonstrates that: the extra objects do not even exist in the library, only in the store, yet they severely slow down the get_unique calls in the original algorithm.

import time

import numpy as np

from archetypal import UmiTemplateLibrary
from archetypal.utils import log, timeit  # assumed home of the log/timeit helpers


class TestUmiTemplate:
    # Parametrized over execution_count and phantom_objects_count like the replacement
    # benchmark (the decorators are not shown in this snippet)
    def test_benchmark_unique_components(
        self, execution_count, phantom_objects_count, benchmark_results
    ):
        lib = UmiTemplateLibrary.open(
            "tests/input_data/umi_samples/BostonTemplateLibrary_2.json"
        )
        # Add a bunch of phantom objects to the store (e.g. objects in other templates);
        # the library itself never references them
        for _ in range(phantom_objects_count):
            lib.ZoneLoads[1].duplicate()
            lib.DomesticHotWaterSettings[1].duplicate()
            lib.ZoneConditionings[1].duplicate()
            lib.VentilationSettings[1].duplicate()
            lib.ZoneConstructionSets[1].duplicate()
        # Give every zone its own duplicate of each component so there is real deduplication work
        for obj in lib.ZoneDefinitions:
            obj.Loads = lib.ZoneLoads[0].duplicate()
            obj.DomesticHotWater = lib.DomesticHotWaterSettings[0].duplicate()
            obj.Conditioning = lib.ZoneConditionings[0].duplicate()
            obj.Ventilation = lib.VentilationSettings[0].duplicate()
            obj.Constructions = lib.ZoneConstructionSets[0].duplicate()

        @timeit
        def run():
            lib.unique_components()

        start = time.time()
        run()
        end = time.time()
        benchmark_results["unique_components"].setdefault(
            phantom_objects_count, []
        ).append(end - start)
        for count, array in benchmark_results["unique_components"].items():
            log(
                f"Average unique_components time for {count} extra components: "
                f"{np.average(array)} (stddev: {np.std(array)})"
            )
        # After deduplication, each component list should be reduced to a single object
        assert len(lib.ZoneLoads) == 1
        assert len(lib.DomesticHotWaterSettings) == 1
        assert len(lib.ZoneConditionings) == 1
        assert len(lib.VentilationSettings) == 1
        assert len(lib.ZoneConstructionSets) == 1

| Trials | Extra Components in _CREATED_OBJECTS (Baseline: ~200 comps) | Original Unique Components Time: Avg (StdDev) (s) | Graph Unique Components Time: Avg (StdDev) (s) |
| --- | --- | --- | --- |
| 10 | 0 | 0.576 (0.056) | 0.271 (0.017) |
| 10 | 50 | 0.631 (0.022) | 0.302 (0.037) |
| 10 | 500 | 1.500 (0.043) | 0.355 (0.028) |
| 10 | 1000 | 2.500 (0.049) | 0.407 (0.043) |
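Reading the two tables as ratios makes the trend easier to see. This small helper only recomputes speedups and per-object marginal costs from the published averages (nothing is re-measured); the marginal cost uses just the first and last rows, so it is a crude two-point slope that assumes roughly linear growth.

```python
# (extra objects, original avg s, graph avg s), copied from the two tables above
replace_component = [(0, 0.052, 0.014), (10, 0.180, 0.025), (50, 0.592, 0.062)]
unique_components = [
    (0, 0.576, 0.271),
    (50, 0.631, 0.302),
    (500, 1.500, 0.355),
    (1000, 2.500, 0.407),
]


def speedups(rows):
    """Map each extra-object count to original_time / graph_time, rounded to 0.1."""
    return {extra: round(original / graph, 1) for extra, original, graph in rows}


def marginal_cost_ms(rows):
    """Added time per extra object (ms), first row to last row, for (original, graph)."""
    (x0, orig0, graph0), (x1, orig1, graph1) = rows[0], rows[-1]
    return (
        round(1000 * (orig1 - orig0) / (x1 - x0), 2),
        round(1000 * (graph1 - graph0) / (x1 - x0), 2),
    )
```

By this estimate the graph version's speedup grows from about 3.7× to 9.5× for component replacement and from about 2.1× to 6.1× for unique_components, and each extra object in the store costs the original algorithm roughly 1.92 ms versus about 0.14 ms for the graph version.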

If the extra components were also added to the library (rather than only to the store, as in the benchmark above), each of them would also incur the full cost of a get_unique call.

Additionally, the more objects that require replacement (either because one object is used in many places or because there are many equivalent objects), the more the component replacement optimization pays off.

Discussion

A lot of the unique_components gains could probably be obtained simply by dropping sorted from get_unique, relying on the assumptions that (a) each list in _CREATED_OBJECTS has a stable order and (b) that order is already the order in which the objects were created. Still, the BuildingTemplate count will hurt the original algorithm but not the graph-based one. There are also some other small gains here and there, even with the extra overhead of mutating the graph every time a component is replaced.
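Assumptions (a) and (b) can be checked on a toy store: if the per-class list is append-only, it is already ordered by unit_number, so re-sorting it in get_unique only costs time. Component, get_unique_sorted, and get_unique_unsorted are hypothetical; this sketch assumes get_unique returns the earliest-created equal object (or the object itself if none exists), and the real method may use different equality rules.

```python
from dataclasses import dataclass, field
from itertools import count

_unit_numbers = count()


@dataclass
class Component:
    """Hypothetical component: equality compares content only, not name or unit_number."""

    conductivity: float
    name: str = field(default="", compare=False)
    unit_number: int = field(default_factory=lambda: next(_unit_numbers), compare=False)


def get_unique_sorted(obj, created):
    # Original-style: re-sort the whole store on every call, O(n log n)
    for other in sorted(created, key=lambda o: o.unit_number):
        if other == obj:
            return other
    return obj


def get_unique_unsorted(obj, created):
    # Relies on created being append-only (hence already in unit_number order);
    # O(n) with an early exit at the first equal object
    return next((other for other in created if other == obj), obj)


created = [Component(0.5, "a"), Component(1.0, "b"), Component(0.5, "c")]
query = Component(0.5, "d")
```

Both versions return the "a" component for the query, but the unsorted one skips a full sort on every call, which is exactly the cost that grows with the number of phantom objects in the store.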

Another simple optimization is to require that ids are unique (the graph-based branch implements this) and to bail out of component replacement, returning self, if get_unique returned an object with obj.id == self.id (maybe better to do this with unit_number?).
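The bail-out itself is cheap to express. In this sketch, unique_components, Comp, and the dict-based get_unique are hypothetical stand-ins; the point is only that when ids are guaranteed unique, an id comparison is enough to skip the replacement (and any graph relinking) for objects that are already canonical. A unit_number comparison would work the same way, provided unit_number is also unique per object.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Comp:
    """Hypothetical component with a unique id and a content value."""

    id: int
    value: float


def unique_components(objects, get_unique, replace):
    """Replace each object by its canonical equal; skip objects already canonical."""
    replaced = 0
    for obj in objects:
        unique = get_unique(obj)
        if unique.id == obj.id:
            # Bail out: the object is its own canonical version, nothing to relink.
            # Only safe if ids are guaranteed unique.
            continue
        replace(obj, unique)
        replaced += 1
    return replaced


objects = [Comp(1, 0.5), Comp(2, 1.0), Comp(3, 0.5), Comp(4, 0.5)]
canonical = {}
for o in objects:
    canonical.setdefault(o.value, o)  # first-created object wins

replacements = []
n_replaced = unique_components(
    objects,
    get_unique=lambda o: canonical[o.value],
    replace=lambda old, new: replacements.append((old.id, new.id)),
)
```

Only the two true duplicates (ids 3 and 4) trigger a replacement; ids 1 and 2 exit early without touching the graph.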

Reproduction

If you pull this branch and run pytest tests/test_umi.py::TestUmiTemplate, you should see the benchmark results in the live log output. If you want to see the results for the current algorithm, check out this branch.

@samuelduchesne samuelduchesne force-pushed the feature/realtime-graphs-v3 branch from 8d28bba to 3f89938 Compare October 19, 2022 18:54