Feature/realtime graphs v3 #379

szvsw · 2022-08-14T21:27:42Z

No description provided.

szvsw · 2022-08-16T03:49:58Z

Okay, @samuelduchesne it has been a long road, BUT, I think this is ready!

The only failing test is, once again, tests/test_schedules.py::TestSchedule::test_plot. I'm not sure why, since the test succeeds locally, both when run in isolation and when running the whole TestSchedule module.

Here's a summary of some of the major changes:

UmiBase._GRAPH: This tracks a complete graph of all UmiObjects ever created and their relationships to each other. It is also used to provide performance gains in various other functions.
UmiBaseHelper and UmiBaseList: these objects provide hooks that let non-UmiBase objects update the realtime graphs when setting values.
umibase_property is a new decorator which can be used on fields that store an UmiBase object to automatically set up type checking and graph relinking.
_CREATED_OBJECTS moves back to UmiBase and is renamed to _CREATED_OBJECTS_BY_CLASS, and stores a dict with one entry for each class;
any class which inherits UmiBase is automatically given an entry in UmiBase._CREATED_OBJECTS_BY_CLASS via the UmiBase.__init_subclass__ magic method;
_CREATED_OBJECTS becomes an UmiBase instance method which looks up the list of objects corresponding to the calling object's class;
UmiBase.all_objects_of_type is a similar class method which allows you to provide a string, object, or class and fetch all objects matching.
UmiBase.id is made immutable - the motivation here is because it is used as part of __hash__ for almost everything, and changing it messes with the Graph structure.
UmiBase.get_unique no longer sorts self._CREATED_OBJECTS by unit_number, since by definition self._CREATED_OBJECTS should already return a list sorted in that manner: objects are added to that list in the order they are created, and the list order should never change (it only grows). Correct me if you think I'm wrong here, but this should be a major area to pick up performance.
each individual object can be replaced with self.replace_me_with(other) with significant performance improvements, since it no longer requires traversing the entire Graph.
unique_components performance improvements are much more pronounced on large libraries with lots of duplicated components
UmiTemplateLibrary now tracks a list of _CREATED_LIBRARIES to try to use more performant versions of functions like to_graph when only a single library exists.
UmiTemplateLibrary and UmiBase now have _clear_class_memory classmethods which are used to wipe class memory.
UmiTemplateLibrary.open now will automatically change incoming ids if they already exist in _CREATED_OBJECTS, unless you explicitly provide the preserve_ids=True argument. This is so that different objects are forced to have different hashes.
If multiple objects with the same hash exist, errors can arise pretty easily in the graph.
more thorough tests for test_umi.py, including proper cleanup after each test.
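
To make the registry bullets above concrete, here is a minimal sketch of how __init_subclass__ can give every subclass its own entry in a per-class store. Base, Material, and Schedule are hypothetical stand-ins, and exposing _CREATED_OBJECTS as a property is an assumption made for illustration; this is not the code in this branch:

```python
from itertools import count


class Base:
    """Hypothetical stand-in for UmiBase (illustration only)."""

    # One list of instances per subclass, keyed by the class itself.
    _CREATED_OBJECTS_BY_CLASS = {}
    _unit_numbers = count(1)

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass is registered as soon as it is defined.
        Base._CREATED_OBJECTS_BY_CLASS[cls] = []

    def __init__(self, name):
        self.name = name
        self.unit_number = next(Base._unit_numbers)
        # Appending in creation order keeps each list sorted by unit_number.
        Base._CREATED_OBJECTS_BY_CLASS.setdefault(type(self), []).append(self)

    @property
    def _CREATED_OBJECTS(self):
        # Look up the store that belongs to the calling object's class.
        return Base._CREATED_OBJECTS_BY_CLASS[type(self)]

    @classmethod
    def all_objects_of_type(cls, kind):
        """Accept a class name, a class, or an instance."""
        if isinstance(kind, str):
            for klass, objects in cls._CREATED_OBJECTS_BY_CLASS.items():
                if klass.__name__ == kind:
                    return objects
            raise KeyError(kind)
        key = kind if isinstance(kind, type) else type(kind)
        return cls._CREATED_OBJECTS_BY_CLASS[key]


class Material(Base):
    pass


class Schedule(Base):
    pass
```

Because each list only ever grows by appending, its order is the creation order, which is the property that get_unique relies on.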

I think there is still performance to be gained in a few places, particularly with determining optimal usage of nx.dfs_preorder_nodes(G, bt) vs nx.has_path(G, bt, target, node) vs parent_child_traversal

Todos / Further Exploration

Consider 100% requiring ids to be unique and/or hashing on unit_number.
Currently, swapping UmiBaseHelpers (e.g. replacing one MaterialLayer with another) correctly updates the graph, but replacing the UmiBase object inside of an UmiBaseHelper wrapper (e.g. using the Material setter directly on a MaterialLayer) does not.
Investigate how BuildingTemplates should be handled in UniqueComponents
Make tests for Graph Mutation, and in particular with UmiBaseHelper and UmiBaseList mutation
ParentTemplates can now use UmiBase.all_objects_of_type("BuildingTemplates") to use the global graph to find parent templates rather than traversing the local graphs.
Figure out why test_schedules.py::TestSchedule::test_plot is failing
create custom decorators for UmiBaseHelper/Lists
consider custom decorator for primitive properties
if all User properties are custom decorated, functions like mapping and to_dict etc can probably be abstracted to live in UmiBase, since the classes will define their own Schemas. This would also allow UmiTemplateLibrary to be self documenting
finish aliases for list mutating actions in UmiBaseList
decide if all UmiBase subclasses should auto add nodes to global graph at the end of init, or just those without children (those with children will naturally be added via the link action which happens when a child is set during init)
addition/multilib optimization
figure out if nx.dfs_preorder_nodes or nx.has_path is quicker for assembling the children of a building template
consider dropping local graph and just looking it up from the global graph (would save some overhead on relinking)

szvsw · 2022-08-16T07:31:05Z

Here are some benchmark results for the following benchmark, run in main and in this branch:

Component Replacement Benchmarking

The idea is to just set a bunch of stuff to a single component, and then test to see how long it takes to replace that component with another. Because the original algorithm requires traversing every template completely, as the number of templates increases, performance decreases. The graph based algorithm explicitly knows where each component is used, so this step is avoided altogether.

archetypal/tests/test_umi.py

Lines 261 to 285 in e943130

    
    @pytest.mark.parametrize("execution_count", range(10))
    @pytest.mark.parametrize("additional_building_templates_count", [0, 10, 50])
    def test_benchmark_replace_component(self, execution_count, additional_building_templates_count, benchmark_results):
        lib = UmiTemplateLibrary.open("tests/input_data/umi_samples/BostonTemplateLibrary_2.json")
        # extend the list of building templates with an arbitrary building template, even if its a dupe
        for i in range(additional_building_templates_count):
            lib.BuildingTemplates.append(lib.BuildingTemplates[-1])
        # Set all first layers to the first opaque material
        for construction in lib.OpaqueConstructions:
            construction.Layers[0] = lib.OpaqueMaterials[0]
        @timeit
        def run():
            # Replace the first opaque material with the second opaque material
            lib.replace_component(lib.OpaqueMaterials[0], lib.OpaqueMaterials[1])
        start = time.time()
        run()
        end = time.time()
        try:
            benchmark_results["replace_component"][additional_building_templates_count].append(end-start)
        except KeyError:
            benchmark_results["replace_component"][additional_building_templates_count] = []
            benchmark_results["replace_component"][additional_building_templates_count].append(end-start)
        for count, array in benchmark_results["replace_component"].items():
            log(f"Average component replacement time for {count} extra templates: { np.average(array) } (stddev: {np.std(array)})")

Trials	Extra Building Templates in Lib (baseline: 4 BTs)	Original Replacement Time: Avg (StdDev) (s)	Graph Replacement Time: Avg (StdDev) (s)
10	0	0.052 (0.005)	0.014 (0.002)
10	10	0.180 (0.019)	0.025 (0.002)
10	50	0.592 (0.026)	0.062 (0.004)

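As you can see, knowing the parents of each component significantly improves performance: the graph-based replacement is about 3.7x faster with no extra templates (0.052 s vs 0.014 s) and about 9.5x faster with 50 extra templates (0.592 s vs 0.062 s).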
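
The difference between the two approaches can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation in either branch: Node, ParentIndex, and replace_by_traversal are hypothetical names, and a plain dict of parent sets stands in for the graph:

```python
from collections import defaultdict


class Node:
    """Hypothetical component with named child slots."""

    def __init__(self, name):
        self.name = name
        self.children = {}


def replace_by_traversal(roots, old, new):
    """Traversal approach: walk everything reachable from the templates."""
    visited = 0
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        visited += 1
        for slot, child in list(node.children.items()):
            if child is old:
                node.children[slot] = child = new
            stack.append(child)
    return visited  # cost grows with the number of templates


class ParentIndex:
    """Reverse index: child -> set of (parent, slot), updated on every link."""

    def __init__(self):
        self._parents = defaultdict(set)

    def link(self, parent, slot, child):
        previous = parent.children.get(slot)
        if previous is not None:
            # Drop the stale reverse edge before relinking.
            self._parents[previous].discard((parent, slot))
        parent.children[slot] = child
        self._parents[child].add((parent, slot))

    def replace(self, old, new):
        """Touch only the places where `old` is actually used."""
        uses = list(self._parents.pop(old, ()))
        for parent, slot in uses:
            parent.children[slot] = new
            self._parents[new].add((parent, slot))
        return len(uses)  # cost grows with the number of uses only
```

The trade-off is the bookkeeping in link: every setter has to keep the reverse index current, which is the overhead the graph mutation hooks pay for.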

Unique Components Benchmarking

For unique components, the original method should also get worse as the number of templates increases. However, it will also get significantly worse as the number of objects in the _CREATED_OBJECTS store increases, primarily (I believe) because the sorting in get_unique is performed over and over again. This test demonstrates that: the phantom objects do not even exist in the library, just in the store, but they severely affect the get_unique call in the original algorithm.

archetypal/tests/test_umi.py

Lines 291 to 323 in e943130

    
    def test_benchmark_unique_components(self, execution_count, phantom_objects_count, benchmark_results):
        lib = UmiTemplateLibrary.open("tests/input_data/umi_samples/BostonTemplateLibrary_2.json")
        # Add a bunch of phantom stuff to the library, e.g. objects in other templates
        for i in range(phantom_objects_count):
            lib.ZoneLoads[1].duplicate()
            lib.DomesticHotWaterSettings[1].duplicate()
            lib.ZoneConditionings[1].duplicate()
            lib.VentilationSettings[1].duplicate()
            lib.ZoneConstructionSets[1].duplicate()
        for obj in lib.ZoneDefinitions:
            obj.Loads = lib.ZoneLoads[0].duplicate()
            obj.DomesticHotWater = lib.DomesticHotWaterSettings[0].duplicate()
            obj.Conditioning = lib.ZoneConditionings[0].duplicate()
            obj.Ventilation = lib.VentilationSettings[0].duplicate()
            obj.Constructions = lib.ZoneConstructionSets[0].duplicate()
        @timeit
        def run():
            lib.unique_components()
        start = time.time()
        run()
        end = time.time()
        try:
            benchmark_results["unique_components"][phantom_objects_count].append(end-start)
        except KeyError:
            benchmark_results["unique_components"][phantom_objects_count] = []
            benchmark_results["unique_components"][phantom_objects_count].append(end-start)
        for count, array in benchmark_results["unique_components"].items():
            log(f"Average component replacement time for {count} extra components: { np.average(array) } (stddev: {np.std(array)})")
        assert len(lib.ZoneLoads) == 1
        assert len(lib.DomesticHotWaterSettings) == 1
        assert len(lib.ZoneConditionings) == 1
        assert len(lib.VentilationSettings) == 1
        assert len(lib.ZoneConstructionSets) == 1

Trials	Extra Components in _CREATED_OBJECTS (Baseline: ~200 comps)	Original Unique Components Time: Avg (StdDev) (s)	Graph Unique Components Time: Avg (StdDev) (s)
10	0	0.576 (0.056)	0.271 (0.017)
10	50	0.631 (0.022)	0.302 (0.037)
10	500	1.500 (0.043)	0.355 (0.028)
10	1000	2.500 (0.049)	0.407 (0.043)

If the extra components are also added to the library (as opposed to just being added to the store, as shown above), each component would incur the full penalty of calling get_unique.

Additionally, the more objects there are that require replacement, either because one object is used in many places or because there are many equivalent objects, the more the run benefits from the component replacement optimization, since each individual replacement is cheaper.

Discussion

A lot of the gains in unique components can probably be acquired by simply dropping sorted from get_unique, leveraging the assumptions that (a) each list of _CREATED_OBJECTS has a stable order and (b) it is already sorted by the order the objects were created in. Still, the BuildingTemplate count will negatively impact the original algorithm but not the graph-based one. There are some other small gains here and there as well, even with the extra overhead of mutating the graph every time you replace a component.

Another simple optimization is to require that ids are unique (the graph based branch implements this) and to bail out of component replacement and return self if get_unique returned an object with obj.id == self.id (maybe better to do this with unit_number?).

Reproduction

If you pull this branch and run pytest tests/test_umi.py::TestUmiTemplate you should see the benchmarking results in the live log calls. If you want to see the results for the current algorithm, check out this branch

szvsw force-pushed the feature/realtime-graphs-v3 branch 2 times, most recently from eb00c57 to 4cbf8ba Compare August 16, 2022 01:48

szvsw added 26 commits October 19, 2022 14:53

add setters for simple umibase fields

7081d54

enable linking

b519530

add parent template lookup based off of graph

50f0aed

implement umibaselist and unique_comps

233ca9f

fix UmiBaseList equality

6a0594b

fix unlinking for template_addition

7f8d51e

add parent templates test

42e456d

allow multiple umibaselists per umibase parent

776ab1d

initialize umibase properties after basic properties

bed6237

update graph method and fix iadd test fail

bd7011b

change ventilation initialization order

f9424b1

use predecessor to determine parents

fe53cea

fail silently if edge does not exist

15ae6df

prevent validation error when creating ventilation

f10bb7e

dont use pred in Parents

e4a030c

create replace component test

795551a

use fixture for template lib in test_umi

804c7e5

create global graph

956bde4

add cleanup to fixtures fix unique_addition create test_change_ids move created_objects to umibase and improve performance improve id handling for combine to accomodate non-mutable ids fix validate for zoneconstructionset fix fixture names fix fixture names

create cleanup fixture

871a08c

rewrite bt window loader for clarity

6d1a357

make graph testing more rigorous

dc90457

enable logging

3ce74a3

improve unique_components performance

d1ff2d7

add cleanup pre testing

dd5d80d

add timeit to replace

91508c9

remove timeits

3fa85e1

szvsw added 8 commits October 19, 2022 14:53

disable timeit

500d9e9

skip replacing self

80db08a

add benchmark tests which skip by default

1cbb68a

rewrite benchmarks

7f7b20c

add extra items to benchmark for unique_comp

e4c17cc

parametrize phantom object count

b4a5152

add benchmark logging

ed090ae

skip benchmarks on ci

3f89938

samuelduchesne force-pushed the feature/realtime-graphs-v3 branch from 8d28bba to 3f89938 Compare October 19, 2022 18:54
