From 7a759905d5a4703cdd27fee14c766e21b06fc85c Mon Sep 17 00:00:00 2001
From: Shankari
Date: Fri, 23 Jul 2021 18:21:33 -0700
Subject: [PATCH] Add a hack to allow us to reset the URL that the code connects to

The database code is currently optimized for a 1:1 mapping between the
server and the database, where the server makes multiple intensive
queries to the database, for example, to read trajectory information
for multiple users at a time during post-processing. This has led to
design decisions in which we cache the database connection and re-use
it, to avoid the overhead of reconnecting for each of these queries.

We are now getting some requests for federated data. A concrete example
is
https://github.com/e-mission/e-mission-eval-private-data/pull/28/commits/952c476dec745eb66477fed8d699f1bab85d84c0
where we wanted to compare the characteristics of multiple datasets
(https://github.com/e-mission/e-mission-eval-private-data/pull/28).

An upcoming example would be to "roll up" multiple dashboard deployments
(e.g. from individual cities) into a program-level dashboard.

We anticipate that these will be low-volume, intermittent accesses to
generate analyses and metrics.

The long-term fix is probably to create a FederatedTimeseries (similar
to the AggregateTimeseries that merges data across users). But for now,
we just implement a hack to reset the connection and reconnect it to a
different URL.

This means that we cannot access all the databases in parallel; we need
to access them serially. For the current use case, that is sufficient,
since we can concatenate all the data and work with it later.
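A minimal sketch of the serial access pattern described in this commit message, assuming a `reset(url)` callable that repoints the single cached connection (for instance, a thin wrapper around the `_reset_url` static method added in this patch) and a `query()` callable that reads from whatever database is currently connected. `collect_serially`, the stub functions, and the example URLs are hypothetical and are not part of the e-mission codebase.

```python
from typing import Any, Callable, Dict, Iterable, List, Optional


def collect_serially(urls: Iterable[str],
                     reset: Callable[[str], None],
                     query: Callable[[], List[Any]]) -> List[Any]:
    """Visit each database in turn and concatenate what each one returns.

    Hypothetical helper: `reset` is assumed to repoint the shared, cached
    connection at `url`, and `query` is assumed to read through that
    connection. Because there is only one cached connection, the databases
    are read one after another, never in parallel.
    """
    combined: List[Any] = []
    for url in urls:
        reset(url)                # switch the single cached connection
        combined.extend(query())  # finish reading before moving on
    return combined


if __name__ == "__main__":
    # Stand-ins for per-city deployments; the URLs and data are made up.
    fake_dbs: Dict[str, List[int]] = {
        "mongodb://city-a.example/": [1, 2],
        "mongodb://city-b.example/": [3],
    }
    current: Dict[str, Optional[str]] = {"url": None}

    def fake_reset(url: str) -> None:
        current["url"] = url

    def fake_query() -> List[int]:
        return list(fake_dbs[current["url"]])

    print(collect_serially(fake_dbs, fake_reset, fake_query))  # [1, 2, 3]
```

The ordering is the important constraint here: each `query()` has to finish before the next `reset()`, because the reset replaces the one connection that the previous query was reading through.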
---
 .../storage/timeseries/abstract_timeseries.py | 34 +++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/emission/storage/timeseries/abstract_timeseries.py b/emission/storage/timeseries/abstract_timeseries.py
index 115efb7e3..e2902b4bf 100644
--- a/emission/storage/timeseries/abstract_timeseries.py
+++ b/emission/storage/timeseries/abstract_timeseries.py
@@ -37,6 +37,40 @@ def get_uuid_list():
         import emission.storage.timeseries.builtin_timeseries as bits
         return bits.BuiltinTimeSeries.get_uuid_list()
 
+    # This is a HACK and is very poor practice.
+    # It imports two other modules and modifies them directly
+    # It encodes details of their internal structure
+    # This is BAD
+    # DO NOT use this as an example for your own code
+    # However, with the current code structure, I don't have much of a choice
+    # both the modules include module variables for greater efficiency, and you
+    # cannot modify a module variable from a function within the module - it will
+    # treat it as a local variable
+    # I remember seeing some examples of how to fix this before, but I can't
+    # find it now. So we import the modules and change the variables here
+    @staticmethod
+    def _reset_url(new_url):
+        """
+        Used for federation, to allow us to connect to multiple databases from a
+        single client instance.
+        """
+
+        from pymongo import MongoClient
+        import emission.core.get_database as edb
+
+        edb.url = new_url
+        print("Connecting to new URL "+edb.url+" resetting _current_db link")
+        edb._current_db = MongoClient(edb.url).Stage_database
+        print("After changing URL, connection is %s" % edb._current_db)
+
+        import emission.storage.timeseries.builtin_timeseries as bits
+        bits.ts_enum_map = {
+            EntryType.DATA_TYPE: edb.get_timeseries_db(),
+            EntryType.ANALYSIS_TYPE: edb.get_analysis_timeseries_db()
+        }
+        print("After resetting the timeseries connections, map is %s" % bits.ts_enum_map)
+
+
     def find_entries(self, key_list=None, time_query=None, geo_query=None,
                      extra_query_list=None):
         """