Add a hack to allow us to reset the URL that the code connects to
The database code is currently optimized for a 1:1 mapping between the
server and the database, where the server needs to make multiple intensive
queries to the database, for example, to read trajectory information for
multiple users at a time during post-processing.

This has led to design decisions in which we cache the database connection and
re-use it, to avoid overloading the server with multiple queries.

We are now getting some requests for federated data.

A concrete example is
e-mission/e-mission-eval-private-data@952c476
where we wanted to compare the characteristics of multiple datasets
(e-mission/e-mission-eval-private-data#28)

An upcoming example would be to "roll up" multiple dashboard deployments (e.g.
from individual cities) into a program-level dashboard. We anticipate that
these will be low-volume, intermittent accesses to generate analyses and
metrics.

The long-term fix is probably to create a FederatedTimeseries (similar to the
AggregateTimeseries that merges data across users). But for now, we just
implement a hack to reset the connection and reconnect it to a different URL.
This means that we cannot access all the databases in parallel; we need to
access them serially. For the current use case, that is sufficient, since we
can concatenate all the data and work with it later.
shankari committed Jul 24, 2021
1 parent 2dadf8d commit 7a75990
Showing 1 changed file with 34 additions and 0 deletions.
34 changes: 34 additions & 0 deletions emission/storage/timeseries/abstract_timeseries.py
@@ -37,6 +37,40 @@ def get_uuid_list():
        import emission.storage.timeseries.builtin_timeseries as bits
        return bits.BuiltinTimeSeries.get_uuid_list()

    # This is a HACK and is very poor practice.
    # It imports two other modules and modifies them directly
    # It encodes details of their internal structure
    # This is BAD
    # DO NOT use this as an example for your own code
    # However, with the current code structure, I don't have much of a choice
    # both the modules include module variables for greater efficiency, and you
    # cannot modify a module variable from a function within the module - it will
    # treat it as a local variable
    # I remember seeing some examples of how to fix this before, but I can't
    # find it now. So we import the modules and change the variables here
    @staticmethod
    def _reset_url(new_url):
        """
        Used for federation, to allow us to connect to multiple databases from a
        single client instance.
        """

        from pymongo import MongoClient
        import emission.core.get_database as edb

        edb.url = new_url
        print("Connecting to new URL "+edb.url+" resetting _current_db link")
        edb._current_db = MongoClient(edb.url).Stage_database
        print("After changing URL, connection is %s" % edb._current_db)

        import emission.storage.timeseries.builtin_timeseries as bits
        bits.ts_enum_map = {
            EntryType.DATA_TYPE: edb.get_timeseries_db(),
            EntryType.ANALYSIS_TYPE: edb.get_analysis_timeseries_db()
        }
        print("After resetting the timeseries connections, map is %s" % bits.ts_enum_map)


    def find_entries(self, key_list=None, time_query=None, geo_query=None,
                     extra_query_list=None):
        """