-
Notifications
You must be signed in to change notification settings - Fork 30
Performance
2 possible architectures:
- Store
council_id
andpolling_station_id
as additional columns on the AddressBase model and pre-populate the council column as part of the bootstrapping process. - Create a join/link table
uprn
,council_id
,polling_station_id
.
I never got as far as trying option 2 - it adds a big join to every front-end query, but might make the build/import process simpler? Option 1 (denormalised approach) is probably more efficient for end-users. Denormalising it is a bit gnarly though:
If you naively run
cursor.execute(
"UPDATE %s AS t1 SET council=t2.lad FROM %s AS t2 WHERE t1.uprn=t2.uprn;" %\
(address_table_name, onsud_table_name)
)
it takes 4.5 hours. If you disable the indexes, run that and re-enable them then you can reduce it to 2 but this is problematic. Django doesn't create indexes with consistent names so you can't drop and re-create them in raw SQL. You can do it like this (proof of concept):
from django.db import connections, migrations, models, transaction
from django.db.migrations.state import ProjectState
from django.apps import apps
from django.core.management.base import BaseCommand
class DisableKeys(migrations.Migration):
dependencies = []
operations = [
migrations.AlterField(
model_name='address',
name='postcode',
field=models.CharField(max_length=15, blank=True),
),
migrations.AlterField(
model_name='address',
name='council',
field=models.CharField(null=True, blank=True, max_length=100),
),
]
class EnableKeys(migrations.Migration):
dependencies = []
operations = [
migrations.AlterField(
model_name='address',
name='postcode',
field=models.CharField(max_length=15, db_index=True, blank=True),
),
migrations.AlterField(
model_name='address',
name='council',
field=models.ForeignKey(blank=True, null=True, to='councils.Council', related_name='+'),
),
]
class Command(BaseCommand):
def handle(self, *args, **kwargs):
disable = DisableKeys('foo', 'pollingstations')
connection = connections['default']
state = ProjectState.from_apps(apps)
newstate = disable.apply(state, connection.schema_editor())
print("disabled")
print("enabling..")
enable = EnableKeys('foo', 'pollingstations')
enable.apply(newstate, connection.schema_editor())
you could probably wrap it in a transaction or something to clean it up, but it is always going to be brittle.
N.B: After doing that monstrous JOIN
you end up with some UPRNs with no council attached. You need to do those with a ST_COVERS(area, location)
lookup.. then you end up with a handful of edge cases left over:
- CROMER PAVILION THEATRE, THE PIER, PROMENADE, CROMER
- C & A THOMASON LTD, CULLIVOE PIER, CULLIVOE, YELL, SHETLAND
- CLARKS BOATWORKS LTD, CROWN WHARF 1-5, CASTLETOWN, PORTLAND
- 3 ALBERT QUAY, FOWEY
- WORTHING PIER EVENTS LTD, WORTHING PIER, MARINE PARADE, WORTHING
- THE BOAT THAT ROCKS, 7 HAMM BEACH ROAD, PORTLAND
- THE PHAROS TRUST, LIGHT VESSEL 18, THE QUAY, HARWICH
- JOLLY ROGER, THE PIER, CLACTON-ON-SEA
- CROMER LIFEBOAT STATION, PROMENADE, CROMER
- THE CROFT HOUSE, GARTHS OF HAM, BRESSAY, SHETLAND
- THE CLACTON PIER CO LTD, THE PIER, CLACTON-ON-SEA
- DEAN & REDDYHOFF LTD, 6 HAMM BEACH ROAD, PORTLAND
- HALFPENNY PIER VISITORS CENTRE, THE QUAY, HARWICH
- PIER POINT, ARDMOR ROAD, AROS, ISLE OF MULL
- RANGE SAFETY CRAFT, 6 HAMM BEACH ROAD, PORTLAND
- THE PIER HEAD, CRAIGNURE, ISLE OF MULL
- 1 MULBERRY QUAY, MARKET STRAND, FALMOUTH
- STENA LINE, FISHGUARD HARBOUR, GOODWICK
SELECT *
FROM pollingstations_address a
JOIN councils_council c
ON ST_Covers(c.area, a.location)
WHERE a.council_id IS NULL;
SELECT * FROM pollingstations_address
WHERE council_id IS NULL AND uprn NOT IN
(SELECT uprn
FROM pollingstations_address a
JOIN councils_council c
ON ST_Covers(c.area, a.location)
WHERE a.council_id IS NULL);