Skip to content
chris48s edited this page Nov 24, 2017 · 2 revisions

UPRNs as a first-class Citizen in WhereDIV: 3 - Performance

2 possible architectures:

  1. Store council_id and polling_station_id as additional columns on the AddressBase model and pre-populate the council column as part of the bootstrapping process.
  2. Create a join/link table uprn, council_id, polling_station_id.

I never got as far as trying option 2 - it adds a big join to every front-end query, but might make the build/import process simpler? Option 1 (denormalised approach) is probably more efficient for end-users. Denormalising it is a bit gnarly though:

If you naively run

cursor.execute(
    "UPDATE %s AS t1 SET council=t2.lad FROM %s AS t2 WHERE t1.uprn=t2.uprn;" %\
    (address_table_name, onsud_table_name)
)

it takes 4.5 hours. If you disable the indexes, run that and re-enable them then you can reduce it to 2 but this is problematic. Django doesn't create indexes with consistent names so you can't drop and re-create them in raw SQL. You can do it like this (proof of concept):

from django.db import connections, migrations, models, transaction
from django.db.migrations.state import ProjectState
from django.apps import apps
from django.core.management.base import BaseCommand


class DisableKeys(migrations.Migration):

    dependencies = []

    operations = [
        migrations.AlterField(
            model_name='address',
            name='postcode',
            field=models.CharField(max_length=15, blank=True),
        ),
        migrations.AlterField(
            model_name='address',
            name='council',
            field=models.CharField(null=True, blank=True, max_length=100),
        ),
    ]


class EnableKeys(migrations.Migration):

    dependencies = []

    operations = [
        migrations.AlterField(
            model_name='address',
            name='postcode',
            field=models.CharField(max_length=15, db_index=True, blank=True),
        ),
        migrations.AlterField(
            model_name='address',
            name='council',
            field=models.ForeignKey(blank=True, null=True, to='councils.Council', related_name='+'),
        ),
    ]


class Command(BaseCommand):

    def handle(self, *args, **kwargs):

        disable = DisableKeys('foo', 'pollingstations')
        connection = connections['default']
        state = ProjectState.from_apps(apps)
        newstate = disable.apply(state, connection.schema_editor())
        print("disabled")

        print("enabling..")
        enable = EnableKeys('foo', 'pollingstations')
        enable.apply(newstate, connection.schema_editor())

you could probably wrap it in a transaction or something to clean it up, but it is always going to be brittle.

N.B: After doing that monstrous JOIN you end up with some UPRNs with no council attached. You need to do those with a ST_COVERS(area, location) lookup.. then you end up with a handful of edge cases left over:

  • CROMER PAVILION THEATRE, THE PIER, PROMENADE, CROMER
  • C & A THOMASON LTD, CULLIVOE PIER, CULLIVOE, YELL, SHETLAND
  • CLARKS BOATWORKS LTD, CROWN WHARF 1-5, CASTLETOWN, PORTLAND
  • 3 ALBERT QUAY, FOWEY
  • WORTHING PIER EVENTS LTD, WORTHING PIER, MARINE PARADE, WORTHING
  • THE BOAT THAT ROCKS, 7 HAMM BEACH ROAD, PORTLAND
  • THE PHAROS TRUST, LIGHT VESSEL 18, THE QUAY, HARWICH
  • JOLLY ROGER, THE PIER, CLACTON-ON-SEA
  • CROMER LIFEBOAT STATION, PROMENADE, CROMER
  • THE CROFT HOUSE, GARTHS OF HAM, BRESSAY, SHETLAND
  • THE CLACTON PIER CO LTD, THE PIER, CLACTON-ON-SEA
  • DEAN & REDDYHOFF LTD, 6 HAMM BEACH ROAD, PORTLAND
  • HALFPENNY PIER VISITORS CENTRE, THE QUAY, HARWICH
  • PIER POINT, ARDMOR ROAD, AROS, ISLE OF MULL
  • RANGE SAFETY CRAFT, 6 HAMM BEACH ROAD, PORTLAND
  • THE PIER HEAD, CRAIGNURE, ISLE OF MULL
  • 1 MULBERRY QUAY, MARKET STRAND, FALMOUTH
  • STENA LINE, FISHGUARD HARBOUR, GOODWICK
SELECT *
FROM pollingstations_address a
JOIN councils_council c
ON ST_Covers(c.area, a.location)
WHERE a.council_id IS NULL;
SELECT * FROM pollingstations_address
WHERE council_id IS NULL AND uprn NOT IN
    (SELECT uprn
    FROM pollingstations_address a
    JOIN councils_council c
    ON ST_Covers(c.area, a.location)
    WHERE a.council_id IS NULL);