Add docs around performing the sync
tomtaylor committed Aug 30, 2021
1 parent 3627452 commit d170108
Showing 2 changed files with 66 additions and 1 deletion.
37 changes: 36 additions & 1 deletion README.md
@@ -13,7 +13,6 @@ It will ignore the coordinates from Northern Irish postcodes (starting with BT)
## Requirements

- Golang 1.17
- 6GB RAM (including running the UK admin data in a PIP server)

## Example

@@ -23,6 +22,42 @@ Build the binary with a simple `make`. Then:
```
./bin/wof-sync-os-postcodes -wof-postalcodes-path ../whosonfirst-data-postalcode-gb/data -ons-csv-path ../ons/ONSPD_MAY_2019_UK/Data/ONSPD_MAY_2019_UK.csv -ons-date 2019-05-01
```

## Performing the sync

The `whosonfirst-data-postalcode-gb` repo has a large number of small files, and performing the actual sync and subsequent git operations against the repo is fairly painful.

I suggest using a 64GB machine with a RAM disk. The RAM disk provides tolerable IO performance and brings the time for a fresh sync down to a few hours. In my experience 32GB isn't enough: you will hit out-of-memory crashes and lose all your progress, because the RAM disk does not persist.
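
Since an out-of-memory crash loses everything, it may be worth checking the machine's memory before starting. The following is a minimal sketch, not part of `setup.sh`: the function names are hypothetical, and the default threshold of 60 GiB is an assumption, chosen because a nominal 64GB machine usually reports slightly less than 64 GiB in `/proc/meminfo`.

```shell
#!/bin/bash
# Hypothetical pre-check, not part of setup.sh.

# Print total memory in whole GiB, read from a meminfo-style file
# (defaults to /proc/meminfo on Linux, which reports MemTotal in kB).
mem_total_gib() {
  local meminfo="${1:-/proc/meminfo}"
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' "$meminfo"
}

# Succeed if the machine has at least MIN_GIB of RAM.
# The default of 60 is an assumption, not a measured requirement.
has_enough_ram() {
  local min_gib="${1:-60}"
  local meminfo="${2:-/proc/meminfo}"
  [ "$(mem_total_gib "$meminfo")" -ge "$min_gib" ]
}

# Usage:
# has_enough_ram || echo "Less than 60 GiB of RAM; the sync may run out of memory" >&2
```

Once the RAM disk is mounted, `df -h /mnt/wof` shows how much of it is in use.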

`setup.sh` contains a script which performs much of the set up for you. It expects to be run in an empty, ephemeral VM, so if you're running on a machine you care about, please read the script carefully before executing.

After executing it, copy the ONS Postcode Directory (ONSPD) CSV into `/mnt/wof`. To bring up the PIP server, open tmux or your favourite screen-like app, and in one window execute:

```shell
cd /mnt/wof
./wof-pip-server whosonfirst-data-admin-gb/data/
```

And in another window perform the sync with something like:

```shell
cd /mnt/wof
./wof-sync-os-postcodes -wof-postalcodes-path whosonfirst-data-postalcode-gb/data/ -ons-csv-path ONSPD_AUG_2021_UK.csv -ons-date 2021-08-01
```
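
Because the run takes hours, a quick check of the inputs before starting can save a wasted attempt. This is a minimal sketch under the assumption that the paths match the command above; `preflight` is a hypothetical helper, not something the tool or `setup.sh` provides, and it does not check that the PIP server is up.

```shell
#!/bin/bash
# Hypothetical helper: check that the sync inputs exist before a long run.
preflight() {
  local data_dir="$1"
  local csv="$2"
  if [ ! -d "$data_dir" ]; then
    echo "missing postcode data directory: $data_dir" >&2
    return 1
  fi
  # -s is true only if the file exists and is not empty
  if [ ! -s "$csv" ]; then
    echo "missing or empty ONS CSV: $csv" >&2
    return 1
  fi
  return 0
}

# Usage, from /mnt/wof:
# preflight whosonfirst-data-postalcode-gb/data/ ONSPD_AUG_2021_UK.csv \
#   && ./wof-sync-os-postcodes -wof-postalcodes-path whosonfirst-data-postalcode-gb/data/ \
#        -ons-csv-path ONSPD_AUG_2021_UK.csv -ons-date 2021-08-01
```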

Now find something else to do for a few hours.

Assuming you're on an ephemeral VM, you will need to set your Git name and email before you commit your changes:

```shell
git config --global user.name "Foo Bar"
git config --global user.email "foo.bar@example.com"
```

Some tips:

* Perform the `git push` over HTTPS, as SSH connections to GitHub seem to drop while the repo is being prepared for push.
* Disable Git garbage collection on the repo, as it will probably kick in at some point and you will scream (`setup.sh` does this for you).
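
If you cloned the repo yourself rather than via `setup.sh` (which already clones over HTTPS), both tips can be applied by hand. The following is a rough sketch: the two function names are hypothetical, and it assumes the remote is called `origin` and points at GitHub.

```shell
#!/bin/bash
# Hypothetical helpers, not part of setup.sh.

# Rewrite a GitHub SSH remote URL to HTTPS; other URLs pass through unchanged.
github_https_url() {
  printf '%s\n' "$1" | sed -E \
    -e 's#^git@github\.com:#https://github.com/#' \
    -e 's#^ssh://git@github\.com/#https://github.com/#'
}

# Apply both tips to the repository at the given path:
# switch the origin remote to HTTPS and turn off automatic GC.
prepare_repo_for_push() {
  local repo="$1"
  local url
  url="$(git -C "$repo" remote get-url origin)" || return 1
  git -C "$repo" remote set-url origin "$(github_https_url "$url")"
  git -C "$repo" config gc.auto 0
}

# Usage, from /mnt/wof:
# prepare_repo_for_push whosonfirst-data-postalcode-gb
```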

## See also

- https://github.com/whosonfirst-data/whosonfirst-data-postalcode-gb
30 changes: 30 additions & 0 deletions setup.sh
@@ -0,0 +1,30 @@
#!/bin/bash
set -eo pipefail

MOUNT_DIR="/mnt/wof"
WOF_SYNC_OS_POSTCODES_VERSION="0.1.0"

sudo mkdir -p "$MOUNT_DIR"
sudo mount -t tmpfs -o size=90%,nr_inodes=0 wof "$MOUNT_DIR"
# Use the current user's primary group, which may not share the username
sudo chown "$(id -un):$(id -gn)" "$MOUNT_DIR"

# Add stuff for building, and other useful utils
# Refresh the package index first, as a fresh VM may have none
sudo apt-get update
sudo apt-get install -y build-essential git golang tmux unzip jq

cd "$MOUNT_DIR"

git clone --depth 1 https://github.com/whosonfirst-data/whosonfirst-data-admin-gb
git clone https://github.com/whosonfirst-data/whosonfirst-data-postalcode-gb
# Disable GC because it really hurts the commit performance if it kicks in, and this checkout is not long lived
cd whosonfirst-data-postalcode-gb
git config gc.auto 0
cd ..

# -f makes curl fail on an HTTP error instead of saving the error page as the binary
curl -fL "https://github.com/whosonfirst/wof-sync-os-postcodes/releases/download/v${WOF_SYNC_OS_POSTCODES_VERSION}/wof-sync-os-postcodes_${WOF_SYNC_OS_POSTCODES_VERSION}_linux_x86_64" -o wof-sync-os-postcodes
chmod +x wof-sync-os-postcodes

git clone https://github.com/whosonfirst/go-whosonfirst-pip-v2
cd go-whosonfirst-pip-v2
make tools
cp bin/wof-pip-server ..
cd ..
