Add huq data file
gilesdring committed Dec 17, 2024
1 parent 51ceef2 commit 117ac5a
Showing 7 changed files with 49 additions and 16 deletions.
7 changes: 3 additions & 4 deletions .github/workflows/deploy-dev.yml
@@ -60,13 +60,12 @@ jobs:

- name: Get current data
run: |
# cp my-deploy-key deploy-key
umask 077
echo "${{ secrets.DEPLOY_KEY }}" > deploy-key
echo "${{ secrets.DEPLOY_KEY_HUQ }}" > deploy-key-huq
eval $(ssh-agent -s)
ssh-add deploy-key
# ssh-add -l
pipenv run dvc update -R data/
ssh-add deploy-key deploy-key-huq
pipenv run dvc update data/*.dvc
- name: Run pipelines
run: |
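
The `umask 077` line in the step above matters because OpenSSH tools generally refuse private key files that other users can read. A minimal sketch of the same pattern, assuming a POSIX shell; the `write_key` helper, the file name, and the key material are placeholders for illustration, not the real secrets or part of the workflow:

```sh
# Sketch: write secret material to a file that only the current user can read.
# write_key, the file name and the key text are hypothetical placeholders.
write_key() {
  # $1 = target file, $2 = key material
  (
    umask 077                   # files created in this subshell get mode 600
    printf '%s\n' "$2" > "$1"
  )
}

workdir=$(mktemp -d)
write_key "$workdir/deploy-key-example" "placeholder-key-material"

# Show the permission bits of the new file; expected: -rw-------
ls -l "$workdir/deploy-key-example" | cut -c1-10
```

Running the `umask` in a subshell keeps the restrictive mask from leaking into later commands, whereas the workflow step sets it for the whole `run` block, which is also fine for a short-lived CI step.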
23 changes: 21 additions & 2 deletions CONTRIBUTING.md
@@ -66,7 +66,7 @@ will not get the latest data from the remote repository, but will fetch the data
contained in the commit (md5 hash) referenced in the `.dvc` file.

```sh
dvc pull data/published.dvc data/metadata.dvc
dvc pull -R data/
```

If you wish to update to the latest available data, you can run `dvc update`.
@@ -75,7 +75,7 @@ consistent with the `rev` (i.e. branch name) specified in that file (or the
default branch if that's not specified).

```sh
dvc update data/published.dvc data/metadata.dvc
dvc update -R data/
```

These commands will need to be run in an environment where dvc is available. The
@@ -135,3 +135,22 @@ ensure that no leakage takes place.
to create a `.gitignore` file in the same folder as the page with a
general exclusion of `/_data/*`. Exceptions can be added if needed. Once the
data is approved for publication, this file can be removed.


## DVC Remote access

Remote files are loaded from git repos as follows:

```sh
dvc import -o TARGET_PATH git@github.com:GIT_ORG/GIT_REPO REPO_PATH
```

> _e.g._ to import data from the `data` directory of the **huq-data** repo in
> the **bradford-2025** organisation into the `data/my-data` directory, run the
> following command:
>
> ```sh
> dvc import -o data/my-data git@github.com:bradford-2025/huq-data data
> ```
If the repo is private, you will need to have permission to access this data
via your GitHub account, or will need to provide a deploy key for the repository.
1 change: 1 addition & 0 deletions data/.gitignore
@@ -1,3 +1,4 @@
/processed
/metadata
/published
/huq
14 changes: 14 additions & 0 deletions data/huq.dvc
@@ -0,0 +1,14 @@
md5: fad6aa01733a27dbc7e9d908507a9099
frozen: true
deps:
- path: data
repo:
url: git@github.com:bradford-2025/huq-data
rev_lock: 2999e53ca3df0a3e996b76c0c887c9f0b09b85f5
rev: main
outs:
- md5: 0bcdbf567b2597907e08168965ba01b9.dir
size: 7713
nfiles: 2
hash: md5
path: huq
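
Each import file records both the branch it follows (`rev`) and the exact commit it is pinned to (`rev_lock`). As the CONTRIBUTING text describes, `dvc pull` fetches the data for the pinned commit, while `dvc update` moves the pin forward along `rev`. A minimal sketch for inspecting those two fields, assuming a POSIX shell with `awk`; the `dvc_pin` helper is hypothetical and the sample file only copies the relevant fields from `data/huq.dvc`:

```sh
# Sketch: print the tracked branch (rev) and pinned commit (rev_lock)
# recorded in a .dvc import file. dvc_pin is a hypothetical helper name.
dvc_pin() {
  # $1 = path to a .dvc file
  awk '
    $1 == "rev:"      { rev = $2 }
    $1 == "rev_lock:" { lock = $2 }
    END { printf "rev=%s rev_lock=%s\n", rev, lock }
  ' "$1"
}

# Demonstration on a throwaway file holding the fields from data/huq.dvc.
sample=$(mktemp)
cat > "$sample" <<'EOF'
frozen: true
deps:
- path: data
  repo:
    rev_lock: 2999e53ca3df0a3e996b76c0c887c9f0b09b85f5
    rev: main
EOF

dvc_pin "$sample"
# prints: rev=main rev_lock=2999e53ca3df0a3e996b76c0c887c9f0b09b85f5
```

In the repository itself, something like `for f in data/*.dvc; do dvc_pin "$f"; done` would show whether all imports are pinned to the commits you expect before and after an update.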
8 changes: 4 additions & 4 deletions data/metadata.dvc
@@ -1,14 +1,14 @@
md5: d749518ed517221f70e7d7e609288746
md5: f881c9ce0905bbfe7bf109d6d4be5a22
frozen: true
deps:
- path: data/metadata/processed/
repo:
url: git@github.com:bradford-2025/open-data-pipelines
rev: main
rev_lock: 1ff0179bfcf6c70c54b747750a1aa1d93748ec03
rev_lock: f0e342c6aa999ff4f44f94f97e9b2de2163b714c
outs:
- md5: bc4b2dd906cc820e657cfe8947e39773.dir
size: 9007
- md5: ccda40852fb7c10e98af94d25f9388b9.dir
size: 8995
nfiles: 7
hash: md5
path: metadata
8 changes: 4 additions & 4 deletions data/published.dvc
@@ -1,14 +1,14 @@
md5: 6e83378430c5bf96cff3cf5b1e47c992
md5: 2d1e4a0fb8e9a428cb92698074e949eb
frozen: true
deps:
- path: data/processed/
repo:
url: git@github.com:bradford-2025/open-data-pipelines
rev: main
rev_lock: 1ff0179bfcf6c70c54b747750a1aa1d93748ec03
rev_lock: f0e342c6aa999ff4f44f94f97e9b2de2163b714c
outs:
- md5: cc2a14445e52b2845322da0e59505b3f.dir
size: 169708
- md5: 14509e7ade4dffedd0e1368960a978f7.dir
size: 169787
nfiles: 7
hash: md5
path: published
4 changes: 2 additions & 2 deletions deno.json
@@ -10,8 +10,8 @@
"dev": "LUME_DRAFTS=true deno task lume -s",
"build:dev": "LUME_DRAFTS=true deno task lume --location https://dev.open-innovations.org/bradford-2025-data/",
"deploy:dev": "rsync --info=STATS2 --recursive --delete --rsh=\"sshpass -e ssh -o StrictHostKeyChecking=no -l $SSH_USER\" --rsync-path \"sudo -u www-data rsync\" _site/ $SSH_HOST:$SSH_PATH",
"data:pull": "pipenv run dvc pull data/published.dvc data/metadata.dvc",
"data:update": "pipenv run dvc update data/published.dvc data/metadata.dvc",
"data:pull": "pipenv run dvc pull data/*.dvc",
"data:update": "pipenv run dvc update data/*.dvc",
"data:pipeline": "pipenv run dvc repro pipelines/dvc.yaml",
"build:full": "deno task data:update && deno task data:pipeline && deno task build"
},
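
The switch from an explicit file list to `data/*.dvc` means any new import file (such as `data/huq.dvc`) is picked up without editing the tasks again. A minimal sketch of what the glob turns into, assuming the shell that runs the task expands globs the way a POSIX shell does; the temporary directory and empty files are stand-ins for the real repository:

```sh
# Sketch: show the arguments a POSIX shell produces for data/*.dvc.
# The temporary directory and empty files stand in for the real repo.
demo=$(mktemp -d)
mkdir "$demo/data"
touch "$demo/data/published.dvc" "$demo/data/metadata.dvc" "$demo/data/huq.dvc"

# echo prints the command line instead of running dvc, so dvc is not needed.
expanded=$(cd "$demo" && echo dvc update data/*.dvc)
echo "$expanded"
# prints: dvc update data/huq.dvc data/metadata.dvc data/published.dvc
```

One trade-off of the glob is that it is resolved relative to the current directory, so the tasks need to be run from the repository root.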