forked from data-to-insight/ofsted-ilacs-scrape-tool
0 parents
commit 8e68500
Showing 52 changed files with 2,782 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# Workflow runs the scrape process to refresh/update daily | ||
|
||
# Run events | ||
on: | ||
# On push | ||
push: | ||
branches: ["main"] | ||
# On pull request | ||
pull_request: | ||
branches: ["main"] | ||
# Add manual trigger from Actions tab | ||
workflow_dispatch: | ||
# Schedule run at 9 AM UTC every day | ||
schedule: | ||
- cron: '0 9 * * *' | ||
|
||
# Sets permissions of the GITHUB_TOKEN for deployment to GitHub Pages | ||
permissions: | ||
contents: write # changed from read to allow repo updates | ||
pages: write | ||
id-token: write | ||
|
||
# Define workflow job | ||
jobs: | ||
build: | ||
# Runs on the latest version of Ubuntu | ||
runs-on: ubuntu-latest | ||
steps: | ||
# Checks out a copy of repo | ||
- name: Checkout | ||
uses: actions/checkout@v3 | ||
|
||
# Set up Python | ||
- name: Set up Python | ||
uses: actions/setup-python@v2 | ||
with: | ||
python-version: '3.x' | ||
|
||
# Show Current Directory and List Files | ||
- name: Show Current Directory | ||
run: | | ||
echo "Current directory: $(pwd)" | ||
echo "Listing files:" | ||
ls -la | ||
# Install dependencies | ||
- name: Install dependencies | ||
run: | | ||
python -m pip install --upgrade pip | ||
pip install -r requirements.txt | ||
# Ensure Script is Executable | ||
- name: Ensure Script is Executable | ||
run: chmod +x ofsted_childrens_services_inspection_scrape.py | ||
|
||
# Run the scrape | ||
- name: Run Python script | ||
run: | | ||
echo "Running scrape script" | ||
python ofsted_childrens_services_inspection_scrape.py | ||
# Configure Git and Commit changes | ||
- name: Commit and Push changes | ||
# if: github.event_name == 'schedule' # Use on testing, to avoid inf loop for on push workflow event trigger | ||
run: | | ||
git config --local user.email "[email protected]" | ||
git config --local user.name "GitHub Action" | ||
git add index.html | ||
git commit -m "Update index.html via workflow" || echo "No changes to commit" | ||
git push | ||
# Deploy job | ||
deploy: | ||
# Run on the latest version of Ubuntu | ||
runs-on: ubuntu-latest | ||
# Build job must complete successfully | ||
needs: build | ||
steps: | ||
# Deploy to GitHub Pages | ||
- name: Deploy | ||
uses: peaceiris/actions-gh-pages@v3 | ||
with: | ||
github_token: ${{ secrets.GITHUB_TOKEN }} | ||
# Directory deployed to GitHub Pages | ||
publish_dir: ./ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
# Simple workflow for deploying static content to GitHub Pages | ||
# name: Deploy static content to Pages | ||
|
||
# # Run events | ||
# on: | ||
# # On push | ||
# push: | ||
# branches: ["main"] | ||
# # On pull | ||
# pull_request: | ||
# branches: ["main"] | ||
# Add manual trigger from Actions tab | ||
# workflow_dispatch: | ||
# # Schedule run at 9 AM UTC every day | ||
# schedule: | ||
# - cron: '0 9 * * *' | ||
# # Allows you to run this workflow manually from the Actions tab | ||
# workflow_dispatch: | ||
|
||
# # Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages | ||
# permissions: | ||
# contents: read | ||
# pages: write | ||
# id-token: write | ||
|
||
# # Allow only one concurrent deployment, skipping runs queued between the run in-progress and latest queued. | ||
# # However, do NOT cancel in-progress runs as we want to allow these production deployments to complete. | ||
# concurrency: | ||
# group: "pages" | ||
# cancel-in-progress: false | ||
|
||
# jobs: | ||
# # Single deploy job since we're just deploying | ||
# deploy: | ||
# environment: | ||
# name: github-pages | ||
# url: ${{ steps.deployment.outputs.page_url }} | ||
# runs-on: ubuntu-latest | ||
# steps: | ||
# - name: Checkout | ||
# uses: actions/checkout@v3 | ||
# - name: Setup Pages | ||
# uses: actions/configure-pages@v3 | ||
# - name: Upload artifact | ||
# uses: actions/upload-pages-artifact@v2 | ||
# with: | ||
# # Upload entire repository | ||
# path: '.' | ||
# - name: Deploy to GitHub Pages | ||
# id: deployment | ||
# uses: actions/deploy-pages@v2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
MIT License | ||
|
||
Copyright (c) 2023 data-to-insight | ||
|
||
Permission is hereby granted, free of charge, to any person obtaining a copy | ||
of this software and associated documentation files (the "Software"), to deal | ||
in the Software without restriction, including without limitation the rights | ||
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
copies of the Software, and to permit persons to whom the Software is | ||
furnished to do so, subject to the following conditions: | ||
|
||
The above copyright notice and this permission notice shall be included in all | ||
copies or substantial portions of the Software. | ||
|
||
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | ||
SOFTWARE. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
# Ofsted-SEND-Scrape-Tool | ||
On-demand summary of Ofsted SEND results, scraped from the inspection reports on the Ofsted.gov pages | ||
Published: https://data-to-insight.github.io/ofsted-send-scrape-tool/ | ||
- | ||
### The automated daily update of this SEND summary page is not currently running; in the interim we're running it manually on a weekly basis. | ||
|
||
## Brief overview | ||
This project began as a proof-of-concept, 'can we do this' exercise. As such, it is supplied with the disclaimer that you should check the vitals before embedding it into anything more critical; likewise, please feel free to feed suggestions back into the project. The structure of the code and processes has much scope for improvement, but some of the initial emphasis was on maintaining a level of readability so that others might have an easier time taking it further. That said, we needed to take some of the scrape/cleaning processes further than anticipated due to inconsistencies in the source site/data, and this has ultimately impacted the intended 're-usable MVP' approach to codifying a solution for the original problem. | ||
|
||
The results structure and returned data are based almost entirely on the originating SEND Summary produced/refreshed periodically by the ADCS, which has previously underpinned several D2I projects. We're aware of several similar collections of longer-term work on and around the Ofsted results theme, and would be happy to hear from anyone who has bespoke ideas for changes here that would assist their own work. | ||
|
||
The scrape process is completed by running a single Python script: ofsted_childrens_services_inspection_scrape.py | ||
|
||
|
||
## Export(s) | ||
There are currently three exports from the script. | ||
### Results HTML page | ||
Generated (as ./index.html) to display a refreshed subset of the SEND results summary. | ||
|
||
### Results Overview Summary | ||
The complete SEND overview spreadsheet, exported to the git project root ./ as an .xlsx file for ease and also accessible via a download link from the generated results page (index.html) | ||
|
||
### All CS inspections reports | ||
During the scrape process we scan all the related CS inspection PDF reports for each LA, so these are packaged up into tidy LA-named folders (urn_LAname) within the git repo (./export_data/inspection_reports/). There is a lot of data here, but if you download the entire export_data folder after the script has run, together with the overview summary sheet, then the active links in the local_inspection_reports column will work and you can easily access each LA's previous reports all in one place via the supplied hyperlink(s). *Note:* This is currently not an option when viewing the results on the web page/Git Pages. | ||
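The folder and file naming described above can be sketched as follows. This is a minimal illustration rather than the script's actual code: the helper names are hypothetical, and the lowercase LA name and the `area send full inspection - <day> <month> <year>.pdf` pattern are assumptions inferred from the report files added in this commit.

```python
from datetime import date, datetime
from pathlib import Path

# Root folder for downloaded reports, as described in the README.
REPORTS_ROOT = Path("export_data/inspection_reports")


def la_report_folder(urn: str, la_name: str) -> Path:
    """Return the urn_LAname folder for one LA (hypothetical helper).

    Assumes the LA name is stored in lowercase, as the committed folders suggest.
    """
    return REPORTS_ROOT / f"{urn}_{la_name.strip().lower()}"


def inspection_date_from_filename(filename: str) -> date:
    """Parse the trailing date from a report file name (hypothetical helper).

    Assumes names end in ' - <day> <month name> <year>.pdf'.
    """
    stem = Path(filename).stem                 # drop the .pdf extension
    date_text = stem.rsplit(" - ", 1)[-1]      # keep the text after the last ' - '
    return datetime.strptime(date_text, "%d %B %Y").date()


folder = la_report_folder("80443", "Bury")
inspected = inspection_date_from_filename(
    "area send full inspection - 07 may 2024.pdf"
)
```

Parsing the date from the file name is only one option; the script may well take it from the report page or the PDF itself.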
|
||
## Known Bugs | ||
Some LAs' published inspection reports have PDF encoding problems or inconsistent data that cause extraction issues and null data. | ||
We're working to address these; current known issues are: | ||
- tbc | ||
|
||
|
||
## Import(s) | ||
Two flat-file (.csv) imports are currently used (/import_data/..). | ||
### LA Lookup (/import_data/la_lookup/) | ||
Allows us to add further LA-related data, such as the historic LA codes still in use for some areas, as well as enablers for further work, for example ONS region identifiers and which CMS system each LA is using. | ||
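As a rough sketch of how a lookup file like this might be joined onto scraped results, the example below uses only the standard library. The function names, the column names (`urn`, `region`, `cms`) and the sample rows are all assumptions made for illustration; the real lookup file's key and columns may differ.

```python
import csv
import io


def load_la_lookup(handle, key="urn"):
    """Read a lookup CSV into a dict keyed on one column (hypothetical helper)."""
    return {row[key]: row for row in csv.DictReader(handle)}


def enrich_results(results, lookup, key="urn"):
    """Add lookup columns to each scraped row; scraped values win on a name clash."""
    return [{**lookup.get(row[key], {}), **row} for row in results]


# Made-up sample data standing in for the lookup file and the scrape output.
lookup_csv = io.StringIO("urn,region,cms\n80443,North West,ExampleCMS\n")
lookup = load_la_lookup(lookup_csv)

results = [
    {"urn": "80443", "la_name": "bury"},
    {"urn": "99999", "la_name": "unmatched"},
]
enriched = enrich_results(results, lookup)
```

Rows with no match in the lookup pass through unchanged, which keeps a missing lookup entry from dropping an LA from the summary.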
### Geospatial (/import_data/geospatial/) | ||
This is part of some ongoing work to access data we can use to enrich the Ofsted data with location-based information, allowing us to visualise results on a map/choropleth. Some of the work towards this is complete; however, because LAs' geographical delineations don't always map to ONS data, we're in the process of finding some workarounds. The code and the reduced* GeoJSON data are there if anyone would like to fork the project and suggest solutions. *GeoJSON data has been pre-processed to reduce the usually large file size and enable it within this repo/processing. | ||
|
||
|
||
## Future work | ||
|
||
- Some of the in-progress efforts are included within the downloadable .xlsx file as a point of discussion or stepping stone for others to develop. For example, a set of columns details simplistic inspection sentiment analysis based on the language used in the most recent report (ref cols: sentiment_score, inspectors_median_sentiment_score, sentiment_summary, main_inspection_topics). *Note that the inclusion of these columns does not imply that the scores are accurate; these additions are a starting point for discussion, suggestions and development.* | ||
|
||
- Geographical/geospatial visualisations of results by region, LA etc. are in progress. The basis for this is already in place, but some anomalies in how LA/county boundary data is configured are an issue for some areas, and thus the representation requires a bit more thought. | ||
|
||
- Improved automated workflow. We're currently still running the script manually until fixes can be applied to enable the Git Workflow(s) to run automatically/on a daily basis. We have the needed workflow scripts in place, but there is an ongoing issue in getting the py script to auto-run. Manual runs of the py script(+push/pull action) do correctly initiate the refresh of the html/GitPage. | ||
|
||
- Provide active link access to all previous reports via the web front end. This is currently only available when all post-script-run files/folders are downloaded (this is a very large download if all LA folders are included). | ||
|
||
- Further development|bespoke work to improve potential tie-in with existing LA work that could use this tool or the resultant data. | ||
|
||
|
||
#### Contact via : datatoinsight.enquiries AT gmail.com | ||
|
||
|
||
## Script admin notes | ||
Simplified notes towards repo/script admin processes and enabling/instructions for non-admin running. | ||
### Script run instructions (User) | ||
To obtain a full, instant refresh of the SEND output, run ofsted_childrens_services_inspection_scrape.py. These instructions are for running in the cloud/GitHub. | ||
- Create a new Codespace (on main) | ||
- Run the setup bash script at the Terminal prompt: './setup.sh' | ||
- Run the script (you can right click the script file and select 'run in python....') | ||
- Download the now refreshed ofsted_childrens_services_inspection_scrape.XLSX (Right click, download) | ||
- Close codespace (Github will auto-remove unused spaces later) | ||
|
||
### Run notes (Admin) | ||
If you experience a permissions error running the setup bash file, for example: | ||
|
||
/workspaces/ofsted-send-scrape-tool (main) $ ./setup.sh | ||
bash: ./setup.sh: Permission denied | ||
|
||
then type the following, and try again: | ||
chmod +x setup.sh |
Empty file.
Binary file added
BIN
+232 KB
export_data/inspection_reports/2532283_dorset/area send full inspection - 15 may 2024.pdf
Binary file not shown.
Binary file added
BIN
+292 KB
...tion_reports/2637539_north northamptonshire/area send full inspection - 25 march 2024.pdf
Binary file not shown.
Binary file added
BIN
+303 KB
...ection_reports/2637548_west northamptonshire/area send full inspection - 12 july 2024.pdf
Binary file not shown.
Binary file added
BIN
+252 KB
export_data/inspection_reports/80431_blackpool/area send full inspection - 25 july 2024.pdf
Binary file not shown.
Binary file added
BIN
+315 KB
...ta/inspection_reports/80438_brighton and hove/area send full inspection - 31 may 2023.pdf
Binary file not shown.
Binary file added
BIN
+285 KB
export_data/inspection_reports/80443_bury/area send full inspection - 07 may 2024.pdf
Binary file not shown.
Binary file added
BIN
+233 KB
export_data/inspection_reports/80451_wakefield/area send full inspection - 26 july 2024.pdf
Binary file not shown.
Binary file added
BIN
+242 KB
export_data/inspection_reports/80454_cornwall/area send full inspection - 05 may 2023.pdf
Binary file not shown.
Binary file added
BIN
+261 KB
export_data/inspection_reports/80469_gateshead/area send full inspection - 24 july 2023.pdf
Binary file not shown.
Binary file added
BIN
+475 KB
...ta/inspection_reports/80470_gloucestershire/area send full inspection - 01 march 2024.pdf
Binary file not shown.
Binary file added
BIN
+320 KB
export_data/inspection_reports/80471_halton/area send full inspection - 26 january 2024.pdf
Binary file not shown.
Binary file added
BIN
+263 KB
export_data/inspection_reports/80473_hartlepool/area send full inspection - 16 may 2023.pdf
Binary file not shown.
Binary file added
BIN
+368 KB
...a/inspection_reports/80475_hertfordshire/area send full inspection - 10 november 2023.pdf
Binary file not shown.
Binary file added
BIN
+267 KB
...pection_reports/80477_kingston upon hull/area send full inspection - 02 february 2024.pdf
Binary file not shown.
Binary file added
BIN
+337 KB
export_data/inspection_reports/80488_bexley/area send full inspection - 23 february 2024.pdf
Binary file not shown.
Binary file added
BIN
+293 KB
export_data/inspection_reports/80494_enfield/area send full inspection - 02 august 2023.pdf
Binary file not shown.
Binary file added
BIN
+208 KB
export_data/inspection_reports/80495_greenwich/area send full inspection - 11 july 2023.pdf
Binary file not shown.
Binary file added
BIN
+216 KB
export_data/inspection_reports/80498_haringey/area send full inspection - 03 april 2024.pdf
Binary file not shown.
Binary file added
BIN
+294 KB
export_data/inspection_reports/80501_hillingdon/area send full inspection - 12 july 2024.pdf
Binary file not shown.
Binary file added
BIN
+248 KB
...ction_reports/80513_richmond upon thames/area send full inspection - 04 december 2023.pdf
Binary file not shown.
Binary file added
BIN
+298 KB
export_data/inspection_reports/80522_medway/area send full inspection - 02 april 2024.pdf
Binary file not shown.
Binary file added
BIN
+231 KB
...a/inspection_reports/80523_middlesbrough/area send full inspection - 08 december 2023.pdf
Binary file not shown.
Binary file added
BIN
+303 KB
...t_data/inspection_reports/80524_milton keynes/area send full inspection - 20 may 2024.pdf
Binary file not shown.
Binary file added
BIN
+278 KB
...data/inspection_reports/80534_nottinghamshire/area send full inspection - 16 may 2023.pdf
Binary file not shown.
Binary file added
BIN
+213 KB
export_data/inspection_reports/80535_oldham/area send full inspection - 29 august 2023.pdf
Binary file not shown.
Binary file added
BIN
+301 KB
...ta/inspection_reports/80536_oxfordshire/area send full inspection - 15 september 2023.pdf
Binary file not shown.
Binary file added
BIN
+309 KB
export_data/inspection_reports/80538_plymouth/area send full inspection - 22 august 2023.pdf
Binary file not shown.
Binary file added
BIN
+206 KB
export_data/inspection_reports/80547_rutland/area send full inspection - 03 august 2023.pdf
Binary file not shown.
Binary file added
BIN
+183 KB
..._data/inspection_reports/80549_sandwell/area send full inspection - 12 september 2023.pdf
Binary file not shown.
Binary file added
BIN
+201 KB
...rt_data/inspection_reports/80558_southampton/area send full inspection - 16 july 2024.pdf
Binary file not shown.
Binary file added
BIN
+276 KB
...ata/inspection_reports/80559_southend-on-sea/area send full inspection - 09 june 2023.pdf
Binary file not shown.
Binary file added
BIN
+303 KB
...ata/inspection_reports/80564_stoke-on-trent/area send full inspection - 03 april 2024.pdf
Binary file not shown.
Binary file added
BIN
+293 KB
export_data/inspection_reports/80565_suffolk/area send full inspection - 30 january 2024.pdf
Binary file not shown.
Binary file added
BIN
+294 KB
export_data/inspection_reports/80567_surrey/area send full inspection - 24 november 2023.pdf
Binary file not shown.
Binary file added
BIN
+256 KB
...ta/inspection_reports/80570_telford & wrekin/area send full inspection - 03 july 2023.pdf
Binary file not shown.
Binary file added
BIN
+282 KB
...t_data/inspection_reports/80573_trafford/area send full inspection - 22 december 2023.pdf
Binary file not shown.
Binary file added
BIN
+333 KB
export_data/inspection_reports/80575_warrington/area send full inspection - 05 may 2023.pdf
Binary file not shown.
Binary file added
BIN
+268 KB
...ata/inspection_reports/80578_west sussex/area send full inspection - 29 february 2024.pdf
Binary file not shown.
Binary file added
BIN
+207 KB
...data/inspection_reports/80584_worcestershire/area send full inspection - 15 july 2024.pdf
Binary file not shown.