forked from data-to-insight/ofsted-send-scrape-tool
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent
37c2d87
commit c4e0e38
Showing 4 changed files with 16 additions and 8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,8 +25,8 @@ | |
<body> | ||
<h1>Ofsted CS JTAI Inpections Overview</h1> | ||
<p>Summarised outcomes of published JTAI inspection reports by Ofsted, refreshed weekly.<br/>An expanded version of the shown summary sheet, refreshed concurrently, is available to <a href="ofsted_childrens_services_jtai_overview.xlsx">download here</a> as an .xlsx file. <br/>Data summary is based on the original <i>JTAI Outcomes Summary</i> published periodically by the ADCS: <a href="https://www.adcs.org.uk/inspection-of-childrens-services/">https://www.adcs.org.uk/inspection-of-childrens-services/</a>. <a href="https://github.com/data-to-insight/ofsted-ilacs-scrape-tool/blob/main/README.md">Read the source ILACS tool/project background details and future work.</a>.</p> | ||
<p>Disclaimer: This summary is built from scraped data direct from https://reports.ofsted.gov.uk/ published PDF inspection report files. As a result of the nuances|variance within the inspection report content or pdf encoding, we're noting some problematic data extraction for a small number of LAs*.<br/> *Known extraction issues: JTAI report structure varies pre|post 2023. ADCS published inspection Themes unavailable via current scrape process. Publication date is based on CSS tag data and may not always reflect actual report publication. Where 1+ case studies are reported on, only 1 is pulled through.<br/><a href="mailto:[email protected]?subject=Ofsted-Scrape-Tool">Feedback</a> on specific problems|inaccuracies|suggestions welcomed.*</p> | ||
<p><b>Summary data last updated: 13 08 2024 17:53</b></p> | ||
<p>Disclaimer: This summary is built from scraped data direct from https://reports.ofsted.gov.uk/ published PDF inspection report files.<br/>As a result of the nuances|variance within the inspection report content or pdf encoding, we're noting problematic data extraction for a small number of LAs*.<br/> *Known extraction issues: <ul><li>JTAI report structure varies pre|post 2023(?), hence sparse|mixed summary columns until improved|agreed approach finalised.</li><li>ADCS published inspection Themes unavailable via current scrape process. This being worked on currently.</li><li>Publication date, isn't available within inspection reports and is therefore based on CSS tag data and may not always reflect actual report publication.</li><li>Where 1+ case studies are reported on (e.g. Peterborough City), only 1 summary is pulled through.</li></ul><a href="mailto:[email protected]?subject=Ofsted-Scrape-Tool">Feedback</a> highlighting problems|inaccuracies|suggestions welcomed.</p> | ||
<p><b>Summary data last updated: 14 08 2024 09:36</b></p> | ||
<p><b>LA inspections last updated: []</b></p> | ||
<div class="container"> | ||
<table border="1" class="dataframe"> | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1035,10 +1035,15 @@ def save_to_html(data, column_order, local_link_column=None, web_link_column=Non | |
) | ||
|
||
disclaimer_text = ( | ||
'Disclaimer: This summary is built from scraped data direct from https://reports.ofsted.gov.uk/ published PDF inspection report files. ' | ||
'As a result of the nuances|variance within the inspection report content or pdf encoding, we\'re noting some problematic data extraction for a small number of LAs*.<br/> ' | ||
'*Known extraction issues: JTAI report structure varies pre|post 2023. ADCS published inspection Themes unavailable via current scrape process. Publication date is based on CSS tag data and may not always reflect actual report publication. Where 1+ case studies are reported on, only 1 is pulled through.<br/>' | ||
'<a href="mailto:[email protected]?subject=Ofsted-Scrape-Tool">Feedback</a> on specific problems|inaccuracies|suggestions welcomed.*' | ||
'Disclaimer: This summary is built from scraped data direct from https://reports.ofsted.gov.uk/ published PDF inspection report files.<br/>' | ||
'As a result of the nuances|variance within the inspection report content or pdf encoding, we\'re noting problematic data extraction for a small number of LAs*.<br/> ' | ||
'*Known extraction issues: <ul>' | ||
'<li>JTAI report structure varies pre|post 2023(?), hence sparse|mixed summary columns until improved|agreed approach finalised.</li>' | ||
'<li>ADCS published inspection Themes unavailable via current scrape process. This being worked on currently.</li>' | ||
'<li>Publication date, isn\'t available within inspection reports and is therefore based on CSS tag data and may not always reflect actual report publication.</li>' | ||
'<li>Where 1+ case studies are reported on (e.g. Peterborough City), only 1 summary is pulled through.</li>' | ||
'</ul>' | ||
'<a href="mailto:[email protected]?subject=Ofsted-Scrape-Tool">Feedback</a> highlighting problems|inaccuracies|suggestions welcomed.' | ||
) | ||
|
||
# # testing | ||
|
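Note on the hunk above: the known extraction issues are now emitted as an HTML list, with each `<li>` written out by hand inside `disclaimer_text`. A minimal sketch of generating the same markup from a plain Python list follows. The names `KNOWN_ISSUES` and `build_known_issues_html` are hypothetical and do not appear in the commit; the issue strings are copied from the diff.

```python
# Hypothetical helper, not part of the commit: builds the <ul> block
# for the disclaimer from a list of issue strings.
KNOWN_ISSUES = [
    "JTAI report structure varies pre|post 2023(?), hence sparse|mixed summary "
    "columns until improved|agreed approach finalised.",
    "ADCS published inspection Themes unavailable via current scrape process. "
    "This being worked on currently.",
    "Publication date, isn't available within inspection reports and is therefore "
    "based on CSS tag data and may not always reflect actual report publication.",
    "Where 1+ case studies are reported on (e.g. Peterborough City), only 1 summary "
    "is pulled through.",
]


def build_known_issues_html(issues):
    """Wrap each issue in <li>...</li> and the whole set in <ul>...</ul>."""
    items = "".join(f"<li>{issue}</li>" for issue in issues)
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    print(build_known_issues_html(KNOWN_ISSUES))
```

Keeping the issues in a list would let an entry be added or removed without touching the markup. One thing to be aware of: the generated page places this text inside a `<p>` element, and HTML parsers implicitly close a `<p>` when they reach a `<ul>`, so the list and the feedback link end up rendered outside the paragraph.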
@@ -1069,8 +1074,9 @@ def save_to_html(data, column_order, local_link_column=None, web_link_column=Non | |
# # If a web link column is specified, convert that column's values to HTML hyperlinks | ||
# # Shortening the hyperlink text by taking the part after the last '/' | ||
if web_link_column: | ||
data[web_link_column] = data[web_link_column].apply(lambda x: f'<a href="{x}">ofsted.gov.uk/{x.rsplit("/", 1)[-1]}</a>') # publ_date | ||
# if web_link_column: | ||
data[web_link_column] = data[web_link_column].apply(lambda x: f'<a href="{x}">ofsted.gov.uk/{x.rsplit("/", 1)[-1]}</a>') | ||
|
||
# if web_link_column: # if the link is a bytes obj, this might be problematic | ||
# data[web_link_column] = data[web_link_column].apply(lambda x: f'<a href="{x}">ofsted.gov.uk/{x.rsplit("/", 1)[-1]}</a>' if isinstance(x, str) else x) # publ_date | ||
|
||
# Convert column names to title/upper case | ||
|
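Note on the hunk above: the commented-out variant guards against non-string values (the comment mentions a bytes object) before calling `rsplit`. A minimal sketch of that guard as a named function is shown below, using only the standard library. The name `shorten_link` and the example URL are hypothetical and not taken from the commit; the string handling mirrors the lambda in the diff.

```python
# Hypothetical helper, not part of the commit: same output as the lambda in the
# diff for string inputs, but non-string values are passed through unchanged.
def shorten_link(value):
    """Return an HTML anchor for a string URL; leave any other value as-is."""
    if not isinstance(value, str):
        # None, NaN, bytes etc. have no usable str.rsplit here, so skip them
        return value
    # Link text is the part after the last '/', as in the original lambda
    tail = value.rsplit("/", 1)[-1]
    return f'<a href="{value}">ofsted.gov.uk/{tail}</a>'


if __name__ == "__main__":
    # Made-up sample values for illustration only
    samples = ["https://reports.ofsted.gov.uk/provider/44/12345", None, b"raw"]
    for sample in samples:
        print(repr(shorten_link(sample)))
```

Assuming `data` is a pandas DataFrame as in `save_to_html`, this could be applied with `data[web_link_column] = data[web_link_column].apply(shorten_link)` inside the existing `if web_link_column:` check.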
Binary file not shown.