responding to review comments and fixing weird link format
jen-machin committed Sep 16, 2024
1 parent 1c7f2ee commit b842ed5
Showing 2 changed files with 7 additions and 9 deletions.
6 changes: 2 additions & 4 deletions ADA/ada.qmd
@@ -63,7 +63,7 @@ You can access support in the following places:

- The [ADA website](https://analytical-data-access.education.gov.uk/get-support/join-a-community) has a list of support resources

- - A list of current ADA champions can be found on the [ADA intranet page](https://educationgovuk.sharepoint.com/sites/lvewp00505); they can provide advice and support if you run into any issues. If you're passionate about helping others with the migration and using innovative data tools, why not become an ADA champion? To get involved, please contact the [ADA team](mailto:[email protected]).
+ - A list of current ADA champions can be found on the [ADA intranet page](https://educationgovuk.sharepoint.com/sites/lvewp00505); they can provide advice and support if you run into any issues. If you're passionate about helping others with the migration and using innovative data tools, why not become an ADA champion? To get involved, please contact the [ADA team](mailto:[email protected])

---

@@ -81,8 +81,6 @@ Migration to Databricks offers a lot of potential benefits to analysts. These in

* If you regularly have to run the same code, access to workflows and code scheduling means that you can set code to run on certain days or at certain times, improving efficiency (a rough scheduling sketch follows this list). Scheduled workflows will run even if your laptop is switched off

- * Easier collaboration within workspaces

* Better transparency of work being undertaken in the Department by making use of the ADA shared reports area.
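
For illustration only (this example is not part of the original page): schedules are driven by the Databricks Jobs API underneath, and a job with a daily trigger can be created from R along the lines below. The workspace URL, token variable, notebook path, and cron expression are all placeholder assumptions.

```r
library(httr)

# Minimal sketch: create a daily scheduled job via the Databricks Jobs API.
# All values below are illustrative placeholders; depending on your workspace
# you may also need to supply a cluster specification for the task.
resp <- httr::POST(
  "https://adb-0000000000000000.0.azuredatabricks.net/api/2.1/jobs/create",
  httr::add_headers(
    Authorization = paste("Bearer", Sys.getenv("DATABRICKS_TOKEN"))
  ),
  body = list(
    name = "nightly-data-refresh",
    schedule = list(
      quartz_cron_expression = "0 0 6 * * ?",  # run at 06:00 every day
      timezone_id = "Europe/London"
    ),
    tasks = list(list(
      task_key = "refresh",
      notebook_task = list(notebook_path = "/Workspace/path/to/notebook")
    ))
  ),
  encode = "json"
)
httr::content(resp)  # returns the new job_id on success
```

In practice most analysts will set schedules through the Workflows UI rather than the API; the sketch just shows what a schedule is underneath (a cron expression attached to a job).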

---
@@ -270,7 +268,7 @@ You can write new SQL code in the Databricks SQL Editor, which is very similar t

These scripts do not need to be moved, but the syntax of the scripts or embedded code will need to be rewritten from T-SQL to Spark SQL.
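
The page doesn't give an example, but a typical rewrite looks something like the sketch below; the table and column names are hypothetical:

```r
# T-SQL (SQL Server) version -- will NOT run on Databricks:
#   SELECT TOP 10 pupil_id, ISNULL(score, 0) AS score
#   FROM dbo.results
#   WHERE entry_date > DATEADD(year, -1, GETDATE())

# A Spark SQL equivalent, using Unity Catalog three-part table names:
spark_query <- "
  SELECT pupil_id, COALESCE(score, 0) AS score
  FROM catalog_name.schema_name.results
  WHERE entry_date > add_months(current_date(), -12)
  LIMIT 10
"
```

The common changes are `TOP n` becoming `LIMIT n`, T-SQL-only functions such as `ISNULL()` and `GETDATE()` becoming their Spark equivalents (`COALESCE()`, `current_date()`), and `database.schema.table` references becoming `catalog.schema.table`.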

- You will also need to change the connection source to allow the code to run. Rather than pointing to data held in current SQL databases, you will need to point towards the Databricks Delta Lake instead.
+ You will also need to change the connection source to allow the code to run. Rather than pointing to data held in current SQL databases, you will need to point towards the Databricks Unity Catalog instead. To do this, see our page on [setting up a connection between Databricks and RStudio using a SQL warehouse](/ADA/databricks_rstudio_sql_warehouse.html).
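
For orientation, a connection from RStudio to a SQL warehouse typically follows the ODBC pattern below. The host, HTTP path, and table names are placeholders, not values from this guide; the linked page has the authoritative steps.

```r
library(DBI)
library(odbc)

# Sketch of an ODBC connection to a Databricks SQL warehouse. Take the real
# Host and HTTPPath values from the warehouse's "Connection details" tab,
# and store a personal access token in the DATABRICKS_TOKEN variable.
con <- DBI::dbConnect(
  odbc::odbc(),
  Driver          = "Simba Spark ODBC Driver",
  Host            = "adb-0000000000000000.0.azuredatabricks.net",
  Port            = 443,
  HTTPPath        = "/sql/1.0/warehouses/0123456789abcdef",
  SSL             = 1,
  ThriftTransport = 2,
  AuthMech        = 3,   # token-based auth: UID is literally "token"
  UID             = "token",
  PWD             = Sys.getenv("DATABRICKS_TOKEN")
)

# Unity Catalog tables are addressed as catalog.schema.table
DBI::dbGetQuery(con, "SELECT * FROM catalog_name.schema_name.table_name LIMIT 5")
```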


---
10 changes: 5 additions & 5 deletions ADA/databricks_fundamentals.qmd
@@ -69,7 +69,7 @@ Due to the scalability of compute resources you can request a more powerful proc

The centralised nature of the data storage makes navigation of the Department's data estate much simpler, with everything existing in a single environment. Combined with stronger data governance, this makes discovery of supplementary or related data the Department holds much easier. In addition, it allows datasets that are commonly used across the Department - such as ONS geography datasets - to be standardised and made available to all teams, ensuring consistency of data and its formatting across the Department's publications.

- The auditing and automation facilities provide a lot of benefits when building data pipelines. These can be set up to run as required with little manual effort from analysts, and automated quality assurance can be built into the pipeline so that analysts can be confident in the outputs. In addition, the auditing keeps a record of all inputs and outputs each time a process is run. Combining this with robust documentation stored in [Notebooks](databricks_notebooks.qmd) allows you to debug issues retrospectively without having to repeatedly step through the process to see where unexpected issues occurred.
+ The auditing and automation facilities provide a lot of benefits when building data pipelines. These can be set up to run as required with little manual effort from analysts, and automated quality assurance can be built into the pipeline so that analysts can be confident in the outputs. In addition, the auditing keeps a record of all inputs and outputs each time a process is run. Combining this with robust documentation stored in [Notebooks](/ADA/databricks_notebooks.html) allows you to debug issues retrospectively without having to repeatedly step through the process to see where unexpected issues occurred.

------------------------------------------------------------------------

@@ -173,7 +173,7 @@ For collaboration on code you should use a GitHub/DevOps repository which each u

All code should be managed through a versioned repository on GitHub or Azure DevOps. You can commit and push your changes to the central repository from the Databricks platform. Pull requests to merge your changes into the 'main' branch still take place on your Git provider.

- To connect Databricks to a repository, refer to the [Databricks and version control](git_databricks.qmd) article.
+ To connect Databricks to a repository, refer to the [Databricks and version control](/ADA/git_databricks.html) article.

------------------------------------------------------------------------

@@ -197,8 +197,8 @@ In most cases a personal cluster will be the most versatile and easily accessibl

All compute options can be used within the Databricks platform and can also be connected to from other applications. Instructions on how to connect R/RStudio to a SQL Warehouse or a personal cluster can be found on the following pages (a brief connection sketch for the cluster case follows the list):

- - [Setup Databricks SQL Warehouse with RStudio](databricks_rstudio_sql_warehouse.qmd)
- - [Setup Databricks Personal Compute cluster with RStudio](databricks_rstudio_personal_cluster.qmd)
+ - [Setup Databricks SQL Warehouse with RStudio](/ADA/databricks_rstudio_sql_warehouse.html)
+ - [Setup Databricks Personal Compute cluster with RStudio](/ADA/databricks_rstudio_personal_cluster.html)
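
As a rough sketch (placeholders throughout; the linked pages are authoritative), connecting to a personal cluster uses the same ODBC pattern as a SQL warehouse, with the HTTP path pointing at the cluster instead:

```r
library(DBI)
library(odbc)

# Same ODBC pattern as for a SQL warehouse, but the HTTP path identifies a
# cluster: "sql/protocolv1/o/<workspace-id>/<cluster-id>". Copy the real value
# from the cluster's Advanced options > JDBC/ODBC tab; the IDs below are made up.
con <- DBI::dbConnect(
  odbc::odbc(),
  Driver          = "Simba Spark ODBC Driver",
  Host            = "adb-0000000000000000.0.azuredatabricks.net",
  Port            = 443,
  HTTPPath        = "sql/protocolv1/o/0000000000000000/0101-123456-abcdefgh",
  SSL             = 1,
  ThriftTransport = 2,
  AuthMech        = 3,
  UID             = "token",
  PWD             = Sys.getenv("DATABRICKS_TOKEN")
)
```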

---

@@ -247,4 +247,4 @@ An alternative to this is to specify packages/libraries to be installed on the c
Certain packages are installed by default on personal clusters and do not need to be installed manually. The specific packages installed are based on the Databricks Runtime (DBR) version your cluster is set up with. A comprehensive list of packages included in each DBR is available in the [Databricks documentation](https://learn.microsoft.com/en-us/azure/databricks/release-notes/runtime/). (A quick way to check from R is sketched after this callout.)
:::
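
Not from the original page, but a quick way to check whether a package already ships with your cluster's DBR image before installing it yourself:

```r
# Run in an R notebook cell on the cluster: TRUE means the package is already
# provided by the DBR image and does not need installing manually.
"dplyr" %in% rownames(installed.packages())

# If it is missing, this installs it for the current session only; use the
# cluster's Libraries configuration to make it available to every session.
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
```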

- Once you have a compute resource you can begin using Databricks. You can do this either by connecting to Databricks from RStudio, or by coding in the Databricks platform using scripts or [Notebooks](databricks_notebooks.qmd).
+ Once you have a compute resource you can begin using Databricks. You can do this either by connecting to Databricks from RStudio, or by coding in the Databricks platform using scripts or [Notebooks](/ADA/databricks_notebooks.html).
