New to SQL?

What is SQL? SQLPad?

SQL, or Structured Query Language, is a language for working with data stored in a relational database. A relational database organizes data into tables, each made up of rows and columns.

SQLPad is a software package that allows you to send SQL queries to a connected relational database. Arcus offers the SQLPad tool for you to interact with data in your lab that is stored in your Arcus relational database.
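
To make "tables with rows and columns" concrete, here is a minimal sketch of a SQL query. It assumes Python's built-in `sqlite3` module as a stand-in database, and the `patients` table, its columns, and its values are made up for illustration. None of this is real Arcus data, and your Arcus lab database may use a different SQL dialect.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one small, hypothetical table."""
    conn = sqlite3.connect(":memory:")
    # A table has named columns; each inserted tuple becomes one row.
    conn.execute(
        "CREATE TABLE patients ("
        "patient_id INTEGER PRIMARY KEY, age_years INTEGER, sex TEXT)"
    )
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?, ?)",
        [(1, 4, "F"), (2, 11, "M"), (3, 7, "F")],
    )
    conn.commit()
    return conn


def patients_older_than(conn, min_age):
    """Return (patient_id, age_years) for rows where age_years > min_age."""
    # SELECT picks columns, FROM names the table, WHERE filters rows.
    query = (
        "SELECT patient_id, age_years FROM patients "
        "WHERE age_years > ? ORDER BY patient_id"
    )
    return conn.execute(query, (min_age,)).fetchall()


conn = build_demo_db()
print(patients_older_than(conn, 5))  # [(2, 11), (3, 7)]
```

In SQLPad you would type only the SQL statement itself (the `SELECT ... FROM ... WHERE ...` part); the Python around it exists just so the sketch can run on its own.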


Arcus-Specific SQL Training

Arcus On-Ramp

If you're already an Arcus user (you've signed our Terms of Use and completed CITI training), you can sign up for our Arcus On-Ramp webinars. In these webinars, you work in a real Arcus lab, analyzing data from CHOP's electronic health record (EHR) to replicate an actual published study. Each workshop focuses either on exploring the data and defining a query for your study using SQL, or on running the analysis in R/Python. No coding experience is required to attend. Registration closes one week before each workshop so we have time to add registered attendees as users in the webinar training lab. To sign up, please visit https://arcus.chop.edu/education/webinar-signup/. This link is only available to Arcus customers on the CHOP network.

Lab Training Videos


For an example of how to use SQL data in your Arcus lab, start with the training videos on your lab's landing page.

These videos are introductory, but they show you specifically how to work in your Arcus lab.

We strongly encourage you to watch all of the videos, in order, even the ones that don't specifically mention SQL or SQLPad (the tool we use in Arcus labs to run SQL queries). It takes only about an hour of your time, and we think it will answer many of your questions and save you time in the long run.

Additional Resources

Arcus training is a great place to start your SQL education, but you will probably want to keep learning on your own, building skills that are specific to your research goals or career needs.

You have several options for building your SQL skills.

There are a number of university classes, online courses, and live workshops that go into depth about how to use SQL. Simply search for courses at the university or MOOC platform (e.g. Coursera) you prefer to use.

If you prefer something a bit more "just in time", however, we suggest the SQL modules from the DART (Data and Analytics for Research Training) program.

DART includes dozens of data science modules, each lasting 1 hour or less, with a narrow focus and clear learning objectives. They are asynchronous, so you can take them at any time!

Arcus Education's DART modules are the result of a study funded by an NIH grant aimed at educating biomedical researchers. The active research phase of this program is complete, so we are no longer recruiting learners to be our subjects. However, if you'd like to receive updates about publications or applications of this research, please email us at [email protected].

Training Modules

To begin learning SQL, you have a couple of options among the DART self-guided tutorial modules.

If you want a comprehensive curriculum of nearly twenty modules, you might enjoy our Pathway 3: Big data, big questions curriculum, which includes overview materials about reproducible research and data organization, introductory material in SQL, and some advanced topics you might need as a biomedical researcher, such as Regular Expressions. While you're there, check out the other suggested pathways, too!


A sneak preview of Pathway 3: Big data, big questions:

| Order | Module | Description | Estimated Time |
|---|---|---|---|
| 1 | Reproducibility, Generalizability, and Reuse | This module provides learners with an approachable introduction to the concepts and impact of research reproducibility, generalizability, and data reuse, and how technical approaches can help make these goals more attainable. | 60 min |
| 2 | Research Data Management Basics | Learn the basics about research data management. | 40 min |
| 3 | Demystifying SQL | SQL is a relational database solution that has been around for decades. Learn more about this technology at a high level, without having to write code. | 40 min |
| 4 | Database Normalization | Learn about the concept of normalization and why it's important for organizing complicated data in relational databases. | 40 min |
| 5 | SQL Basics | Structured Query Language, or SQL, is a relational database solution that has been around for decades. Learn how to do basic SQL queries on single tables, by using code, hands-on. | 60 min |
| 6 | SQL, Intermediate Level | Learn how to do intermediate SQL queries on single tables, by using code, hands-on. | 60 min |
| 7 | SQL Joins | Learn about SQL joins: what they accomplish, and how to write them. | 60 min |
| 8 | Demystifying Geospatial Data | This module is a brief introduction to geospatial (location) data. | 15 min |
| 9 | Encoding Geospatial Data: Latitude and Longitude | This is an introduction to latitude and longitude and the importance of geocoding - encoding geospatial data in the coordinate system. | 15 min |
| 10 | The Elements of Maps | This is a general overview of ways that geospatial data can be communicated visually using maps. | 45 min |
| 11 | Demystifying Regular Expressions | Learn about pattern matching using regular expressions, or regex. | 30 min |
| 12 | Regular Expressions Basics | Begin to use regular expressions, or regex, for simple pattern matching. | 60 min |
| 13 | Regular Expressions: Groups | Use regular expressions, or regex, for complex pattern matching involving capturing and non-capturing groups. | 30 min |
| 14 | Regular Expressions: Flags, Anchors, and Boundaries | Use flags, anchors, and boundaries in regular expressions, or regex, for complex pattern matching. | 45 min |
| 15 | Regular Expressions: Lookaheads | Use regular expressions, or regex, for complex pattern matching involving lookaheads. | 30 min |
| 16 | Demystifying Large Language Models | Learn about large language models (LLM) like ChatGPT. | 60 min |
| 17 | Demystifying Machine Learning | An approachable and practical introduction to machine learning for biomedical researchers. | 60 min |
| 18 | Citizen Science | This is an overview of citizen science for biomedical researchers. | 45 min |

If these pathways are close, but not quite right, you can also build your own pathway through these materials using our prototype curriculum development tool at https://learn.arcus.chop.edu.

If you're in a hurry and you want to just get a bit of specific SQL instruction, we recommend starting with these modules:

Also check out the Arcus Skill Series "Beyond the Spreadsheet: Understanding SQL and Relational Databases" (the website links to the slides and workshop recordings).


Beyond the NIH-funded DART modules, we also suggest other articles and miscellaneous resources, some created within Arcus and some from outside CHOP.

Compendia of Resources:

  • Our "SQL 101" Guide includes links to articles, webinars, and other materials on a variety of topics.