Content and graphics for moving data from cloud storage
billspat committed Feb 23, 2024
1 parent 81ac61a commit c61e46f
Showing 6 changed files with 92 additions and 8 deletions.
23 changes: 17 additions & 6 deletions docs/sessions/03_cloud_storage.md
@@ -17,6 +17,19 @@ Central to using cloud for nearly all services is storing data. Cloud storage

Cloud storage was engineered to save millions of files for millions of users, and understanding how it works will take some changes to your approach.

## Topics


[Azure Cloud Storage for Researchers](../topics/azure_cloud_storage_for_researchers.html) (Web browser slide presentation)



A [Comparison of Databases and storage](../topics/storage_vs_databases.md) may help you understand the role of a database vs simply keeping your data in files (for example Excel or CSV files).

[Storage Options for a VM, and transfer back to you](../topics/moving_data/Five_storage_options_for_cloud_vm_to_local_file-date_sharing.drawio.png)



## Activities

- Download and install the [Azure Cloud Storage Explorer](https://azure.microsoft.com/en-us/features/storage-explorer/) See the **"Download now"** button at the top of that page. You may review the content of the page
@@ -28,21 +41,19 @@ Cloud storage was engineered to save millions of files for millions of users and

- Exercise: [Create an SMB Azure file share and connect it to a Windows VM using the Azure portal](https://learn.microsoft.com/en-us/azure/storage/files/storage-files-quick-create-use-windows)

## Readings


- [Azure Cloud Storage for Researchers](../topics/azure_cloud_storage_for_researchers.html) (Slides)
## Readings


- A decent high-level introduction: [Edureka Azure Storage Tutorial](https://www.edureka.co/blog/azure-storage-tutorial/) (there are several pop-ups and ads, but it offers a good level of information)
- [Storage as a Service](https://s3.us-east-2.amazonaws.com/a-book/storage.html) from "Cloud Computing for Science and Engineering"
- Azure Documentation: [Introduction to the core Azure Storage services
](https://docs.microsoft.com/en-us/azure/storage/common/storage-introduction)
- Azure Documentation: [Introduction to the core Azure Storage services](https://docs.microsoft.com/en-us/azure/storage/common/storage-introduction)
- [Table of Azure Storage Product Offerings](https://azure.microsoft.com/en-us/product-categories/storage/)
- Optional: this is long (It says 46 minutes but it will probably take less time) but a good basic introduction to Azure storage: <br>
[Azure Training: Explore Azure Storage services](https://learn.microsoft.com/en-us/training/modules/describe-azure-storage-services/) ( free training from Microsoft Learn)
- optional [Understanding block blobs, append blobs, and page blobs](https://docs.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs)

- [Introduction to Azure managed disks](https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview) This has more technical background than necessary but could be very helpful.
- [Introduction to Azure managed disks](https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview) This has more technical background than necessary but could be very helpful.


## Post-session discussion points
7 changes: 6 additions & 1 deletion docs/sessions/06A_data_servers.md
@@ -10,7 +10,12 @@ title: 6A - Data Servers

Data servers (like relational databases) can be a powerful tool for even small research projects. When we say "Data Servers" or "Data Systems" we mean any server with data processing capabilities that you connect to with a client to send data and commands and receive results. The most widely used and classic example is the relational database management system (RDBMS), invented in the 1970s, but there are many other types. A central advantage of data servers is the ability to handle many concurrent connections. Connections can be from many users, a web application serving many users, or many other concurrent processes. Like other systems (such as VMs, file storage servers, big data tools, etc.), these data systems don't require cloud computing, but cloud companies offer database services such that with a few clicks you can have a server that would otherwise take a week to provision and years to maintain. A data server could be a productive addition to your cloud architecture, or the central aspect of your fellowship project.

I have used databases with many research projects that had a significant data entry burden, requiring many work-hours of students typing in data, or that used shared systems.
Databases are used in thousands of research projects that have a significant data entry burden, requiring many work-hours of students typing in data, or that use shared systems.

## Databases vs Storage

A [Comparison of Databases and storage](../topics/storage_vs_databases.md) may help you understand the role of a database vs simply keeping your data in files (for example Excel or CSV files).


## Readings

14 changes: 13 additions & 1 deletion docs/topics/azure_cloud_storage_for_researchers.html
@@ -31,7 +31,6 @@

<body>
<textarea id="source">
---
class: cloudtitle,center, middle

# Azure Cloud Storage For Researchers
@@ -440,6 +439,19 @@

# Other Resources

### Azure Storage Overview

Microsoft Documentation [Introduction to Azure Storage](https://learn.microsoft.com/en-us/azure/storage/common/storage-introduction)

*This talks about several kinds of storage, but we focus on Azure Files, Azure Blobs and Disks*

### Managed Disks

When you create a VM, you must also create a disk that holds the operating system and boots the VM.
That disk is a kind of storage. You can also create a disk just for storage and attach it as a second
disk to your VM when you create it, re-using that 'data disk' over and over.

[Azure documentation for Managed disks](https://learn.microsoft.com/en-us/azure/virtual-machines/managed-disks-overview)

--

26 changes: 26 additions & 0 deletions docs/topics/moving_data/from_vm_to_you_storage_options.md
@@ -0,0 +1,26 @@

---
title: Storage and data transfer options for VM
---

# Storage and data transfer options for VM

*From [Session #3, Cloud Storage](../../sessions/03_cloud_storage.md)*

![graphic](Five_storage_options_for_cloud_vm_to_local_file-date_sharing.drawio.png)

[Full size image](Five_storage_options_for_cloud_vm_to_local_file-date_sharing.drawio.png)

Links:


- [Azure Storage Explorer](https://azure.microsoft.com/en-us/products/storage/storage-explorer/), a user application for moving data down from and up to Azure cloud storage, including disks
- [the azcopy utility](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10), a command-line utility for moving data between your computer and the cloud, or from cloud to cloud. A common way to access Azure Storage accounts with it is to create and use a special URL that includes a security token (a "SAS", or shared access signature, in Azure terms).
- [Azure Data Studio](https://learn.microsoft.com/en-us/azure-data-studio/what-is-azure-data-studio), a cross-platform application for interacting with databases. It is designed for Microsoft's branded "SQL Server" but works with many open source databases, and it requires an existing database in the cloud or elsewhere. There are many open source versions of this kind of database user-interface application: [List of database GUIs](https://www.eversql.com/top-7-mysql-gui-tools-for-windows/)
- [Quickstart: Azure Blob Storage client library for Python]( https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=managed-identity%2Croles-azure-portal%2Csign-in-azure-cli)

Unfortunately, the [R libraries that worked with various Azure services](https://github.com/Azure/AzureR), such as [AzureStor](https://github.com/Azure/AzureStor), have not been worked on for several years, and there is no guarantee they will work.
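
For the azcopy option listed above, the following is a minimal sketch of what the SAS-bearing URL looks like and how it fits into an `azcopy copy` command. It only builds strings and does not contact Azure. The account, container, file names, and token are placeholders, and the two helper functions are hypothetical names invented for this illustration; the SAS token itself has to be generated separately (for example in the Azure portal or in Storage Explorer).

```python
import shlex
from urllib.parse import quote


def blob_url_with_sas(account: str, container: str, blob_path: str, sas_token: str) -> str:
    """Build a blob URL with the SAS token appended as the query string."""
    token = sas_token.lstrip("?")  # generated tokens sometimes start with "?"
    return (
        f"https://{account}.blob.core.windows.net/"
        f"{quote(container)}/{quote(blob_path)}?{token}"
    )


def azcopy_upload_command(local_path: str, destination_url: str) -> str:
    """Return an `azcopy copy` command line, quoted for a POSIX shell."""
    parts = ["azcopy", "copy", shlex.quote(local_path), shlex.quote(destination_url)]
    return " ".join(parts)


if __name__ == "__main__":
    # All values below are placeholders, not real credentials.
    url = blob_url_with_sas(
        account="mystorageaccount",
        container="results",
        blob_path="run1/output.csv",
        sas_token="?sv=PLACEHOLDER&sig=PLACEHOLDER",
    )
    print(azcopy_upload_command("output.csv", url))
```

Quoting the URL matters because the SAS token contains `&` characters that a shell would otherwise interpret. Treat a SAS URL like a password: anyone who has it can use the access it grants until it expires.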



30 changes: 30 additions & 0 deletions docs/topics/storage_vs_databases.md
@@ -0,0 +1,30 @@
# Storage vs Databases

## Introduction

This page does not discuss exactly what a database is, but instead compares databases with file storage. If you want to keep your data around, you need to store it somewhere. We don't usually lump databases in the same category as file storage, but in many ways they serve the same function for research data: a way to keep data around between bouts of analysis.

## Definitions

**Storage** refers to the persistent retention of data in a computer system (local or cloud). Storage devices and media provide capacity to save and retrieve data as needed. Common examples of storage include hard drives, SSDs, optical discs, and flash drives. For these, the operating system of the computer handles the retrieval. With cloud storage (see below) or a shared storage system, a server computer and its operating system manage the storage and allow you to connect remotely. If you want to work with data from a file in storage, traditionally that file must be loaded into memory, and then your program manipulates the data inside it`*`.

**Databases** are structured sets of data that are optimized for querying, analyzing, and manipulating data. Databases include software functionality like query languages, access control, transactions, and analytics capabilities that storage devices lack. SQL and NoSQL are common database approaches. Databases typically require a remote server that has special software just for organizing, storing, analyzing, and retrieving data for you. The computer program accessing the database server is a client, and does very little processing. If you want to work with data in a database server, you send a command describing the data you want, and the database server returns just that data. A database server can also sort, calculate transformations (math), and compute summaries (counts, averages, etc.). You don't need to load the whole 'table' as you may have to for a CSV file.

`*` There are exceptions to this, of course: newer data storage libraries and file formats like Apache Arrow allow you to load only part of a file, and in-memory database libraries let you treat local files like databases.
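
To make the contrast concrete, here is a small sketch using only Python's standard library. The same made-up measurements are handled twice: once as CSV text, where the program loads every row and computes the average itself, and once through a query, where the database engine filters and averages and hands back a single number. SQLite runs inside the Python process rather than on a remote server, so it stands in for a database server purely for illustration. The table, column, and function names are invented for this example.

```python
import csv
import io
import sqlite3

# Made-up example data; in practice this would be a file on disk.
CSV_TEXT = """site,temp_c
A,10.0
A,14.0
B,20.0
"""


def mean_temp_from_csv(text, site):
    """File approach: load every row, then filter and average in our own code."""
    rows = list(csv.DictReader(io.StringIO(text)))  # the whole file is read
    temps = [float(r["temp_c"]) for r in rows if r["site"] == site]
    return sum(temps) / len(temps)


def load_into_db(text):
    """Copy the CSV rows into an in-memory SQLite table (one-time setup)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (site TEXT, temp_c REAL)")
    rows = [(r["site"], float(r["temp_c"])) for r in csv.DictReader(io.StringIO(text))]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return conn


def mean_temp_from_db(conn, site):
    """Database approach: describe the result we want; only that value comes back."""
    query = "SELECT AVG(temp_c) FROM readings WHERE site = ?"
    (mean,) = conn.execute(query, (site,)).fetchone()
    return mean


if __name__ == "__main__":
    print(mean_temp_from_csv(CSV_TEXT, "A"))
    print(mean_temp_from_db(load_into_db(CSV_TEXT), "A"))
```

With three rows the difference is invisible, but with millions of rows the query version avoids moving the whole table to the client.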

## Comparison

| **Storage** | **Database** |
|---|---|
| Storage provides raw data capacity. | Databases are structured for efficient querying and analysis. |
| Storage systems see data as an opaque blob or block. The storage system has no visibility into the contents or structure of data. | **Databases** impose structure on data through schemas and modeling. This enables efficient querying and enforcing integrity constraints. |
| Storage is accessed through raw read/write operations to locations like blocks or files by the operating system. | Databases allow declarative access through query languages like SQL as well as APIs. The physical location is hidden from the user, as the database server manages efficient disk access automatically. |
| Durable, very long-term persistence/retention for data files that may be infrequently accessed. | Fast and frequent data access by record (row), though database systems can live for decades. |
| A file is typically opened for writing by one process at a time. | Can serve reads and writes of different rows to thousands of clients; a process can 'lock' a row while writing to it to avoid contention. |
| Good for direct binary file access (images, videos, documents). | Good for multi-user applications and for managing transactions (all data operations must complete before committing). |
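
The transaction behavior mentioned in the last row of the table can be sketched with SQLite from Python's standard library. In this hypothetical example (the table, column, and function names are invented), moving samples between two freezers takes two updates. If the second update fails a constraint, the first one is rolled back, so the data never ends up half-changed.

```python
import sqlite3


def make_db():
    """Create an in-memory table whose counts may never go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE freezers ("
        "name TEXT PRIMARY KEY, "
        "n_samples INTEGER CHECK (n_samples >= 0))"
    )
    conn.executemany("INSERT INTO freezers VALUES (?, ?)", [("F1", 5), ("F2", 0)])
    conn.commit()
    return conn


def move_samples(conn, src, dst, n):
    """Move n samples from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE freezers SET n_samples = n_samples + ? WHERE name = ?", (n, dst)
            )
            conn.execute(
                "UPDATE freezers SET n_samples = n_samples - ? WHERE name = ?", (n, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed; neither update was kept


def sample_counts(conn):
    """Return {freezer name: sample count} for inspection."""
    return dict(conn.execute("SELECT name, n_samples FROM freezers ORDER BY name"))


if __name__ == "__main__":
    db = make_db()
    print(move_samples(db, "F1", "F2", 3), sample_counts(db))
    print(move_samples(db, "F1", "F2", 10), sample_counts(db))
```

Doing the same with two plain files would require your own code to detect the failure and undo the first change.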

## Summary

Storage and databases both persist data but are optimized for different purposes. Storage provides durable capacity while databases structure data for efficient access. Storage suits long-term file retention while databases enable interactive applications. Both remain essential components of a complete data architecture.

**Stolen and adapted from the ad-laden website [DatabaseTown](https://databasetown.com/storage-vs-database-a-comparative-analysis/)**
