diff --git a/search/search_index.json b/search/search_index.json
index 91d9217..69aab59 100644
--- a/search/search_index.json
+++ b/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Syllabus","text":" from John Constable, Cloud Studies, 19th Century English painter
"},{"location":"#msu-cloud-computing-fellowship","title":"MSU Cloud Computing Fellowship","text":""},{"location":"#program-summary","title":"Program Summary","text":"The program runs Fall semester through Winter/Spring semester.
Fall semester is dedicated to learning how the cloud works. In-person sessions are approximately bi-weekly (see schedule below). A session includes preparatory readings and activities to orient you to the topic, followed by an in-person meeting on Friday to review the materials, hold a seminar, provide a venue for discussion, and work through hands-on activities.
Winter/Spring semester is for building a project using cloud computing, culminating in a symposium where you present what you've learned and built. Winter/Spring sessions are bi-weekly and include presentations by the fellows on their project status, discussion of successes and challenges, presentations by cloud practitioners, and general help.
The culmination of the fellowship is a project resulting in a write-up and presentation during the spring symposium, typically held in late April or early May.
"},{"location":"#textbook","title":"Textbook","text":"We will occasionally link to the following book:
\"Cloud Computing for Science and Engineering\", Ian Foster and Dennis B. Gannon, September 2017
MIT Press website. Book website: Cloud4SciEng.org. The book website provides open access to individual chapters.
"},{"location":"#meeting-location","title":"Meeting location","text":"MSU STEM Teaching and Learning Facility 642 Red Cedar Rd Michigan State University. East Lansing, MI 48824
Fall 2023: room 3201 STEM. Room 3201 is on the northwest corner of the 3rd floor, across from the north elevator. Winter 2024: room 1201 STEM. We plan to hold all sessions in person.
"},{"location":"#fall-2023-schedule","title":"Fall 2023 Schedule","text":"Each approximately 2-week session consists of - preparatory activities and materials (topics, links, tutorials) prior to meeting - an in-person session for review, activities, and discussion on the Friday of week 1 - follow up for week 2
Introduction Requirements: September 4-8 Complete items in the Welcome email sent by Dr. Parvizi post to teams to say hello Meeting September 8, 3pm STEM 3201: introduction to the cloud fellowship intro to Cloud/Computing and Azure Assignments:
How to Cloud
Meeting September 22, 3pm
Azure Organization Creating and Using Virtual Machines Cloud Storage
Meeting October 6, 3pm Assignment/Exercise: analyzing weather data in the cloud
Databases and Data Analytics Systems on the Cloud for research
Meeting October 20, 3pm Exercises: Using SQL database for research data Big Data Systems and the cloud
Meeting November 3, 3pm: Exercise: Using R and Python on a databricks cluster Serverless Cloud Computing
Meeting November 17, 3pm Review Project Requirements and Specification Assignment: email 1-2 sentences describing a project you may undertake by December 6 Azure AI Services
Meeting December 1, 3pm: Discussion: Fellowship Projects Demonstration of AI Services Exercise Hands-on creating responder using Python API Assignment: project proposal due January 8, 2024 "},{"location":"#winterspring-2023-schedule","title":"Winter/Spring 2023 Schedule","text":"The second half is dedicated to fellows completing a cloud computing project based on research interests, culminating in a presentation at a symposium in late April.
Fellows will attend bi-weekly meetings where groups of fellows will present the goals and status of their cloud computing projects for feedback and discussion.
All meetings are in the MSU STEM Building, room 1201, alternate Fridays 3pm to 4:30pm
Instructors are available by appointment, and typically on the alternate Fridays, to answer any questions you have about cloud, projects, or applying cloud computing technology to your research.
Turn in Project Proposal Monday, January 8th Post Written Project Proposal to MS Teams folder prior to 5:00 pm. Additionally, a survey of fellows to determine symposium dates will be distributed.
Schedule meeting with Instructors to review proposals. This is on-going during January to ensure we have time to meet with all fellows one-on-one.
Cloud Computing Seminar TBD January 12th
Project Proposal Presentations Fellows will present their proposals to the fellowship, up to 6 per session, followed by questions and feedback from colleagues
January 26, February 9th, February 23
Project status presentations Fellows will present the status of their projects, describing challenges and successes, and receive questions, feedback, support and help from the fellowship. March 8, March 22, April 5
Project Final Reports April 12 A writeup of the results and lessons from applying cloud computing technology
Symposium Preparation
The date and time of the symposium will be determined January 24 Fellows must turn in Symposium Talk Title & Abstract 3 days prior to symposium "},{"location":"#msu-cloud-computing-fellowship-symposium","title":"MSU Cloud Computing Fellowship Symposium","text":"Fellows will present the outcomes, successes, challenges and lessons learned at a symposium held on MSU campus in late April 2024. The date and time will be determined in January 2024 with input from the fellows. Fellows are strongly encouraged to invite their advisors, mentors and colleagues.
"},{"location":"#communications","title":"Communications","text":"Fellows are encouraged to contact us with questions or if they are ever stuck on an activity we've assigned. In addition to email, we are utilizing Microsoft Teams at MSU (Fellows receive a link in the welcome email). Please feel free to reach out on the MS Teams channel sent to participants at the beginning of the program. Mentioning one of us, e.g. @billspat or @parvizm, will help get our attention. Additionally you may email us at any time.
The goal of the fellowship is to foster discussion. We encourage you to add your successes or challenges to any discussion or question in Teams.
If you need interactive, on-going help it may be better to schedule a help session with a fellowship coordinator; and we are happy to meet individually for additional support. This may be especially effective when fellows are developing their projects.
We also save time during our synchronous meetings for group discussions, so please bring any concerns, difficulties, or successes to our sessions!
If you are not a participant but have questions about the program, see the Contact page for how to get in touch with us.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
"},{"location":"about/","title":"About The MSU Cloud Computing Fellowship","text":"The MSU Cloud Computing Fellowship is a cross-disciplinary program produced by MSU\u2019s Institute of Cyber-Enabled Research (ICER) and MSU IT Services for invited MSU doctoral students and postdoctoral researchers. As a part of this program, fellows will participate in a series of workshops during the fall semester to:
Determine the aspects of your research that can be accomplished with cloud computing; Incorporate cloud-based systems into your research application or workflow; and Understand the strengths and limitations of commercial cloud computing with the goal of improving research yield and minimizing cost, and to develop a workflow that utilizes that knowledge. "},{"location":"about/#background","title":"Background","text":"MSU doctoral students and postdoctoral researchers are invited to apply in the summer and approximately 18 are selected each year. The program started in 2019. If you are an MSU graduate student or post-doc and interested in participating next year, please check back in the Summer of 2022 for announcements on the invitation to participate, or request to join the MSU ICER mailing list
"},{"location":"about/#citing-the-msu-icer-cloud-computing-fellowship-in-research-publications","title":"Citing the MSU ICER Cloud Computing Fellowship in Research Publications","text":"We encourage cloud fellows to acknowledge the fellowship in publications arising from computational work performed during your fellowship project. Please let us know that you have referenced the fellowship, and we will link to your publication on the ICER publication site, which will further increase the visibility of your work. A sample statement can be:
\"This work was supported in part through Michigan State University\u2019s Institute for Cyber-Enabled Research Cloud Computing Fellowship, with computational resources and services provided by Information Technology Services and the Office of Research and Innovation at Michigan State University.\u201d
"},{"location":"about/#cloud-computing-fellowship-organizers","title":"Cloud Computing Fellowship Organizers","text":"Dr. Brian O'Shea Professor and Director, MSU ICER
Role: Program Lead, ICER
Dr. Brian O'Shea is a computational and theoretical astrophysicist studying cosmological structure formation, including galaxy formation and the behavior of the hot, diffuse plasma in the intergalactic medium and within galaxy clusters. He is also a co-author of the Enzo AMR code, an expert in high performance computing, and an advocate for open-source computing and open-source science. He received his B.S. in Engineering Physics at the University of Illinois in Urbana-Champaign (UIUC) in 2000, and his PhD in physics from UIUC in 2005 (with 2002-2005 being spent as a graduate student in residence at the Laboratory for Computational Astrophysics at UC San Diego and in the Theoretical Astrophysics Group at Los Alamos National Laboratory). Following that, he was a Director's Postdoctoral Fellow at Los Alamos National Laboratory, with a joint appointment between the Theoretical Astrophysics Group and the Applied Physics Division. Since 2008, he has been a member of the faculty at Michigan State University, with a joint appointment between the Department of Computational Mathematics, Science and Engineering (2015-present), the Department of Physics and Astronomy (2008-present), and the National Superconducting Cyclotron Laboratory (2014-present). From 2008-2015, Dr. O'Shea was a member of Lyman Briggs College. He has authored or co-authored over 75 peer-reviewed journal articles in astrophysics, computer science, and education research journals, and has received a variety of awards for his teaching and public outreach efforts. In 2016, he became a Fellow of the American Physical Society, and in 2019 he became the director of MSU's Institute for Cyber-Enabled Research.
Patrick Bills Research Software Engineer, ICER Role: Co-Instructor
Pat Bills' research background is in data systems for ecology (MS Entomology, MSU). He has experience in database design, R, Python, and web application programming. Pat has worked in research IT for over 25 years for departments and labs across MSU, including for MSU ICER as a high performance computing research consultant and trainer, for MSU Enterprise services as the technical lead of the data science team, and currently as a research software engineer again for ICER.
Like many, he has built and worked with on-campus Linux systems for many years, including the MSU HPC. Pat started his cloud journey in 2017 during a workshop at the HPC conference where he saw Ian Foster (our textbook author) present his vision of research on the cloud. Since then he has used cloud services from Google, Amazon, and Azure.
Dr. Mahmoud Parvizi Research Consultant, MSU ICER Role: Co-Instructor
Mahmoud earned his PhD in physics from Vanderbilt University with research in high-energy theory in the context of early universe cosmology as well as computational astrophysics. In addition, Mahmoud earned an MBA with a concentration in finance from the University of Michigan - Flint. Mahmoud was formerly a postdoctoral research associate in the Department of Physics and Astronomy at Michigan State University with a focus on machine learning applications of cloud-computing workflows and is currently a research consultant for the MSU Institute for Cyber-Enabled Research (ICER). He participated as a cloud fellow in 2019 and served as co-instructor of the Cloud Computing Fellowship in 2020.
Mahmoud\u2019s diverse research interests include mathematical and theoretical physics, data-intensive astrophysics, machine learning for precision health, and cloud-computing platforms for academic research. His expertise includes 1) quantum field theory in curved/non-stationary spacetimes; 2) finite temperature quantum field theory and open quantum systems; 3) automated and end-to-end intelligent data pipelines for signal processing using compressed sensing and applied harmonic analysis; 4) machine learning and cloud-computing applications for precision health.
Sponsored by ICER, the MSU Office of Research and Innovation (ORI), and MSU IT Services Research Cyberinfrastructure (RCI)
"},{"location":"about/#previous-cloud-fellows","title":"Previous Cloud Fellows","text":"2019-2020
MSU Cloud Computing Fellows Summary of the first cohort of MSU Cloud Computing Fellows 2020-2021
Introducing the 2020 Cloud Fellows 20-21 Cloud Computing Fellowship Culminates in Impressive Symposium 2021-2022
Introducing the 2021 MSU Cloud Computing Fellows
2022-2023
Introducing the 2022 MSU Cloud Computing Fellows
4th Annual Cloud Computing Fellows Symposium
"},{"location":"cloud_glossary/","title":"Glossary of Cloud Terms","text":""},{"location":"cloud_glossary/#why","title":"Why?","text":"Researchers using the cloud must know a little about a lot of information technology to get computational work done in their domain specialty. Most cloud glossaries are for systems administrators, not the rest of us. This glossary is much more brief than Wikipedia and hopefully also provides the context you need to use cloud services in your work. Do you have an item to add? Please contact us!
"},{"location":"cloud_glossary/#other-glossaries","title":"Other Glossaries","text":"https://www.cloudbank.org/cloud-terms
"},{"location":"cloud_glossary/#the-glossary","title":"The Glossary","text":""},{"location":"cloud_glossary/#arm-cpu","title":"Arm CPU","text":"CPU from \"Advanced RISC Machines, Ltd.\" While historically most computers used Intel CPUs, ARM provides an alternative CPU that is becoming more popular and is present as an option in HPC and Cloud Virtual Machine offerings. The vast majority of software written for Intel computers is compatible with ARM. Some computational work is sensitive to CPU choice, and CPU choice can affect cost and speed of execution, so it may be important to understand the implications of this choice of CPU.
"},{"location":"cloud_glossary/#arm-template","title":"ARM Template","text":"A specification file listing all of the cloud resources and configuration settings that the Azure Resource Manager can use to create resources for you when you submit it a certain way. Templates are a great shortcut and automation feature but difficult to edit. For details see Azure Documentation: What are ARM templates?
"},{"location":"cloud_glossary/#azure-resource-manager-arm","title":"Azure Resource Manager (ARM)","text":"see Resource Manager
Blob Storage. Azure calls its object cloud storage \"Blobs\". It is similar to Amazon Web Services 'S3' and Google Cloud Storage buckets. Azure Documentation: Introduction to Azure Blob storage. While it's possible to 'mount' blob storage to Linux VMs using 'blob fuse' or similar packages, it may not work as you expect, so in practice Azure Files is a better solution for that. See File Storage
"},{"location":"cloud_glossary/#client-server","title":"Client-Server","text":"The Client/Server model of computing is something we use every day, though perhaps we don't use this term. See https://techterms.com/definition/client-server_model You are used to using maybe a dozen clients every day (phone apps, web browser, ssh to connect to a remote Linux system, Remote Desktop client to connect to a remote desktop server, etc). Cloud computing provides all the infrastructure needed to create servers quickly and easily.
"},{"location":"cloud_glossary/#cloud-shell","title":"Cloud Shell","text":"Cloud computing providers usually have a service where you can run command line (CLI) or terminal commands in a web browser 'shell'. This is helpful as the libraries and utilities are pre-installed. See https://docs.microsoft.com/en-us/azure/cloud-shell/overview and Azure Interfaces introduction.
"},{"location":"cloud_glossary/#containers","title":"Containers","text":"Or Docker Containers (not all containers need to be Docker, but the vast majority of container systems use Docker). For R users, see https://colinfay.me/docker-r-reproducibility/ For Python users, there is https://www.netguru.com/blog/python-docker-tutorial although you could read either.
Linux Containers is a term for a collection of methods and technologies that allow multiple isolated systems to be run on one Linux computer. This is different from virtual machines in that a VM host provides abstract or virtualized hardware, so each VM requires its own portion of memory and CPU cores, whereas containers share the main part of Linux (the kernel), memory and CPU more dynamically. The primary commercial company for containers is \"Docker\", so Docker is sometimes used synonymously with 'container', but it is just one form.
In addition to being more efficient than VMs, most container systems have a system and scripting language for building containers. This means one can provision an entire system from code. Containers are widely used to package and distribute complex research software systems, for example the bioinformatics workflow system \"Cromwell.\" This way researchers can download and use a pre-installed system without the trouble of getting all of the prerequisites (dependencies) installed on their machine.
"},{"location":"cloud_glossary/#cpu","title":"CPU","text":"Central Processing Unit, the main 'chip' of a computer, and a core component when specifying a Virtual Machine 'size'
"},{"location":"cloud_glossary/#devops","title":"DevOps","text":"This has many definitions but for researchers the shortcut is using code to make IT infrastructure. Helping developers (like you) do Ops (like sysadmins) with code. See IaC.
"},{"location":"cloud_glossary/#docker","title":"Docker","text":"Docker is the most prevalent form of \"Containers\", e.g. Docker is to containers as Google is to search. See containers above for details. Note that Docker is many things at once: a method and format for Linux containers, a program for working with containers ( e.g. docker build...
), a company, and that company's hub or repository for storing and accessing free containers (or your own). Cloud companies also have \"hubs\" or repositories for storing your own Docker containers.
"},{"location":"cloud_glossary/#file-storage-azure","title":"File Storage (Azure)","text":"Also called \"Azure Files.\" Azure cloud storage that is more traditional file sharing, and that can be connected (mounted) to computers and other services using the SMB protocol, making it a similar experience to departmental shared file servers. See https://azure.microsoft.com/en-us/services/storage/files/ and compare with Blob Storage
"},{"location":"cloud_glossary/#firewall","title":"Firewall","text":"A common concept in networking, firewall software on a computer's networking components limits which kind of traffic can come in or out, and restricts which computer internet addresses can connect. Best practices suggest closing all connections via the firewall, only opening those connections for services you need, and only to those users (e.g. your own computer) you need to. Azure additionally has an option to \"allow connections from Azure networks\" so that you can freely connect from the portal, 'cloud shell', or connect from one Azure service to another. The implication is that you trust all Azure services.
"},{"location":"cloud_glossary/#gpu","title":"GPU","text":"From Wikipedia: https://en.wikipedia.org/wiki/Graphics_processing_unit GPUs can be very helpful for some code written to use them, especially many machine learning libraries, and Virtual Machines may be provisioned with GPUs.
"},{"location":"cloud_glossary/#infrastructure-as-code-iac","title":"Infrastructure as Code (IaC)","text":"Instead of using a GUI or manual steps, cloud resources may be created using scripts that interact with the cloud provider's API, and additional scripts can configure individual resources (such as installing software on a VM or configuring a database). Doing this kind of \"provisioning\" with scripts makes it reproducible and debuggable, which is at the heart of the Workflow or DevOps mentality.
"},{"location":"cloud_glossary/#ip-address","title":"IP Address","text":"A unique string of characters that identifies each computer using the Internet Protocol to communicate over a network. Your computer will have a different IP address depending on where you are located (home, work, field). In addition, a home wifi router will assign a 'local' IP address for inside your home, but your 'public' internet IP address will be different. To find your own IP address, simply google \"what is my ip.\" All Azure services (VMs, data systems, etc) are assigned IP addresses via networking. See https://docs.microsoft.com/en-us/azure/virtual-network/public-ip-addresses
"},{"location":"cloud_glossary/#object-storage","title":"Object Storage","text":"From NetApp \"What is object storage?\": \"...also known as object-based storage, is a strategy that manages and manipulates data storage as distinct units, called objects. These objects are kept in a single storehouse and are not ingrained in files inside other folders. Instead, object storage combines the pieces of data that make up a file, adds all its relevant metadata to that file, and attaches a custom identifier.\" Blob storage is object storage. Objects (e.g. files) are retrieved from a large system via their identifier, not their name. Amazon S3 and Google Cloud Storage are also object stores.
"},{"location":"cloud_glossary/#on-prem","title":"On-prem","text":"\"On Premise\" refers to technology (computers, disks, networking, etc) that is in your institution's computer centers or in your own lab. Note that for some researchers, \"on-prem\" can still mean remote (e.g. our HPC is only accessible remotely, so it may not be obvious to users that it's on premise).
"},{"location":"cloud_glossary/#resource","title":"Resource","text":"For AWS and Azure, a resource is an entity that you can work with. This means something you can create, edit or delete via their cloud interface. It could be a computer (virtual machine), a whole cluster (Azure Batch pool), or some tiny network setting (IP address). Resources almost always cost money. Resources are listed in your standard dashboard.
"},{"location":"cloud_glossary/#resource-group","title":"Resource Group","text":"Organizational scheme unique to Azure. Nearly all resources must be part of a group and the resource group must be selected (or created) when creating other resources. Resource groups could be used for specific projects, for 'personal' resources used for multiple projects (or for Azure things like cloud shell).
"},{"location":"cloud_glossary/#resource-manager","title":"Resource Manager","text":"Azure calls the system they use to interface between you and cloud resources the \"Azure Resource Manager\" or ARM. There used to be a different way to interact with Azure resources, hence this has a specific name and is referred to in Microsoft documentation.
"},{"location":"cloud_glossary/#serverless","title":"Serverless","text":"This buzz-word applies to many different cloud services, primarily those that the cloud company manages for you, usually referring to cloud functions (AWS Lambda) and sometimes others in the \"Platform As A Service\" service model. The origin is that, if you run virtual machines with operating systems and software installed, you are maintaining servers to support that software. If the cloud service does not require you to provision and maintain a server, it is often marketed as \"serverless\" (e.g. recent marketing of Azure Files as \"Serverless file shares\" where on-premise File Sharing requires staff to manage and maintain Windows File Servers).
"},{"location":"cloud_glossary/#service-models","title":"Service Models","text":"This is related to the \"... as a service\" (..aaS) phrases defined in the NIST document which included \"Infrastructure\", \"Platform\" and \"Software\" as a service (IaaS, PaaS and SaaS). It's a conceptual organization of cloud services based on the stack model of computing with the infrastructure (network, hardware, CPU, etc) at the bottom and Software on the top. See The NIST Definition of Cloud Computing
"},{"location":"cloud_glossary/#service-level-agreement-sla","title":"Service Level Agreement (SLA)","text":"Level of service you expect from a vendor, laying out the metrics by which service is measured, as well as remedies or penalties should agreed-on service levels not be achieved. In cloud computing this often spells out 'uptime,' which is the percent of time the system is not down, e.g. 99.99%, and guarantees of availability and against data loss. For most research, uptime is not important as we are our own customer and can tolerate some downtime.
"},{"location":"cloud_glossary/#services","title":"Services","text":"Cloud \"services\" are often bundles of resources pulled together for a coordinated function. Cloud companies offer hundreds of often closely overlapping services.
"},{"location":"cloud_glossary/#tags","title":"Tags","text":"AWS and Azure allow you to add metadata to a resource in the form of tags (e.g. hashtags, etc) which are keys and values. When you create a resource you can add a tag indicating the project it is for, e.g. \"project\" = \"dna-methylation\". To add more detail if your DNA methylation project has multiple aspects or experiments, add more tags like \"experiment\" = \"Fall 2021\"
For workgroups it's strongly suggested you add a \"created_by\" = your netid tag because it's often difficult in Azure to determine who created a resource if it needs to be turned off or deleted.
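As a minimal sketch of the key and value idea described above, the Python below models tags as a plain dictionary and filters a list of resources by tag. The helper names build_tags and filter_by_tag, the netid value, and the resource names are hypothetical illustrations, not part of any Azure or AWS API.

```python
# Hypothetical helpers for illustration only; not an Azure or AWS API.
def build_tags(project, created_by, **extra):
    # Every resource gets a project tag and a created_by tag.
    tags = {'project': project, 'created_by': created_by}
    # Optional extra tags, such as experiment='Fall 2021'.
    tags.update(extra)
    return tags


def filter_by_tag(resources, key, value):
    # Keep only resources whose tags contain the given key/value pair.
    return [r for r in resources if r.get('tags', {}).get(key) == value]


tags = build_tags('dna-methylation', 'netid123', experiment='Fall 2021')

# Made-up resource records standing in for what a cloud dashboard lists.
resources = [
    {'name': 'vm-analysis', 'tags': tags},
    {'name': 'storage-misc', 'tags': {'project': 'other'}},
]

matching = filter_by_tag(resources, 'project', 'dna-methylation')
print([r['name'] for r in matching])
```

Keeping tags consistent across resources is what makes this kind of lookup by project or by creator possible later.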
Use tags to organize your Azure resources and management hierarchy
"},{"location":"cloud_glossary/#tensor-processing-unit-tpu","title":"Tensor Processing Unit (TPU)","text":"The Google Tensor Processing Unit is a specialized computer chip similar to GPUs, used by deep learning libraries such as TensorFlow (which leads to the question of \"what is a tensor\"; that depends on who you ask, but it is similar to a matrix).
"},{"location":"cloud_glossary/#virtual-machine","title":"Virtual Machine","text":"(aka VM) Simulated computer hardware created using software, to be able to run a guest operating system inside a host system, such that the guest thinks it's running on an actual computer.
"},{"location":"contact/","title":"Contacting Us","text":"If you are a Cloud Computing Fellowship participant this year (or past participant!), please contact the instructors Pat Bills or Mahmoud Parvizi with any issues or questions related to the material or activities.
The session meetings are designed to have plenty of time for questions, troubleshooting and discussion. We will also schedule office hours prior to meeting times to help with pre-meeting activities.
If you have general questions about the MSU Cloud Computing Fellowship, please contact Brian O'Shea
If you will be an MSU graduate student or post-doc in the next Fall, and are interested in participating, please check back in the Summer for announcements for invitation to participate. The request for applications is announced on the MSU ICER mailing list and several other mailing lists around campus. We encourage anyone with an active research program that could benefit from cloud computing to apply
If you are an MSU Researcher interested in using cloud for your research, please contact IT Services or MSU ICER via our ticketing systems and describe your needs.
"},{"location":"projects/","title":"Projects","text":""},{"location":"projects/#cloud-computing-fellowship-projects-2022-2023","title":"Cloud Computing Fellowship Projects 2022-2023","text":"The primary activity of the Cloud Computing Fellowship is to support the fellows to create and present a cloud-computing-based project working with research data. During Fall semester the fellowship provides materials and help to learn core cloud concepts and activities, and Winter/Spring semester is devoted to project development.
"},{"location":"projects/#time-line-and-due-dates","title":"Time-line and Due dates","text":"Fellows deliver a proposal for their projects in early January 2024, and present that proposal to their colleagues in the fellowship. See the schedule for due dates. In Winter 2024 more detail will be provided on this site.
"},{"location":"projects/#questions-answers-and-other-notes","title":"Questions, Answers and other Notes","text":"Q. Do I have to use my own data for my project or can I use data from the web or other public data?
A. You can bring any data that you may use for your research, or that demonstrates cloud processes you may use in your research.
Q. Could I work on a problem outside of my research for my project?
A. Yes. We encourage fellows to consider some small aspect of their own research to apply to their projects, but not all research can be readily adapted for cloud computing to contribute, especially with very limited time and budget. If the project is related, even tangentially, to your current research project, and you feel your chosen project will advance your career or knowledge of cloud for later application, then by all means please pursue and present what you've learned.
Q. Do I have to use programming in my project?
A. Most of the examples provided in the fellowship talk about processing data with scripts such as R or Python and many researchers are using these for data analysis, but it's not required for a successful project. You could install a program on a powerful virtual machine and show how to use that software along with cloud storage to tackle a large data set (for example). Secondly there are many forms of cloud computing that are not traditional such as data systems which may use a GUI or a language like SQL.
One important aspect of a successful project is \"workflow thinking\" or how could you design your process so that you could do it 100 times or with some form of automation. That often requires programming but there are cloud systems that don't require programming (e.g. Azure Data Factory). Accumulating and organizing data is a huge part of successful research and using cloud tools to facilitate that and documenting the process, advantages and costs would be a successful project.
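A minimal sketch of this workflow idea, assuming made-up data: one analysis step is written as a function and then applied unchanged to 100 named datasets. The function names summarize and run_workflow and the site_N dataset names are hypothetical, chosen only for illustration.

```python
def summarize(values):
    # One repeatable step: the same code runs for every input.
    return {'count': len(values), 'mean': sum(values) / len(values)}


def run_workflow(datasets):
    # Apply the same step to each named dataset and collect the results.
    return {name: summarize(values) for name, values in datasets.items()}


# 100 small made-up datasets standing in for 100 files or experiments.
datasets = {f'site_{i}': [i, i + 1, i + 2] for i in range(100)}

results = run_workflow(datasets)
print(len(results), results['site_0'])
```

Because each step is a function of its input, the loop could later be swapped for a cloud batch or scheduling service without rewriting the analysis itself.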
Q. Do I have to use Virtual Machine as part of my project?
A. No, you don't, and in fact we encourage you to look for other services in the cloud to work with your data or your research processes.
Q. Do I have to use services that we've covered in the sessions?
A. Cloud companies provide many amazing services, and you are not limited to what we've talked about in the sessions. In addition, we don't require you to use \"computation\" based services alone. If you are interested in using some other service, please contact us and we may find useful resources or connect you with a colleague who has used the service in mind.
Q. Are there constraints on the things I want to do with my project? Can I do whatever I want?
A. Our goal is to facilitate your education and advancing your research program as it relates to cloud computing, and that is a very broad goal. If you use the fellowship to develop only a small system to show what's possible or not possible, even on public data, that uses cloud computing, that is an acceptable project.
Q. I want to make a web site or application for my project, can I use a VM? How do I do that?
A. This is a common request and the cloud was invented in part to run web applications. However, web application design is a huge subject and the programming involved is almost as complex as any programming or data work you've done for your research. We tend to discourage projects focused on web applications because of the work involved to create both 1) the infrastructure for a website (web server, storage, databases, possibly Docker containers, etc.) and 2) the web application itself (Python, PHP or another language, HTML, JavaScript, style sheets, etc.).
Azure has services for hosting websites but don't attempt this for your project unless you have previous experience making websites or web applications, or if you are up for the big challenge of learning webdev along with cloud computing because the research you are showing off is mostly complete. Secondly web services must be on-line 24/7 and the cost may accumulate quickly.
Finally, cybersecurity is a major issue for websites, which present an open door to anyone on the Internet. Keeping your site secure is a major challenge, so during development please turn it off when you are not using it, and consider that web applications are hacked routinely.
However, if you are ready to devote the time and this is a goal for you and your advisor, please come speak with us, as we have experience creating research web applications and we will support you.
"},{"location":"exercises/azure_portal_walkthrough/","title":"Exercise: Azure Portal Walk-through and Storage account creation","text":"from MSU Cloud Computing Fellowship Session 1
"},{"location":"exercises/azure_portal_walkthrough/#about","title":"About","text":"This is an exercise and introduction to the web interface to manage Microsoft Azure cloud services. Prior to doing this exercise, please read Azure Organization for more background on how Azure is structured.
For definitions of terms used in this walkthrough, refer to our Cloud Glossary including \"resource\", \"azure resource manager\" and \"resource group\" or our list of cloud references for an introduction to cloud computing.
For this activity we'll be using the web interface which Azure calls the \"Portal\" but that is only one of several ways to interface with Azure that we will learn about. Many of the activities you can accomplish in the portal you can accomplish with the other (command line or code) interfaces.
Azure's own overview of the Portal is here: https://docs.microsoft.com/en-us/azure/azure-portal/azure-portal-overview Please refer to that as well as this material.
There is a corresponding video that we've made that includes information about the portal, and also about creating a storage account.
"},{"location":"exercises/azure_portal_walkthrough/#orientation-to-the-azure-portal","title":"Orientation to the Azure Portal","text":"The link above is to a video that walks through the description and tutorial steps below, hosted on MSU MediaSpace ( requires MSU Log-in). Note this video also walks through creating a storage account.
This assumes you have an Azure account and a valid subscription. For the purposes of this introduction, we assume that your account currently does not have the ability to create a new subscription or resource group.
Log in to https://portal.azure.com with your MSU Netid. If you are a current member of the fellowship and you have difficulty logging in, please contact us right away.
Orientation: dashboard view. The Azure portal first presents a \"dashboard\" which is organized into panels that show some aspect of your cloud account. You may alter the panels on this dashboard to show you the services and aspects of Azure that are most important to you. For information on how to create or customize your dashboard, see \"Create a dashboard in the Azure portal.\" In the standard, default version of the dashboard the first panel is a list of resources. If you have not created any resources yet you won't see anything. We will explore resources later in this introduction. The standard dashboard panes are a list of your current resources (which may be in multiple resource groups), an advertisement with a link to learn about some new Azure service, and more links to create things that Azure has decided are most important to you. We will focus on the \"All Resources\" pane. If you click on anything here you can almost always use the back button to get back to the dashboard, or use the menu (described below).
Top bar menu: the top menu (three horizontal bars) has links to many of the things that are also on the main dashboard. The \"home\" view is not the same as the dashboard but is a list of links to things Azure guesses you may want to create, and a list of all of your resources. If you click \"resource groups\" in this list, you should see only one resource group (if any) unless you've been added to others or a different subscription.
Search bar: in the middle of the top of the screen is a white box in which you can type search terms, including the kind of resource you want to see or create, or part of the name of a specific resource you've created. This is what I use to create and find resources most of the time (and I rarely use the links provided); more on that later.
Shortcut buttons: the next few icons are short cuts to other functionality in the portal that we will cover in the future. Most are not critical.
A note about portal navigation: When you click anything in the portal, it opens a new window without reloading the browser, with an X at the top right. This mimics a \"close window\" function, and you can use the X to return to the dashboard, or you may simply use the menu to go where you need to.
Notice that, as with most things, there are 4-5 ways to get anywhere.
"},{"location":"exercises/azure_portal_walkthrough/#bonus-what-can-you-do-here","title":"Bonus: What can you do here?","text":"The primary purpose of using the portal and your resource group is to create things, and manage and monitor those things. For the purpose of this activity - since you don't really have anything - we can simply look at the 'activity log' in the left side-panel near the top.
- This opens a new table with the columns Operation name, Status, Time, Time stamp, etc. that is probably empty for you.
- Tables of information like this in the portal have filters at the top. The default activity is just for the previous 6 hours. If you find the filter called \"timespan\" and select 1 week (or longer) you can see when I created the resource group and the budget.
"},{"location":"exercises/azure_portal_walkthrough/#next-steps-create-a-storage-account","title":"Next Steps: Create a Storage Account","text":"For a good follow up exercise, see Creating a Storage Account with the Portal
"},{"location":"exercises/azure_portal_walkthrough/#about-portal-resource-pages","title":"About Portal \"Resource\" Pages","text":"Most cloud resources in the portal have a list of categories on the left side, and pages for each category in the center. The first page is the \"Overview\" which has the resource group, subscription, and other info important for that resource. This is followed by the \"Activity Log\" showing how the resource has been used. Each of the following items on the left side is a new page of additional options to alter how the resource is configured. For example, if you click the \"tags\" section you see the tags you added (if any) and can modify or add new tags.
Some of the options are not available on the forms when you create the resource, or the names of the options on these resource pages do not match the forms you used when you created the resources. In that case you may have to use two steps to configure the resource as you like, or better, consider using a programmatic interface.
Again, we did not discuss any of the characteristics of cloud storage or how to use it, but you should now have enough familiarity with the Azure portal to follow other tutorials to create and use storage or other resources.
"},{"location":"exercises/azure_vm_walkthrough/","title":"Exercise: Creating and Connecting to a Virtual Machine (VM) for both Windows and Linux","text":""},{"location":"exercises/azure_vm_walkthrough/#about","title":"About","text":"This is an exercise and introduction to creating Virtual Machines (VMs) and related resources using the Azure Portal.
There are two nearly identical activities, and you only need to complete one of them:
creating a Windows virtual machine and connecting with a graphical interface (GUI), namely Remote Desktop (RDP), to demonstrate how you may use full graphical software (like RStudio, Matlab, etc.) on a cloud computer
creating a Linux virtual machine and connecting with the command line to demonstrate how you may use a terminal interface (or scripting) on a cloud computer.
We will use a pre-configured virtual machine with software already installed for both versions. When creating a VM you can use an Azure template and there are many of these. The Data Science Virtual Machine (DSVM) from Azure has R, Python and many data science and statistical libraries available. For more information about the Azure DSVM see https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/ and for the list of tools installed, see https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/tools-included
Azure has a new product called \"Azure Machine Learning\" that we may cover in a future session.
"},{"location":"exercises/azure_vm_walkthrough/#requirements-for-both-activities","title":"Requirements for both activities","text":"You need an Azure account with an active subscription, and a resource group of your own to work in. Fellows have these things provided.
This exercise assumes you understand how to use the Azure Portal, which is covered in the Azure Portal Walkthrough. In addition it's helpful to know what a virtual machine is, but it's not crucial to complete the exercise. For more information on VMs see the readings for session 2.
It's helpful to have a basic understanding of the \"Client-Server\" model of computing, as the VM we create will be running servers (a remote desktop server for Windows, and an ssh command line server for Linux).
Finally, there are many layers of concepts in this exercise related to IT infrastructure, and we are happy to provide clarification as needed.
"},{"location":"exercises/azure_vm_walkthrough/#optional-video-walk-through","title":"Optional Video Walk-through","text":"There was a previous exercise that created a Windows Virtual Machine only (from 2021). The following video is based on the exercise. Watching and following the video is not necessary to complete the exercise on this page, and it does not cover linux. However if you find videos more helpful, or would like to see in detail how it works, please take advantage of this walk-through:
Link to Video for the Windows version of the exercise, on mediaspace.msu.edu, which requires an MSU log-in and is only available to participants in the MSU Cloud Computing Fellowship.
The materials that the video follows from 2021 are here; only use those if you need to, otherwise please continue below.
"},{"location":"exercises/azure_vm_walkthrough/#creating-a-windows-virtual-machine","title":"Creating a Windows Virtual Machine","text":"This section is based on Windows, and is recommended for everyone as it is the easy way to connect to a remote machine. For an equivalent exercise based on Linux, scroll down. If at any point, or if you are exploring, you can't seem to get the configuration correct (or there is a validation error you can't fix), starting over will not create any resources or incur charges. Go back to step 1 below.
"},{"location":"exercises/azure_vm_walkthrough/#requirements-for-windows-vms","title":"Requirements for Windows VMs","text":"To connect to a Windows VM desktop, we recommend you use the Microsoft Remote Desktop client.
MacOS: install the Microsoft Remote Desktop Client, only available on the App Store: https://apps.apple.com/app/microsoft-remote-desktop/id1295203466?mt=12
Linux users: install an RDP client such as Remmina or FreeRDP (note that xrdp, http://xrdp.org/, is an RDP server rather than a client).
Windows users should already have the remote desktop client, but to ensure you do: in the search box on the taskbar, type Remote Desktop, and then select Remote Desktop Connection. "},{"location":"exercises/azure_vm_walkthrough/#1-selecting-the-resource-template","title":"1. Selecting the Resource Template","text":"In the Azure Portal open the top left menu, and click \"+ Create a resource\" option (the first option)
In the create resource search box, type \"data science virtual machine\" and press enter to search. It will present you with some of the suggested options as you type but please search.
In the options select Data Science Virtual Machine - Windows 2022 (preview) (note: there is also a 2019 version, but they seem very similar and either will work for this exercise)
Click \"Create\" ( note: do not click the \"start with a pre-set configuration\" option )
"},{"location":"exercises/azure_vm_walkthrough/#2-configure-the-vm-using-the-azure-portal","title":"2. Configure the VM using the Azure Portal","text":"The resource creation forms work as described in the Azure Portal but since we used a pre-set configuration some of the values will be completed.
"},{"location":"exercises/azure_vm_walkthrough/#basics","title":"Basics","text":" Subscription should be \"MSU Cloud Computing Fellowship\"
Resource Group should be your CF resource group (the one with your netid). The default is to create a new resource group, but participants aren't allowed to create their own group; you must select the resource group with your netid.
Virtual machine name Name: You could name it anything that is unique in the region you choose, but to help keep track of your resources, I strongly suggest using a name that includes your netid and the purpose of this VM: dsvm-netid-exercise2
One option is to combine the project or activity (e.g. ), your net id, and some description of what you are doing. In the name above, replace \"netid\" with your own MSU netid.
- Note that different resources have different naming restrictions. For example, for VMs the rules are \"can be almost anything, but Azure resource names cannot contain special characters \\/\"\"[]:|<>+=;,?*@&, whitespace, or begin with '_' or end with '.' or '-' \"
- Note if you have an existing VM with this name, add a number 2 or other suffix. We will delete this VM and create something more suitable in the future.
Region Select any region or use the default. In the future, when creating resources (like VMs) that access your storage account, you should use the same region as the storage account. For this exercise it doesn't matter as we will be deleting these resources. For me, the default was \"(US) Central US\"
Availability Options Leave the default (\"Availability Zone\")
Availability Zone Leave the default (\"Zones 1\")
Security Type You must change this value to the \"Standard\" security type. The \"Trusted\" security option is for servers or production machines.
Image should be \"Data Science Virtual Machine - windows...\"; if this is changed you may have to re-enter some info again.
VM Architecture Leave as the default x64 (Intel compatible).
Azure Spot Instance Leave unchecked.
Size You can leave the size that is currently selected. NOTES: This is how you select the specifications for CPU and memory. The size you choose for this exercise doesn't matter for the outcome, but it will show prices which may be interesting. If you click this drop-down menu you may see some other sizes and prices. The monthly price assumes 24 hour/day operation. Your price to experiment will often be less than $1.00.
Administrator Account Just like you need to log in to your own computer, you must create a user account for the VM. Select a user name and password that you will easily remember, because you will need them to log in to the new VM.
username: use any user name you will easily remember. I use my netid so I can always remember.
password: this must be a complex password, but use something you can remember, or copy/paste from another program. Do not use your MSU password or any other passwords you use.
Licensing Unlike Linux, Windows requires a license, and this option is for organizations with an arrangement with Azure. Leave this box unchecked and Azure will add the extra charges (a few cents per hour) for the use of Windows. If you use a Windows VM for your research, you may be able to use an MSU license. "},{"location":"exercises/azure_vm_walkthrough/#disks","title":"Disks","text":"Leave these as the defaults.
I want to point out one option, \"Delete with VM.\" We will talk about this in the future, but for now it's like purchasing the computer and the disk inside separately. In this case we are just testing so we will delete everything after we are done, but in practice there are reasons for keeping the disk around after you delete the VM, so you may sometimes want to uncheck this box.
"},{"location":"exercises/azure_vm_walkthrough/#networking","title":"Networking","text":"You must create a 'virtual network' for your VM to be connected to (note: historically Azure created this for you). Click \"create new\" to open the new network form.
Create Virtual network
enter a Name: suggest adding \"-vnet\" to your proposed VM name. For me this was \"dsvm1-billspat-ccf23-vnet\"
leave all the rest of the settings as-is with the defaults
click the [OK] button at the bottom of this form. "},{"location":"exercises/azure_vm_walkthrough/#other-settings","title":"Other Settings","text":"For this exercise we'll be using the default values for almost all the pages except for the Basics page. However you are encouraged to look through these options to see what is involved in creating a virtual machine. The Azure VM documentation covers many of them. For example a VM requires several networking components. The good news is that Azure will name and create these for you, which you will see.
"},{"location":"exercises/azure_vm_walkthrough/#tags","title":"Tags","text":"For this exercise, using tags will be essential for identifying which components go to which VM. If you need more information see session 2 page for readings about tags. On the tags section, do the following:
Click \"tags\" in the top row of options (just before 'review and create')
In the first row, for Name, type activity and for the Value type session2 vm or a similar unique value.
Optionally, in an additional row, create another tag with Name created by and for Value put your netid. This kind of tag can be essential when you are sharing cloud accounts with other members of your work group, so that others in your group may identify who created the resources. "},{"location":"exercises/azure_vm_walkthrough/#review-and-create","title":"Review and Create","text":" click \"review and create\" at the bottom of the screen. If there are errors the form name will have a red dot next to it. Go back to that form and see what may be the issue.
If the Validation passed, it will display the approximate hourly cost to use this VM. Mine says 0.1920 USD/hr
Click \"Create\" and the deployment will start. It will take at most 15 minutes.
Now, please skip down to the Viewing VM Resources section below.
"},{"location":"exercises/azure_vm_walkthrough/#optional-creating-a-linux-virtual-machine","title":"Optional: Creating a Linux Virtual Machine","text":"This section is nearly identical to the section above with Windows, but uses Ubuntu Linux, and does not use a graphical interface (although with some work this is possible).
"},{"location":"exercises/azure_vm_walkthrough/#requirements","title":"Requirements","text":"To connect to Linux you need a terminal or command line interface with ssh client software. If you have used the MSU HPC, this is the same method for connection.
On Mac, the Terminal.app has ssh
On modern versions of Windows, the cmd.exe command prompt has an ssh command built in
Linux desktops/laptops come with an ssh client "},{"location":"exercises/azure_vm_walkthrough/#creating-a-linux-virtual-machine","title":"Creating a Linux Virtual Machine","text":"If at any point, or if you are exploring, you can't seem to get the configuration correct (or there is a validation error you can't fix), starting over will not create any resources or incur charges. Go back to step 1 below.
"},{"location":"exercises/azure_vm_walkthrough/#1-selecting-the-resource-template_1","title":"1. Selecting the Resource Template","text":"In the Azure Portal open the top left menu, and click \"+ Create a resource\" option (the first option)
In the create resource search box, type \"data science virtual machine\"
In the options select Data Science Virtual Machine - Ubuntu 20.04
"},{"location":"exercises/azure_vm_walkthrough/#2-configure-the-vm-using-the-azure-portal_1","title":"2. Configure the VM using the Azure Portal","text":"The resource creation forms work as described in the Azure Portal but since we used a pre-set configuration some of the values will be completed.
"},{"location":"exercises/azure_vm_walkthrough/#basics_1","title":"Basics","text":" The Subscription should be \"Cloud Computing Fellowship\" and resource group should be your CF resource group (with your netid).
Virtual machine name Name: must be unique in the region. I suggest using your netid to name it, and adding abbreviations for what you are creating and for which activity. For example dsvm1-netid-ccf23. Use your actual NetId, for example \"dsvm1-billspat-ccf23\"
Note that different resources have different naming restrictions. For example, for VMs the rules are \"can be almost anything, but Azure resource names cannot contain special characters \\/\"\"[]:|<>+=;,?*@&, whitespace, or begin with '_' or end with '.' or '-' \"
Note if you have an existing VM with this name, add a number 2 or other suffix. We will delete this VM and create something more suitable in the future.
Region You may select \"(US) North Central US\" or any other US-based region.
Availability Options Select \"No infrastructure Redundancy required\". The availability options are for critical infrastructure that needs to withstand a serious outage (e.g. if a hurricane affects a data center). You may also see an \"availability zone\" option appear (perhaps with an error message \"The value must not be empty\"). Selecting \"No infrastructure Redundancy required\" will remove the \"availability zone\" field and error message.
Security Type Leave as 'standard'
Image should be \"Data Science Virtual Machine - Ubuntu..\"; if this is changed you may have to select it again from the list. Any Linux image is fine for this tutorial.
VM Architecture Leave as x64 (Intel processor compatible)
Run with Azure Spot discount Leave unchecked.
Size You can leave the size that is currently selected, which is based on the pre-set configuration from the previous step. This is how you select the specifications for CPU and memory. The size you choose for this exercise doesn't matter for the outcome, but it will show prices which may be interesting. If you click this drop-down menu you may see some other sizes and prices. The monthly price assumes 24 hour/day operation. Your price to experiment will often be less than $1.00. Click \"see all sizes\" if you are feeling adventurous -- there are maybe 100 options. (Click the [x] in the upper right to close the size selector window.)
Administrator Account Just like you need to log in to your own computer, you must create a user account for the VM.
Authentication Type For the purpose of this exercise, select \"password\". SSH keys are strongly recommended, but to keep this simple we will use a password.
UserName Select a user name that you will easily remember, because you will need it to log in to the new VM. You can use your MSU NetID for your username so it's easy to remember.
password : something you can remember, but is complex to be secure. Do not use your MSU password or any other passwords you use
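The naming rule quoted above can be checked before you type a name into the form. The following is a minimal Python sketch, assuming only the rule as quoted in this exercise; the function name is_valid_vm_name is hypothetical, and Azure may enforce additional limits (for example on length) that are not checked here.

```python
# Characters the quoted rule forbids. chr(34) is the double quote and
# chr(92) is the backslash, written this way to avoid escaping them.
FORBIDDEN = set('/[]:|<>+=;,?*@&') | {chr(34), chr(92)}


def is_valid_vm_name(name):
    # Hypothetical helper: checks only the rule quoted in this exercise.
    if not name:
        return False
    # No forbidden special characters and no whitespace anywhere.
    if any(ch in FORBIDDEN or ch.isspace() for ch in name):
        return False
    # Must not begin with '_' or end with '.' or '-'.
    if name.startswith('_') or name.endswith(('.', '-')):
        return False
    return True


print(is_valid_vm_name('dsvm1-netid-ccf23'))  # follows the suggested pattern
print(is_valid_vm_name('dsvm1 netid'))        # contains whitespace
print(is_valid_vm_name('dsvm1-netid-'))       # ends with '-'
```

A check like this is only a convenience; the portal form performs its own validation and is the final authority.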
"},{"location":"exercises/azure_vm_walkthrough/#disks_1","title":"Disks","text":"You can leave the defaults for this page.
"},{"location":"exercises/azure_vm_walkthrough/#networking_1","title":"Networking","text":"You must create a 'virtual network' for you VM to be connected to (note historically Azure created this for you). Click \"create new\" to open the new network form.
Create Virtual network
enter a Name: suggest adding \"-vnet\" to your proposed VM name. For me this was \"dsvm1-billspat-ccf23-vnet\"
leave all the rest of the settings as-is with the defaults
click the [OK] button at the bottom of this form. "},{"location":"exercises/azure_vm_walkthrough/#other-options","title":"other options","text":"For this exercise we'll be using the default values for almost all the pages, except for pages such as 'Basics' and 'Networking'. However you are encouraged to look through these options to see what is involved in creating a virtual machine. The Azure VM documentation covers many of them. For example a VM requires several networking components. The good news is that Azure will name and create these for you, which you will see.
"},{"location":"exercises/azure_vm_walkthrough/#tags_1","title":"Tags","text":"Using the Azure portal to create a VM creates several resources (up to 12). Using tags will be essential for identifying which components go to which VM. This is the metadata associated with these resources. I suggest using a tag like \"activity\" to indicate which of our activities was used to create these resources.
Click \"tags\" in the top row of options (just before 'review and create')
In the first row, for Name, type activity and for the Value type session2
click \"review and create\" "},{"location":"exercises/azure_vm_walkthrough/#review-and-create_1","title":"Review and Create","text":"If there are errors the form name will have a red dot next to it. Go back to that form and see what may be the issue.
If the Validation passed, it will display the approximate hourly cost to use this Linux VM. Mine says 0.0730 USD/hr
Click \"Create\" and the deployment will start. It will take at most 15 minutes.
Linux Users continue to the next section
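To put the hourly rates in context, here is a rough Python sketch of the arithmetic behind the hourly and monthly prices. It uses the two rates reported in this exercise (0.0730 USD/hr for the Linux VM and 0.1920 USD/hr for the Windows VM); the 30-day month and the function name estimate_cost are assumptions for illustration, and the result covers compute time only, not disks, IP addresses or other resources.

```python
# Assumption: a month of 24 hour/day operation is approximated as 30 days.
HOURS_PER_MONTH = 24 * 30


def estimate_cost(hourly_rate_usd, hours):
    # Hypothetical helper: compute-only cost, rounded to cents.
    return round(hourly_rate_usd * hours, 2)


linux_rate = 0.0730    # USD/hr reported for the Linux VM in this exercise
windows_rate = 0.1920  # USD/hr reported for the Windows VM in this exercise

# A three-hour experiment compared with leaving the VM on all month.
print(estimate_cost(linux_rate, 3))
print(estimate_cost(linux_rate, HOURS_PER_MONTH))
print(estimate_cost(windows_rate, HOURS_PER_MONTH))
```

The gap between a few hours and a full month is the reason to stop or delete a VM when you are not using it.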
"},{"location":"exercises/azure_vm_walkthrough/#viewing-vm-resources-in-your-resource-group-windows-and-linux","title":"Viewing VM Resources in your Resource group (Windows and Linux)","text":"You have a few options now. You can wait for the deployment to complete in the portal. When it's ready, the Azure portal will display a message and a link to \"go to resource.\"
However you can also go to the page that lists the items in your resource group to find and explore while the deployment is in progress.
Open your resource group in the portal: click the portal menu on the top left, and select \"resource groups\"
From the list, select your CF21 group.
When the deployment is finished, you should see several new resources. They will have the same name prefix \"CF21netid-dsvm\" but may have a suffix indicating the kind of resource (e.g. CF21-netid-dsvm1-ip). The second column is the \"type\" which helps identify what they are.
Select the item with type \"virtual machine\" and click on the name to open its resource page (for example, cf21-billspat-dsvmtest item in the screenshot above) "},{"location":"exercises/azure_vm_walkthrough/#the-vm-resource-page","title":"The VM Resource Page","text":"To see the details for your virtual machine, click the VM in your resource group if you haven't already.
Note that the Azure portal will show a few errors/warnings if the deployment is not complete. You may see a warning that the 'agent' in the VM is not working, but you can ignore it. It will go away when the VM configuration is complete.
There are many details here but some immediate things to notice:
in the top row are buttons to connect, start, restart and stop the VM.
in the top \"essentials\" section the \"status\" should be \"running.\"
on the right side is the assigned IP address which you need to connect. If you are connecting with RDP, then the RDP file has this address in it so you don't need to remember it. However this is the IP address you can use to connect directly from your Remote Desktop client or the SSH client. For now you just need to know that this IP address is here on the main VM page. Note that, if you click the link on the address, it will take you to a new resource page just for the IP address (which is a distinct resource assigned to this VM resource) "},{"location":"exercises/azure_vm_walkthrough/#connecting","title":"Connecting","text":""},{"location":"exercises/azure_vm_walkthrough/#connecting-to-a-windows-vm-using-remote-desktop-protocol-rdp-client","title":"Connecting to a Windows VM using Remote Desktop Protocol (RDP) client","text":"You may connect to this VM running the Windows operating system with either a graphical desktop, a command line connection, or both.
Every VM created in Azure has an \"IP Address\" or internet address, and we use this to connect.
The following Azure documentation describes how to connect to a Windows VM: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/connect-logon
Here are more detailed instructions:
There is a 'connect' link on the left side in the \"Settings\" section of the left menu.
The connect pane looks something like this:
Connect with RDP: RDP (Remote Desktop Protocol) is a Microsoft method for connecting to the graphical desktop. Mac and Linux require additional software (mentioned at the beginning of this page).
Step 1: In the Azure portal:
click \"connect\" on the left side menu if you haven't already
in the \"native rdp\" box, click \"select\"
optional: if the machine is still deploying or turned off, you may get a warning that the machine is stopped. Click start VM.
a new pane displays. It may take a few seconds for Azure to configure the VM to use RDP, with the message that \"Azure is configuring...\" While it's working on it, you will see it say \"validating\" in a gray box. Some users found that it never finished! However the VM is still available for a connection. You may wait for the gray \"validating\" button to change to \"configured\" but if it does not appear to be completing, please move on to the next step anyway.
click the \"download RDP file\" button and save the .rdp file anywhere on your computer where you can find it again
Step 2:
after it's downloaded, find the .rdp file and double click to open it, which should start your remote desktop software. Mac users must have installed the Microsoft Remote Desktop client app
ignore any security or error messages, click \"connect\"
Alternatively you may open your RDP software, create a new connection, and copy and paste the IP address that is listed on the resource page for the VM in the Azure portal.
Here is what the Windows screen may look like:
You may see a security warning. This is because we are using a temporary certificate, but the connection is secure. Click \"Yes\"
Step 3. Enter the username and password you used when configuring the VM in the \"Basics\" section above.
For some versions of Windows, you need to click \"More choices\" in the Windows Security menu, otherwise the default is often your Microsoft or your laptop account.
Enter the user id and password you used when you created the VM.
If the user account you entered does not work, you may have to put your user account in domain\\username form. In this case, the domain is the name of the virtual machine, so it is entered as vmname\\username, with a backslash in between, and with the same password.
Starting up the VM
Once you connect for the first time, the Windows VM will provision the VM user account and will install things during and after start-up. Feel free to close any windows. Once the installations are finished, you may use the machine as you would any other Windows computer. You can start Jupyter notebooks to work with Python. Previous versions of the Azure Data Science Virtual Machine had RStudio installed, but the latest version only seems to have the base R interface.
We will cover how to transfer code and files to a VM in a later session. If you are comfortable with using the command line, you can use git clone... to download code to run.
Explore to see what is already pre-installed on this VM. If you start with a standard version of Windows, you will have to install your own software.
When you are finished with your remote session, you may simply close the remote window (leaving the VM running). See below for how to turn it off and delete it.
Optional: Connect to the Windows DSVM with ssh
NOTE: In 2023 the 'connect' option in the Azure portal has a button beneath the RDP section that says \"more ways to connect.\" Inside this is a \"native ssh\" section. This section only has instructions for how to connect with SSH; there is no special file to download as there is for RDP.
This Windows machine has an SSH server running, and the security settings from the pre-configured version allow SSH connections. If you are familiar with ssh and the command line, you may start CMD.EXE on your Windows computer, or the Mac Terminal, and enter ssh <username>@<ipaddress>
where the username is the user you set for your VM when you created it, and the Public IP address is listed on the VM Resource page.
This is similar to how you connect to the MSU HPC, if you are an HPC user.
You will be asked to add the host to your list of known hosts, and to enter the password you used when you created the VM.
When you log in you will be connected to the Windows command prompt (e.g. C:\\Users\\username>).
To exit, type exit
at the command prompt.
Next Steps: For information on turning off the VM and for eventually deleting the VM, scroll down below the Linux section as these operations are the same in the Azure portal for Linux or Windows virtual machines.
"},{"location":"exercises/azure_vm_walkthrough/#connecting-to-a-linux-vm-using-ssh","title":"Connecting to a Linux VM using SSH","text":"We will connect to and use this remote VM running the Linux operating system with a command line connection. It is possible to use a graphical connection, but that requires additional setup beyond the scope of this short exercise.
In addition this assumes you have some familiarity with using the command line and starting your terminal program.
There is a 'connect' link above the 'essentials' list, and a connect link on the left side - they both go to the same place.
Connect with SSH
This is the standard method of connecting with ssh, but we've included as much detail as possible for those who are new to using ssh.
On the main \"overview\" page of the VM resource, find the \"Public IP Address\" on the top right side. Copy this IP address to the clipboard, or make a note of it. Mine was 20.98.28.63. Note that these VMs also have an internal IP address that starts with 10.x.x.x, which will not work for connecting from your laptop. Use the Public IP address. Not all VMs have a public IP address, but this one will. Also make a note of the user ID and password you used to create the VM above. Side note: the \"connect\" form of the VM resource pages describes how to use an ssh key, even though we did not create an ssh key when we created the VM. If you did not create an ssh key, you do not need to follow those instructions.
On your desktop/laptop, start your terminal program on MacOS/Linux, or cmd.exe if you are using Windows.
Enter the command as displayed, which is something like ssh vmusername@vmipaddress
In my case, my command is ssh patbills@20.98.28.63
If this is your first time connecting, you'll get the standard ssh warning \"The authenticity of host '20.98.28.63 (20.98.28.63)' can't be established.\"
Simply type \"yes\" and press Enter. Enter the password you used when configuring the VM in the \"Basics\" section above (note that ssh does not show any cursor movement or * when you type a password). It takes a while to connect for the first time as the VM configures software and prepares your user account. You may use the machine as you would any other Linux computer. For more information about what software is installed, see the DSVM links under More References. We will cover how to transfer code and files to a VM in a later session.
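
The address check described in the steps above can be sketched in Python. This is only an illustration, not part of the exercise: the function name ssh_command is made up for this example, and it uses the standard library ipaddress module to reject internal addresses (such as 10.x.x.x) before building the ssh command.

```python
import ipaddress


def ssh_command(username, ip_text):
    '''Build the ssh command for a VM, refusing private addresses.'''
    # ip_address raises ValueError if the text is not a valid IP address
    address = ipaddress.ip_address(ip_text)
    if address.is_private:
        # Internal addresses (e.g. 10.x.x.x) cannot be reached from a laptop
        raise ValueError(f'{ip_text} is a private address; use the Public IP')
    return f'ssh {username}@{ip_text}'


# The user name and address are the ones quoted in this walkthrough
print(ssh_command('patbills', '20.98.28.63'))
```

Passing an internal address such as 10.0.0.4 raises a ValueError instead of returning a command that would hang when you try to connect.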
When you are finished with your remote session, you may simply close the remote window (leaving the VM running). See below for how to turn it off and delete it.
"},{"location":"exercises/azure_vm_walkthrough/#starting-and-stopping-the-vm-both-windows-and-linux","title":"Starting and Stopping the VM (both Windows and Linux)","text":"There are three ways to \"stop\" or turn off a VM. 1. When connected to it, e.g. in the remote desktop, use Windows to turn it off. The VM is then \"stopped.\" In a Linux ssh session you may use a command like sudo shutdown -h now
The operating system is then shut off, and hence the VM is not running, but it is still \"allocated.\" When you turn it back on, it will come on immediately. 2. Use the Azure portal to \"stop\" the VM, which shuts down the operating system (gracefully if possible) and 'deallocates' the VM. Restarting the VM appears to be the same process, but Azure must allocate resources first to run it, then power it up. This is cheaper than the first method in the long run. 3. Delete it.
"},{"location":"exercises/azure_vm_walkthrough/#stopping-deallocating-the-vm-with-the-portal","title":"Stopping (deallocating) the VM with the Portal:","text":" Go to the resource page for the VM, if you are not already there. If you are just entering the portal, find your resource group, find the VM in your resource group (identified as a VM in the \"type\" column of the list of resources), and click to open the resource page. The Status field near the top of this screen will indicate running or stopped. Find the Start and Stop buttons near the top of this screen and click \"stop\" if the machine is running. There is a warning about losing your IP address, with a check box to reserve it. If you plan on deleting the VM now, click \"ok\". If you plan on restarting the VM and reconnecting, first check the box \"reserve the IP\", then click OK. The default is to use a \"dynamic\" address, which is assigned every time you turn on the VM. When using a dynamic address, you must copy/paste the IP address, or re-download the RDP connection file, every time you restart the machine. The solution is to use a \"Static IP\", either when you create the VM or by assigning one after the VM is created; checking the box does so. You can also convert to a static IP with the portal, but it is not a straightforward process, see https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-static-private-ip-arm-pportal Pricing for a static IP is here: https://azure.microsoft.com/en-us/pricing/details/ip-addresses/ which as of now is $0.0036/hour, charged even if the VM is turned off. That is approx $2.70/month. It's a good idea to leave VMs in a \"stopped (deallocated)\" state if you are not using them for computations or providing a service, just as you would turn off or put your laptop to sleep. The main reason for this is security. 
"},{"location":"exercises/azure_vm_walkthrough/#deleting-the-resources-both-windows-and-linux","title":"Deleting the Resources (both Windows and Linux)","text":" Open the Resource group as above. When creating resources using the template as we did above, the resources associated with this VM will all start with the same prefix, so they are easy to identify. Select them with checkboxes, and click the \"Delete\" button, which is on the top right of the screen (not the \"delete resource group\" button). If it's not obvious which resources are included, you may also use the \"tag\" you created to filter what is listed and only show those with the same \"tag.\" For more information see https://docs.microsoft.com/en-us/azure/azure-portal/manage-filter-resource-views . If you add a filter on the tag, you may select all the items that are shown and delete those. After selecting, confirm the deletion by typing \"yes\". Creating resources just to delete them may seem wasteful; however, we will cover how to save a \"snapshot\" and/or \"image\" of your VM's disk so that you may re-use any work to install and configure software without incurring charges.
"},{"location":"exercises/azure_vm_walkthrough/#more-references","title":"More References","text":"Azure has very abbreviated versions of this exercise if you would like another perspective. They assume you can create your own resource group (which you don't have the ability to do currently in the fellowship)
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview#next-steps
Data Science Use Case Tutorials from Azure:
Windows: This tutorial uses products that Azure no longer supports, and for Windows users they really push to use their \"Azure Machine Learning\" product. However, the Windows DSVM offers a really fast way to get access to a Windows desktop graphical interface https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/vm-do-ten-things
Linux: https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/linux-dsvm-walkthrough
If you follow these, just remember to delete the resources you create when you are done exploring.
Return to the Session 2 page
"},{"location":"exercises/azure_windows_vm_walkthrough/","title":"Exercise: Creating a Windows Virtual Machine (VM)","text":"This is a previous version of a Windows-only VM walkthrough from 2020, kept for historical reasons.
Please use our updated version that covers both Windows and Linux
Link to Video for this exercise on mediaspace.msu.edu (requires log-in)
"},{"location":"exercises/azure_windows_vm_walkthrough/#about","title":"About","text":"This is an exercise and introduction to creating Virtual Machines (VMs) and related resources using the Azure Portal. This exercise assumes you understand how to use the Azure Portal, which is covered in the Azure Portal Walkthrough. In addition, it's helpful to know what a virtual machine is, but it's not crucial to complete the exercise. For more information on VMs see session 2.
We will use a pre-configured virtual machine with software already installed. When creating a VM you can use an Azure template and there are many of these. The Data Science Virtual Machine (DSVM) from Azure has R, Python and many data science and statistical libraries available. For more information about the Azure DSVM see https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/
"},{"location":"exercises/azure_windows_vm_walkthrough/#requirements","title":"Requirements","text":"You need an account in Azure with an active subscription, and a resource group of your own to work in. Fellows have these things provided.
"},{"location":"exercises/azure_windows_vm_walkthrough/#creating-and-connecting-to-a-windows-virtual-machine","title":"Creating and Connecting to a Windows Virtual Machine","text":""},{"location":"exercises/azure_windows_vm_walkthrough/#requirements_1","title":"Requirements","text":"To connect to a Windows VM desktop, it's recommended that you use the Microsoft Remote Desktop client.
MacOS : install the Microsoft Remote Desktop Client, only available on the App Store: https://apps.apple.com/app/microsoft-remote-desktop/id1295203466?mt=12 Linux users install http://xrdp.org/ Windows Users ensure you have the client : In the search box on the taskbar, type Remote Desktop, and then select Remote Desktop Connection. "},{"location":"exercises/azure_windows_vm_walkthrough/#creating-a-windows-virtual-machine","title":"Creating a Windows Virtual Machine","text":"If at any point, or if you are exploring, you can't seem to get the configuration correct (or there is a validation error you can't fix), starting over will not create any resources or incur charges. Go back to step 1 below.
"},{"location":"exercises/azure_windows_vm_walkthrough/#1-selecting-the-resource-template","title":"1. Selecting the Resource Template","text":"In the Azure Portal open the top left menu, and click \"+ Create a resource\" option (the first option)
In the create resource search box, type \"data science virtual machine\"
In the options select Data Science Virtual Machine - Windows 2019
The \"Plans\" section has a description of the template if you would like to know more.
Click the \"start with a pre-set configuration\" option.
"},{"location":"exercises/azure_windows_vm_walkthrough/#2-select-the-pre-set-configuration","title":"2. select the pre-set configuration","text":"These configurations help to select your VM size based on your activity. We will use the default options and click \"Continue to create a VM\"
The options do not affect the outcome of the exercise so at this step explore each option
Click \"Continue to create a VM\"
"},{"location":"exercises/azure_windows_vm_walkthrough/#3-configure-the-vm-using-the-azure-portal","title":"3. Configure the VM using the Azure Portal","text":"The resource creation forms work as described in the Azure Portal but since we used a pre-set configuration some of the values will be completed.
"},{"location":"exercises/azure_windows_vm_walkthrough/#basics","title":"Basics","text":" The Subscription should be \"Cloud Computing Fellowship\" and resource group should be your CF resource group (with your netid). As we create additional resource groups for this
Virtual machine name Name: CF21-netid-dsvmtest One option is to combine the project (e.g. the fellowship), your net id, and some description of what you are doing. In the name above, replace \"netid\" with your own MSU netid.
Note that different resources have different naming restrictions. For example, for VMs the rules are \"can be almost anything, but Azure resource names cannot contain special characters \\/\"\"[]:|<>+=;,?*@&, whitespace, or begin with '_' or end with '.' or '-' \"
Note: if you have an existing VM with this name, add a number 2 or other suffix. We will delete this VM and create something more suitable in the future. 1. Region: select \"(US) North Central US\". 1. Availability Options: select \"No infrastructure redundancy required\". The other options are for critical infrastructure that needs to withstand a serious outage (e.g. if a hurricane affects a data center). You may also see an \"availability zone\" option appear (perhaps with an error message \"The value must not be empty\"). Selecting \"No infrastructure redundancy required\" will remove the \"availability zone\" field and error message. 1. Image: should be \"Data Science Virtual Machine - windows...\" if this is change you may 1. Azure Spot Instance: leave unchecked. 1. Size: You can leave the size that is currently selected. This is how you select the specifications for CPU and memory. The size you choose for this exercise doesn't matter for the outcome, but it will show prices, which may be interesting. If you click this drop-down menu you may see some other sizes and prices. The monthly price assumes 24 hour/day operation. Your price to experiment will often be less than $1.00. 1. Administrator Account: Just like you need to log in to your own computer, you must create a user account for the VM. Select a username and password that you will easily remember, because you will need them to log in to the new VM. * username: use any username you will easily remember, perhaps your netid * password: something you can remember, but complex enough to be secure. Do not use your MSU password or any other passwords you use elsewhere. 1. Licensing: Unlike Linux, Windows requires a license, and this option is for organizations with an arrangement with Azure. Leave this box unchecked.
"},{"location":"exercises/azure_windows_vm_walkthrough/#disks-and-other-settings","title":"Disks and Other Settings","text":"For this exercise we'll be using the default values for almost all the pages except for the Basics page. However, you are encouraged to look through these options to see what is involved in creating a virtual machine. The Azure VM documentation covers many of them. For example, a VM requires several networking components. The good news is that Azure will name and create these for you, which you will see.
"},{"location":"exercises/azure_windows_vm_walkthrough/#tags","title":"Tags","text":"Using tags will be essential for identifying which components go to which VM. This is the metadata associated with these resources. I suggest using a tag like \"activity\" to indicate which of our activities was used to create these resources.
Click \"tags\" in the top row of options (just before 'review and create') In the first row, For Name, type activity
and for the Value type session2
click \"review and create\" "},{"location":"exercises/azure_windows_vm_walkthrough/#review-and-create","title":"Review and Create","text":"If there are errors the form name will have a red dot next to it. Go back to that form and see what may be the issue.
If the Validation passed, it will display the approximate hourly cost to use this VM. Mine says 0.1920 USD/hr
Click \"Create\" and the deployment will start. It will take at most 15 minutes.
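
To put the hourly rate in perspective, here is a small Python sketch of the arithmetic. The rate is the 0.1920 USD/hr figure quoted above (yours may differ), and the function name vm_cost is just an illustration.

```python
HOURLY_RATE_USD = 0.1920  # the rate quoted above; check your own review page


def vm_cost(hours, hourly_rate=HOURLY_RATE_USD):
    '''Estimated compute cost in USD for a VM that runs for the given hours.'''
    return round(hours * hourly_rate, 2)


print(vm_cost(2))        # a two-hour experiment
print(vm_cost(24 * 30))  # left running nonstop for a 30-day month
```

At this rate a two-hour experiment costs about 0.38 USD, while a VM left running for a whole 30-day month costs about 138.24 USD, which is why stopping or deleting VMs you are not using matters.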
"},{"location":"exercises/azure_windows_vm_walkthrough/#4-the-resources","title":"4. The Resources","text":"While the deployment is in progress you may explore the operation details or click any of the resources that have been created.
Open your resource group in the portal: click the portal menu on the top left, and select \"resource groups\". From the list, select your CF21 group. When the deployment is finished, you should see several new resources. They will have the same name prefix \"CF21-netid-dsvm\" but may have a suffix indicating the kind of resource (e.g. CF21-netid-dsvm1-ip). The second column is the \"type\", which helps identify what they are.
Select the item with type \"virtual machine\" and click on the name to open its resource page (for example, cf21-billspat-dsvmtest item in the screenshot above) "},{"location":"exercises/azure_windows_vm_walkthrough/#5-the-vm-resource-page","title":"5. The VM Resource Page","text":"To see the details for your virtual machine, click the VM in your resource group if you haven't already.
There are many details here but some immediate things to notice:
In the top row are buttons to connect, start, restart and stop the VM. In the top \"essentials\" section, the \"status\" should be \"running.\" On the right side is the assigned IP address, which you need to connect. Highlight and copy this address. If you click the link on the address, it will take you to a new resource page just for the IP address (which is a distinct resource assigned to this VM resource). "},{"location":"exercises/azure_windows_vm_walkthrough/#6-connecting","title":"6. Connecting","text":"You may connect to this VM running the Windows operating system with either a graphical desktop, a command line connection, or both.
The following Azure documentation describes how to connect to a Windows VM: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/connect-logon
Here are more detailed instructions:
There is a 'connect' link above the 'essentials' list, and a connect link on the left side - they both go to the same place.
Connect with RDP: RDP (remote desktop protocol) is a Microsoft method for connecting to the graphical desktop. On Mac/Linux it requires additional software (mentioned at the beginning of this page).
Click \"connect\" and select \"rdp\" if it isn't already selected. Click the \"download RDP file\" button and save the .rdp file anywhere on your computer where you can find it again. After it's downloaded (and, for Mac users, once you have installed the RDP client), double-click the .rdp file to open your remote desktop software.
On Windows, ignore any security or error messages and click \"connect\".
Alternatively, you may open your RDP software without downloading the RDP file and paste in the IP address listed on the resource page for the VM.
When you connect, if the VM is not running, you will get an error message. Here is what the Windows screen looks like:
This security warning appears because we are using a temporary certificate, but the connection is secure. Click \"Yes\"
Enter the username and password you used when configuring the VM in the \"Basics\" section above. You may be able to simply enter the username and password directly. If not, in the Windows Security window, select More choices and then Use a different account. Enter the credentials for an account on the virtual machine and then select OK. If the user account you entered does not work, you may have to put your user account in domain\\username form. In this case, the domain is the name of the virtual machine, so it is entered as vmname\\username, with a backslash in between, and with the same password. Once you connect, you may see Windows starting up and installing things. Feel free to close any windows. Once the installations are finished, you may use the machine as you would any other Windows computer. If you type RStudio in the search box, you may launch an RStudio session on this remote computer. It also has Python, many Python libraries and Jupyter notebook.
We will cover how to transfer code and files to a VM in a later session.
When you are finished with your remote session, you may simply close the remote window (leaving the VM running). See below for how to turn it off and delete it.
Optional: Connect to the Windows DSVM with ssh
This Windows machine has an SSH server running, and the security settings from the pre-configured version allow SSH connections. If you are familiar with ssh and the command line, you may start CMD.EXE on your Windows computer, or the Mac Terminal, and enter ssh <username>@<ipaddress>
where the username is the user you set for your VM when you created it, and the Public IP address is listed on the VM Resource page.
This is similar to how you connect to the MSU HPC, if you are an HPC user.
You will be asked to add the host to your list of known hosts, and to enter the password you used when you created the VM.
When you log in you will be connected to the Windows command prompt (e.g. C:\\Users\\username>).
To exit, type exit
at the command prompt.
"},{"location":"exercises/azure_windows_vm_walkthrough/#7-starting-and-stopping-the-vm","title":"7. Starting and Stopping the VM","text":"There are three ways to \"stop\" or turn off a VM. 1. When connected to it, e.g. in the remote desktop, use Windows to turn it off. The VM is then \"stopped.\" The VM is not running, but it is still \"allocated.\" When you turn it back on, it will come on immediately. 2. Use the Azure portal to \"stop\" the VM, which shuts down Windows (gracefully if possible) and 'deallocates' the VM. Restarting the VM appears to be the same process, but Azure must allocate resources first to run it, then power it up. This is cheaper than the first method in the long run. 3. Delete it.
Stopping (deallocating) the VM with the Portal:
Go to the resource page for the VM, if you are not already there. If you are just entering the portal, find your resource group, find the VM in your resource group (identified as a VM in the \"type\" column of the list of resources), and click to open the resource page. The Status field near the top of this screen will indicate running or stopped. Find the Start and Stop buttons near the top of this screen and click \"stop\" if the machine is running. There is a warning about losing your IP address, with a check box to reserve it. If you plan on deleting the VM now, click \"ok\". If you plan on restarting the VM and reconnecting, first check the box \"reserve the IP\", then click OK. The default is to use a \"dynamic\" address, which is assigned every time you turn on the VM. When using a dynamic address, you must copy/paste the IP address, or re-download the RDP connection file, every time you restart the machine. The solution is to use a \"Static IP\", either when you create the VM or by assigning one after the VM is created; checking the box does so. You can also convert to a static IP with the portal, but it is not a straightforward process, see https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-static-private-ip-arm-pportal Pricing for a static IP is here: https://azure.microsoft.com/en-us/pricing/details/ip-addresses/ which as of now is $0.0036/hour, charged even if the VM is turned off. That is approx $2.70/month. It's a good idea to leave VMs in a \"stopped (deallocated)\" state if you are not using them for computations or providing a service, just as you would turn off or put your laptop to sleep. The main reason for this is security. "},{"location":"exercises/azure_windows_vm_walkthrough/#8-deleting-the-resources","title":"8. 
Deleting the Resources","text":" Open the Resource group as above. When creating resources using the template as we did above, the resources associated with this VM will all start with the same prefix, so they are easy to identify. Select them with checkboxes, and click the \"Delete\" button, which is on the top right of the screen (not the \"delete resource group\" button). If it's not obvious which resources are included, you may also use the \"tag\" you created to filter what is listed and only show those with the same \"tag.\" For more information see https://docs.microsoft.com/en-us/azure/azure-portal/manage-filter-resource-views . If you add a filter on the tag, you may select all the items that are shown and delete those. After selecting, confirm the deletion by typing \"yes\". "},{"location":"exercises/exercise_budget_alert/","title":"MSU Cloud Computing Fellowship: Costs and Budgets with Microsoft Azure","text":"(Almost) everything you do in Azure has a cost, and costs for resources often accrue over time, whether the resource is in use or not. This is a short exercise to set up an email alert for when you have spent a certain amount of money. This can be valuable if you are experimenting and forget to delete a resource that you no longer need.
For this to work, you must first have a 'budget' in your resource group. We created a budget for 2022 for all fellowship participants that you can use for creating alerts.
If you have not yet, please go through the \"Intro to the Azure portal\" exercise for more context about what we are doing.
"},{"location":"exercises/exercise_budget_alert/#background","title":"Background","text":"See the \"costs\" section in the topics for details.
In Azure you can set a 'budget' for a single resource (like a virtual machine), your whole resource group, or we could set one for the whole fellowship. However, setting a budget doesn't stop you from spending anything or invoke any action.
Once you set a budget, or maximum dollar amount you'd like to spend, you then need to add either actions or alerts for when some threshold within that budget is reached.
We have set budgets for your resource group in the fellowship. However, you now need to set an alert to send you an email when you reach a spending amount. You can set multiple alerts. For example, we will set an alert for when you reach a certain threshold.
For details about this service, see the Azure Cost Management + Billing documentation. This specific exercise works with budgets and assumes there is one in your resource group. If you do not have a budget on your account, or if you'd like to create a new kind of budget, please contact us and we will assist you. However, if you are comfortable with Azure concepts, see this advanced Azure Budget Tutorial:
https://learn.microsoft.com/en-us/azure/cost-management-billing/costs/tutorial-acm-create-budgets
"},{"location":"exercises/exercise_budget_alert/#steps-to-add-a-cost-alert-to-an-existing-budget-your-resource-group","title":"Steps to add a \"cost alert\" to an existing budget your resource group.","text":"Find the Premade Budget
Log into https://portal.azure.com and you should see a single resource group, or be put into one automatically. Open your resource group if it is not already open. The left sidebar has properties for the resource group. In the left sidebar, select \"budgets\" (scroll down). You should see a single budget named with your netid, like this: \"ccf23_sparty_budget\". Click on that budget. Click the 'edit budget' link near the top left. Review the information. Add an 'alert' to that budget
In the edit budget form, under alert condition, set type = Actual and enter 50 percent. Under action group, leave it as 'none' (alerts are different from actions). In email, put your preferred email address (I don't know if gmail etc. will work). Add a second email to inform the instructors for the cloud fellowship (billspat@msu.edu). Select your preferred language, if it's available (the default is US English). Click 'Save'. You may add additional alerts if you want to be reminded at different thresholds of spending, e.g. 25%, 50%, 80%. One advantage of setting a low threshold like 20% of your budget is that it helps you learn how much things cost, and alerts you if there are resources you've created but didn't realize still existed or were costing anything.
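
To see what those percentages mean in dollars, here is a short Python sketch. The 200 USD budget is a made-up figure for illustration only (look up the real amount on your budget page), and alert_amounts is a hypothetical helper, not an Azure function.

```python
def alert_amounts(budget_usd, percents=(25, 50, 80)):
    '''Map each alert percentage to the spending level that triggers it.'''
    return {p: budget_usd * p / 100 for p in percents}


# The 200 USD budget is an assumed example value, not the fellowship budget
for percent, dollars in alert_amounts(200).items():
    print(f'{percent}% alert fires at ${dollars:.2f}')
```

With the assumed 200 USD budget, the alerts would fire at 50, 100 and 160 USD of actual spending.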
I hope these instructions were clear, but again, if you have any questions please contact us using email or MS Teams.
"},{"location":"exercises/exercise_create_storage_account/","title":"Creating a \"Storage Account\" with the Azure Portal","text":"(From: Session 1 - Introduction)
This is a good activity to explore the Azure portal by creating a new resource. Storage accounts do not accrue much cost until you fill them up with data. Please review the exercise Azure Portal Walk-through if you haven't.
We have not talked about Cloud storage yet; however, you don't need to know about Cloud storage to complete this tutorial. This is simply an exercise to see how you would create something using the Azure portal, and Cloud storage is a benign (and very inexpensive) resource to use as an example.
Note that a \"storage account\" is not the same as the \"disk\" you will see when you create a virtual machine. We will discuss the difference in detail in the session on storage.
"},{"location":"exercises/exercise_create_storage_account/#requirements","title":"Requirements:","text":" An Azure Account with valid subscription A Resource group All members of the current Cloud Computing Fellowship cohort have these things.
"},{"location":"exercises/exercise_create_storage_account/#creating-a-storage-account-step-by-step","title":"Creating a storage account step-by-step.","text":""},{"location":"exercises/exercise_create_storage_account/#first-step-accessing-a-storage-account-template","title":"First Step: Accessing a Storage Account Template.","text":" Log in to the Azure portal if you have not already. (https://portal.azure.com) Click the menu (top left, three horizontal bars). Select Home from the menu. (This is to ensure we all have the same view) Select Create a Resource in the upper left screen under Azure Services. Yes, we could have clicked \"storage accounts\" instead, but we want to demonstrate how to use the next screen... Note: The current screen is where you can create almost any service Azure offers, and additional services created by third parties or companies that are not Microsoft. When you are starting, ensure you are creating a service from Microsoft (we'll show you how in the next step). In the lower search bar (labeled Search services and marketplace), type Storage account. Note that \"storage\" alone lists many other kinds of resources. You will see a list of several services. Select the first one labeled Storage account (icon looks like a green spreadsheet). Note: The description of the service will say the provider, which should be Microsoft; if not, go back using the back button and search for storage account again. Click Create under Storage account. "},{"location":"exercises/exercise_create_storage_account/#second-step-setting-up-the-storage-account","title":"Second Step: Setting up the Storage Account.","text":"Note: The Azure resource creation screens mostly work like this: there are so many settings that Azure has split these up into groups, which are listed horizontally across the top. You may work through these by clicking each group, OR finish a screen and click the \"Next..\" button on the bottom of the form. 
At any time you may click \"Review and Create\" and if you've missed some crucial setting, Azure will not let you create the resource without fixing it. We will go page-by-page for these settings
Basics:
Subscription: Cloud Computing Fellowship Resource Group: Select your resource group provided to you. Storage Account Name:
Some resources have restrictions on naming. Next to storage account is an \"i\" in a circle that has more information. Storage account names must be unique across all of Azure, must be 3 to 24 characters long, and may contain only numbers and lowercase letters. Other characters, including non-ASCII letters (e.g.\u7bb1), are not allowed. Use your MSU ID (NetID) when you name things to help me keep track and also to help find a name that is unique. So, replace \"NETID\" with your MSU NetID here: \"stNETIDccf22\" e.g. stbillspatccf22 If you are repeating this tutorial, simply add a \"2\" or \"b\" e.g. \"stbillspatccf22b\" We can delete these experiments later. Region (Location): Change this location to Central US. Click in here to see the options. In practice, pick the region that is closest to you or where your data will be moving to (e.g. North Central US for MSU) but there are other considerations.
Performance: Standard Redundancy: change from GeoRedundant to \"Locally Redundant\" (LRS). We won't see a difference, and LRS is cheaper. Beneath that, leave the \"make read access....\" box checked. Click next...Advanced Advanced:
Leave all of these settings as-is. Click next... Networking: Leave all of these settings as-is. Click next... Data Protection: Leave all of these settings as is. These settings allow you to recover files up to 7 days after deleting or over-writing. click next... Encryption: Leave all of these settings as is. click next... Tags: Tags are optional but eventually highly recommended. For now you can leave them blank. Review and create review gives you a chance to double check your settings before committing click Create "},{"location":"exercises/exercise_create_storage_account/#third-step-deploying-the-storage-account","title":"Third Step: Deploying the Storage Account","text":" Deployment Azure calls the process of creating cloud resources a \"deployment.\" This term comes from the software engineering process of first \"building\" an application or utility (or \"compiling\" which is often not necessary for scripting languages like Python or R) and then moving that application onto the IT servers that make it available. On your own computer you download software that is already \"built\" (e.g. MS Word) and installing it is a form of deployment. Deployment takes a while as the Azure Resource Manager takes your order and runs the code to generate the cloud resource you've described. You may leave this page and the deployment will continue in the background. Finish and Review
When the deployment is complete, in the top bar of the Azure portal you'll see a number badge on the \"Notification\" icon indicating the number of messages you have (probably just 1). Click on the Notifications icon to show this message. the message should be something like: Deployment succeeded Deployment 'resourcename_12345678901234' to resource group 'group name' was successful. \"Go to Resource\" button will open the Portal page with options for the resource \"Pin to Dashboard\" will create a new tile that is a shortcut to this resource on your dashboard for easy access. If you want to experiment with dashboard arranging then it's ok to click this and easy to remove later from your Portal Dashboard (it will be added to the bottom) Examine Resource (storage)
We have not talked about how storage works but the storage resource page is a good example to learn how the Portal is organized. If you didn't already click \"go to resource\", open the top menu and click \"home\" the Portal \"Home\" has a list of \"recent resources\" and this should be at the top. "},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/","title":"Exercise: using the cloud to summarize and visualize data.","text":""},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#overview","title":"Overview","text":"The basic task of this project is analyze data in the cloud: copying data and code to the cloud, and using cloud computing to run a basic script, and save the output to cloud storage. We provide the data and the code (in R and Python ) with clear description of how to run it.
The goal is to assess whether the structure of this material was sufficient (did we do our jobs?), that you were able to synthesize it, and hence you as a fellow are ready to take on a cloud project.
The goal is not to determine your ability to run code (which you most likely can already do!), use git, use the command line, or to be a systems admin, but just to assess which pieces of this small puzzle we may need to reinforce. All steps should be able to be completed without having to write any code at all, except to run the program. We hope this unified exercise helps fill any gaps in practical and potentially practical understanding of how computing in the cloud works. Or, even better, that it's so easy that it seems like busy work.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#process","title":"Process","text":"We are here to help along the way, and happy to answer any and all questions. The goal is not to present a step-by-step tutorial but to provide guidelines for how you should approach the problem. If you have issues, it would be very helpful to us if you reviewed the course materials to determine whether we've provided the information, or links to it, so we know if we need to augment these materials. However, we will always answer your questions as they come up.
If you review this and find it very easy, you want to use something other than a VM to do calculations, or have code and data of your own you'd like to run, that is great! The goal is to help you accomplish a computation in a way that you may use in your project.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#output","title":"Output","text":"We ask that you prepare a short, informal description of the resources you used, how you used them to move data and execute code, and the costs associated with those resources. In addition, any technical challenges, lack of clear documentation, or any other issues that needed to be overcome to complete this will be helpful to us.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#data","title":"Data","text":"The data is a simple CSV file of approximately 450,000 weather observations near the MSU campus. Details about the data file and its origin are documented in the code site linked below. In addition, a direct link for downloading the suggested data set will be sent to the fellows by email. While the data is in the public domain, there is a small cost for each download. Hence we are not posting the URL on this public site, to prevent bots from repeatedly downloading the file.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#code","title":"Code","text":"The code we suggest you run is available on GitHub: https://github.com/msucloudfellowship/msu_ccf_miniproject There is a Python and an R version. The data is not in the GitHub repository, but you should have received a link to download it, and there are instructions and code for downloading the data from the source for Lansing or other weather stations.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#task-details","title":"Task Details","text":"We expect you to create the following elements. If you already have some of these cloud resources, of course it's more efficient to re-use those, but we want to get a cost element for all aspects, so we recommend creating new resources (e.g. a new storage account) for this mini project.
You can use the Azure portal to accomplish many if not all of these tasks, except to run your actual program:
create cloud storage (account, etc)
copy data into storage
create and start a Virtual Machine (VM) that can run this code. The instructions refer to the Azure data science virtual machine (DSVM), which we discussed in the session \"how to cloud\". You may also use container services (e.g. Azure Container Instance) to run this code if you like. Hint: consider using tags to uniquely identify the resources you create for this project, so you can easily find all of them for 1) cost analysis and 2) deleting.
connect and log in to the VM, get the scripts onto the machine, and install software as needed
copy the data from storage to the virtual machine disk, either by attaching the storage to the compute service and accessing it that way, or by otherwise copying the data (hint: the DSVM comes with the Azure storage explorer installed)
run the script while pointing to the data file location; this will output images of plots (PDF or PNG formatted)
save the output files to cloud storage
turn off and delete the resources related to the VM
determine total costs. See the topic on costs. If you complete this in less than a day, the costs for these resources will not be immediately visible in the Azure cost analysis tool; you may need to wait until the next day to view the costs in the Azure portal. This analysis was very small, so the costs will be very small.
use the outputs from the cost analysis to add a list of resources and costs to your report. As mentioned above, if you use unique tags when creating the virtual machine it will be easier to identify costs specific to this activity. "},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#due-dates","title":"Due dates","text":"Due dates will be discussed by email, but they are flexible.
"},{"location":"exercises/storage_pricing_exercise/","title":"None","text":"Prior to doing this exercise, See the reading and lecture slides for Cloud Storage for definitions of terms.
How large, approximately, is your data? If you are unsure, estimate 100 GB. How much would it cost to keep it in the cloud?
Compare the pricing for Blob, Files and Disk storage for 6 months
Aspects Of Storage:
Redundancy: Always select \"LRS\" as that is almost always sufficient and for con Storage prices are not the same across regions, but the default (\"East US\") works for this exercise Consider only the \"Hot\" storage of the different tiers (\"Premium\", \"Hot\", \"Cool\", and \"Archive\") for some high performance applications, Premium is required, but look at the price difference! Operations, Transactions and data transfer costs charged per 10K operations really hard to estimate unless you know your workload very low costs, e.g. reading 10K Blobs costs 1/2 of one cent. I would not bother estimating this cost unless you know you will have very high disk operations Types of Storage to Compare:
Azure Blob Pricing: https://azure.microsoft.com/en-us/pricing/details/storage/blobs/ select \"Hierarchical namespace\"
Azure Files Pricing: https://azure.microsoft.com/en-us/pricing/details/storage/files/
Managed Disk Pricing : https://azure.microsoft.com/en-us/pricing/details/managed-disks/
Note: these come in different sizes and types. Select the 128 GB size and Standard SSD if you are estimating 100 GB of data. When you create a disk in the portal, it defaults to 1 TiB size, which is quite expensive per month. "},{"location":"exercises/storage_pricing_exercise/#optional-compare-with-on-premise-storage-costs","title":"Optional: compare with On-premise storage costs","text":"The MSU HPC offers 1TB storage with redundant backups and high-speed access for free, with each additional 1TB for $125/year. Since this is network attached storage, is this comparable to Azure Files or Azure Blob storage?
If you need 2TB storage ( 1 free + 1 paid), what is the approximate Azure cost for 2000 GB for 12 months, ignoring all operational costs (just storage)?
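One way to set up this comparison is sketched below in Python. The per-GB prices are placeholders, not real Azure prices, and the dictionary keys are made-up labels; replace the numbers with values read from the pricing pages listed above. Only the MSU HPC figures (1 TB free, $125/year for each additional TB) come from this page.

```python
# Hypothetical monthly prices in USD per GB; replace with values
# read from the Azure pricing pages (these are NOT real prices).
PLACEHOLDER_PRICE_PER_GB_MONTH = {
    'blob_hot_lrs': 0.02,
    'files_hot_lrs': 0.03,
}


def azure_storage_cost(size_gb, months, price_per_gb_month):
    '''Storage-only cost, ignoring operations and data transfer.'''
    return size_gb * months * price_per_gb_month


def hpc_storage_cost(size_tb, years, free_tb=1, price_per_tb_year=125):
    '''MSU HPC cost: the first TB is free, each extra TB is $125/year.'''
    paid_tb = max(0, size_tb - free_tb)
    return paid_tb * price_per_tb_year * years


if __name__ == '__main__':
    for name, price in PLACEHOLDER_PRICE_PER_GB_MONTH.items():
        cost = azure_storage_cost(2000, 12, price)
        print(f'{name}: ${cost:.2f} for 2000 GB over 12 months')
    print(f'MSU HPC: ${hpc_storage_cost(2, 1):.2f} for 2 TB over 1 year')
```

Managed disks are billed by fixed disk size rather than per GB stored, so they need a separate lookup of the disk tier price instead of this per-GB formula.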
"},{"location":"references/","title":"Cloud Computing References and Links to Azure Documentation","text":""},{"location":"references/#cloud-computing-for-research","title":"Cloud Computing for Research","text":"\"Cloud Computing for Science and Engineering\", Foster and Gannon
Chapter 1: Orienting in the cloud universe ( Alternative link to publisher preview chapter ) Using Cloud Computing for Academic Research, Mahmoud Parvizi, unpublished draft, 2021.
Several additional resources for learning about cloud from Cloudbank, a west-coast consortium to help researchers use cloud computing: https://cloudbank-project.github.io/cb-resources/
Very in-depth case study of cloud for simulations (climate models): Cloud Computing for Climate Modelling: Evaluation, Challenges and Benefits. Montes, D., et al. Computers 2020, 9(2), 52; https://doi.org/10.3390/computers9020052
"},{"location":"references/#general-cloud-computing-interest","title":"General Cloud Computing Interest","text":"Historical Note Who Coined 'Cloud Computing'? by Antonio Regalado, October 2011, MIT Technology Review
Intro to Cloud Computing from Microsoft which is primarily for IT people responsible for spending money and maintaining IT Infrastructure: MS Training Describe cloud computing
"},{"location":"references/#azure-resources","title":"Azure Resources","text":""},{"location":"references/#general-azure-references","title":"General Azure References","text":"Main Azure Documentation : https://docs.microsoft.com/en-us/azure/
List of All Azure Services : https://portal.azure.com/#allservices
Azure Tips and Tricks : https://microsoft.github.io/AzureTipsAndTricks/
Azure Portal \"How to\" series - focused on using the Azure portal to do several different things. This is mostly about the services themselves, not the portal, and many topics do not apply to us (e.g. Azure Arc) but there are some very useful videos : https://youtube.com/playlist?list=PLLasX02E8BPBKgXP4oflOL29TtqTzwhxR
These look like really good intros to Azure, but require a time investment. The examples are not really research computing examples but may be valuable learning examples. Most of these lessons were taken from other 'learning paths' and are still oriented towards IT professionals
Microsoft Learn: - Azure for Researchers part 1: Introduction to Cloud Computing - Azure for Researchers part 2: Cloud Security and Cost Management
"},{"location":"references/#azure-books-available-to-the-msu-community-via-the-library","title":"Azure Books available to the MSU Community via the Library","text":"Search for Microsoft Azure, ordered by date
Microsoft Azure Functions: Developing Serverless Solutions Trevoir Williams, Packt Publishing 2022
Practical Azure SQL Database for Modern Developers Davide Mauri, Silvano Coriani, Anna Hoffman, Sanjay Mishra, Jovan Popovic Apress 2021.
Planning, Deploying, and Managing the Cloud Julian Soh, Marshall Copeland, Anthony Puca, Micheleen Harris. Apress 2020.
"},{"location":"references/#interface-azure-portal","title":"Interface: Azure Portal","text":"Azure Portal Documentation : https://docs.microsoft.com/en-us/azure/azure-portal/
Microsoft Azure Hierarchy: Organize your Azure resources effectively
Re-organize your portal view by creating a new dashboard (optional) : https://docs.microsoft.com/en-us/azure/azure-portal/azure-portal-dashboards
Azure portal productivity Tips : https://microsoft.github.io/AzureTipsAndTricks/blog/tip329.html#azure-portal-productivity-tips
https://microsoft.github.io/AzureTipsAndTricks/blog/tip329.html
"},{"location":"references/#azure-interface-azure-command-line","title":"Azure Interface: Azure Command Line","text":"Command-line programming of Cloud Services
Azure PowerShell (Windows) https://docs.microsoft.com/en-us/powershell/azure/
Introduction to PowerShell : https://docs.microsoft.com/en-us/powershell/azure/get-started-azureps?view=azps-3.0.0 Azure Command Line Interface (CLI) (MacOS, Linux): https://docs.microsoft.com/en-us/cli/azure
Introduction to Azure CLI https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli?view=azure-cli-latest Hybrid interface: using the CLI inside the Azure Portal You can install and use the az CLI program on your own computer, but Azure also has a way you can use the CLI without installing anything, with a cloud-based terminal interface called the \"cloud shell.\" For an overview see https://docs.microsoft.com/en-us/azure/cloud-shell/overview and for a quick tutorial on how to use it see the quickstart at https://docs.microsoft.com/en-us/azure/cloud-shell/quickstart In the quickstart, the first example shows you how to create a resource group using the CLI in the cloud shell. If you don't have permissions to create a new resource group, skip to the next example (\"Create a Linux VM\"), put your own resource group in the command for the -g parameter, and use a distinctive name for the VM name parameter.
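The substitution described above might look like the following Python sketch, which only assembles and prints the command rather than running it. The resource group and VM names are hypothetical placeholders, and the image alias is an assumption; check the quickstart for the exact command it currently uses.

```python
import shlex


def build_vm_command(resource_group, vm_name, image='Ubuntu2204'):
    '''Assemble an az vm create command as a list of arguments.'''
    return [
        'az', 'vm', 'create',
        '-g', resource_group,      # your own, existing resource group
        '-n', vm_name,             # a name unlikely to clash with others
        '--image', image,          # assumed image alias; may differ
        '--generate-ssh-keys',
    ]


if __name__ == '__main__':
    # Hypothetical names: replace NETID with your MSU NetID.
    cmd = build_vm_command('rg-NETID-ccf', 'vm-NETID-test01')
    print(shlex.join(cmd))
```

Paste the printed line into the cloud shell to run it, and remember to delete the VM afterwards so it does not keep accruing cost.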
"},{"location":"references/#azure-storage","title":"Azure Storage","text":"Create a Storage Account:
https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account
Azure Storage Explorer: https://azure.microsoft.com/en-us/features/storage-explorer/
Blob Storage Documentation: https://docs.microsoft.com/en-us/azure/storage/blobs/
Create and Manage a Storage Account: https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account
Using the CLI with Storage Reference: https://docs.microsoft.com/en-us/cli/azure/storage/account
Using PowerShell Storage Reference: https://docs.microsoft.com/en-us/powershell/module/azure.storage
Create blob storage with CLI:
https://docs.microsoft.com/en-us/azure/storage/common/storage-azure-cli
Create blob storage with PowerShell:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-powershell
"},{"location":"references/#compute","title":"Compute","text":"Overview of Compute Options: https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/compute-overview
Choosing an Azure Compute Service (Decision Tree): https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/compute-decision-tree
"},{"location":"references/#interface-arm-templates","title":"Interface: ARM templates","text":"Azure Resource Manager Templates are JSON-formatted configuration files that dictate which resources to create.
See also information on 'Bicep', which is Azure's simplified (but still complex) template language to replace the ARM templates
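To make the idea concrete, here is a minimal sketch that builds a small ARM template for one storage account as a Python dictionary and prints it as JSON. The account name and the API version shown are illustrative assumptions; the quick start templates linked in this section are the authoritative examples.

```python
import json


def storage_account_template(account_name, location='centralus'):
    '''Return a minimal ARM template describing one storage account.'''
    return {
        '$schema': 'https://schema.management.azure.com/schemas/'
                   '2019-04-01/deploymentTemplate.json#',
        'contentVersion': '1.0.0.0',
        'resources': [
            {
                'type': 'Microsoft.Storage/storageAccounts',
                'apiVersion': '2022-09-01',   # assumed; newer versions exist
                'name': account_name,         # hypothetical example name
                'location': location,
                'sku': {'name': 'Standard_LRS'},  # locally redundant
                'kind': 'StorageV2',
            }
        ],
    }


if __name__ == '__main__':
    print(json.dumps(storage_account_template('stnetidccf22'), indent=2))
```

The printed JSON lists what to create (one storage account with locally redundant storage) rather than the steps to create it, which is what makes a template a declarative description of resources.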
Overview of ARM templates: https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/overview
explore quick start ARM templates (web): https://azure.microsoft.com/en-us/resources/templates/
explore quick start ARM templates (github): https://github.com/Azure/AzureStack-QuickStart-Templates
many of these github repositories include a \"deploy to Azure\" button that will run the template via the portal and create resources. "},{"location":"references/#r-and-azure","title":"R and Azure","text":"https://blog.revolutionanalytics.com/2018/12/azurestor.html
https://cloudblogs.microsoft.com/opensource/2019/07/01/azurer-available-create-manage-monitor-azure-services-r/
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/r-developers-guide
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/r-packages-supported-by-azure-machine-learning
https://github.com/Azure/AzureContainers
https://github.com/Azure/AzureR
https://github.com/Azure/AzureRMR
"},{"location":"references/#python-and-azure","title":"Python and Azure","text":"https://azure.microsoft.com/en-us/develop/python/
https://docs.microsoft.com/en-us/azure/python/
https://github.com/Azure/azure-sdk-for-python
https://github.com/Azure/azure-storage-python
https://azure.github.io/azure-sdk/releases/latest/all/python.html (Note that pypi.org/project/azure/ is deprecated/obsolete if you find that via google)
"},{"location":"references/#matlab-and-azure","title":"MATLAB and Azure","text":"https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/06/29/running-matlab-on-azure-provision-a-matlab-distributed-computing-server-using-azure-vms/
https://github.com/mathworks-ref-arch/matlab-on-azure
https://www.itcentralstation.com/products/comparisons/mathworks-matlab_vs_microsoft-azure-machine-learning-studio
https://www.mathworks.com/solutions/cloud.html
"},{"location":"references/#microsoft-azure-cosmos-db","title":"Microsoft Azure Cosmos DB","text":"CosmosDB is a very large scale data system that can act like other database systems including SQL, MongoDB (a popular NoSQL database), and others. Its advantage is that it can handle extremely large data sets (65tB) but is easy to get started with. Google and AWS have similar offerings ( \"BigQuery\" and \"Aurora\" respectively).
If your data is not large, consider using SQL data systems which are also very widely used (and can be used on your own computer)
Intro: https://docs.microsoft.com/en-us/azure/cosmos-db/introduction
It can be free to use, but you have to turn that on when creating the service for your account: https://docs.microsoft.com/en-us/azure/cosmos-db/free-tier
You can run a notebook inside the database to query data with Python :
Notebook Description: https://docs.microsoft.com/en-us/azure/cosmos-db/cosmosdb-jupyter-notebooks Service announcement: https://azure.microsoft.com/en-us/blog/analyze-and-visualize-your-data-with-azure-cosmos-db-notebooks/ Video: https://www.youtube.com/watch?v=OrnZMkP5Eq4&list=PLLasX02E8BPBKgXP4oflOL29TtqTzwhxR&index=7 "},{"location":"references/#cloud-architecture","title":"Cloud Architecture","text":"This section has resources for intermediate to advanced cloud users who are interested in much more detail than most researchers will ever need, and are really geared for IT staff. However, these may have sections that offer insight into how to approach your problem (especially for cloud timing optimization projects).
Microsoft Azure Infrastructure Services for Architects by John Savill, Oct 2019, available from the MSU Library : http://catalog.lib.msu.edu/record=b13538669~S39
Azure has changed since 2019, but the book may still be relevant.
"},{"location":"sessions/01_introduction/","title":"Introducing the MSU Cloud Computing Fellowship","text":"You don't have to face the clouds alone"},{"location":"sessions/01_introduction/#welcome","title":"Welcome!","text":"This is the first 'session' of the MSU Cloud Computing Fellowship (CCF) for 2022-2023. For a description of the program and how sessions are organized, see the CCF home page
The goals of this introductory session are to orient you to this program, introduce ourselves to each other, provide some background on cloud computing, set up our technology, and discuss what all of our expectations are.
"},{"location":"sessions/01_introduction/#activities","title":"Activities:","text":"Introduce yourself on Microsoft Teams
You should have all been given access to a Team \"MSU ICER Cloud Computing Fellowship\" via your NetID.
Please log in to Teams (via the web https://teams.microsoft.com/ or using the Teams client) Post a new message in the \"general\" channel just saying \"hello\" and include your name, department and how you prefer to be addressed. If necessary MSU IT has documentation about MS Teams here: https://tech.msu.edu/technology/collaborative-tools/spartan365/ ( the link on that page requires yet another MSU log-in) Confirm Access to Azure Portal
Go to https://portal.azure.com. Log in with your MSU NetID and password. Ensure you can access the Azure main web \"portal.\" You don't need to (and shouldn't) create any new resources or work with this website; simply confirm you have access. You may see a list of \"resources\"; we will introduce Azure during our first meeting. "},{"location":"sessions/01_introduction/#introductions","title":"Introductions","text":""},{"location":"sessions/01_introduction/#msu-cloud-computing-fellowship-team","title":"MSU Cloud Computing Fellowship Team","text":" Dr. Brian O'Shea, Director, MSU Institute for Cyber-Enabled Research (ICER), Professor, Physics Dr. Mahmoud Parvizi, Co-Instructor Research Consultant and Software Engineer, Institute for Cyber Enabled Research Participant in first Fellowship cohort Manager of ICER Training Patrick Bills, Co-Instructor Research Software Engineer, Institute for Cyber Enabled Research. Brad Fears, Contributor, MSU IT Services Research Cyber Infrastructure (IT RCI) IT support staff with certification in AWS and Azure Sponsored by ICER, MSU Office of Research and Innovation (ORI), and MSU IT Services Research Cyberinfrastructure (RCI)
"},{"location":"sessions/01_introduction/#participant-introductions-discussion","title":"Participant Introductions & Discussion","text":" About you: your preferred name and pronouns, which degree program or department if faculty. 2 minute research synopsis and methods Previous experience with research computing including cloud computing (if any) Current research computing hurdles, roadblocks, challenges & triumphs Which aspect of cloud computing are you most interested in learning and using to support your research? "},{"location":"sessions/01_introduction/#fellowship-program-overview","title":"Fellowship Program Overview","text":""},{"location":"sessions/01_introduction/#fellowship-goals","title":"Fellowship Goals","text":"Help you get an understanding of:
what is cloud computing? what is cloud computing useful for? when should I use it for my research computing? how can I use it? Understanding of the context of the technology we are learning about. Help you get some practical experience
apply cloud to some aspect of your own research apply cloud to generic/canned research-like problem Fellowship - Learn from and support your fellow researchers
Non-Goals: - cover all aspects of cloud - we don't cover networks for example due to time constraints - prepare you for a cloud computing certification (there are many existing resources for that) - become experts in everything cloud - build a dot-com empire
"},{"location":"sessions/01_introduction/#program-overview","title":"Program Overview","text":"The \"syllabus\" is the home page of this website and has a detailed schedule. Keep an eye on the home page for updates!
Fall semester: Workshops (Pat Bills): Schedule and expectation (website structure, session materials and activities, readings); in-person meeting approx bi-weekly and excluding holidays; our expectations. Winter/Spring semester: Projects (Mahmoud Parvizi): Goals, schedule and expectation; Proposal write-up due early January, and presentations during semester; Check-points to discuss progress and hurdles On-going help Final presentation during Symposium late April "},{"location":"sessions/01_introduction/#introduction-to-cloud-computing","title":"Introduction to Cloud Computing","text":" The Computing in Cloud Computing Aspects of Cloud Computing Azure Organization Learning how to learn about cloud "},{"location":"sessions/01_introduction/#hands-on-using-the-azure-portal","title":"Hands-on: Using the Azure Portal","text":" Interacting with Azure using the Portal web interface Setting a Budget Alert Using the Azure Portal "},{"location":"sessions/01_introduction/#questions-and-discussion","title":"Questions and Discussion","text":" What things are at the top of your mind as you begin this program? Which of these topics resonates with your previous experience using computing or cloud computing (if any)? "},{"location":"sessions/01_introduction/#follow-up-activity","title":"Follow up Activity","text":"Please complete the following prior to our next meeting in 2 weeks:
Read about Azure Organization: see the topics above and the in-depth readings below to give you more context as you learn Complete the exercise Create a Budget Alert so that you may be notified if you spend more money than you plan to. *This first part requires significant learning, and the more you do know the better choices you can make when developing your project. *
"},{"location":"sessions/01_introduction/#bonus-activity","title":"Bonus Activity","text":"If you are familiar with the command line, Azure offers a web-based terminal/shell with many applications pre-installed. Once you have a storage account, you can create a special 'cloud shell' account. We will cover various interfaces to the cloud next time.
Overview of the Azure Cloud Shell Start and use the Azure Cloud Shell "},{"location":"sessions/01_introduction/#readings","title":"Readings","text":" Wikipedia article on cloud computing is actually pretty good Chapter 1: Orienting in the cloud universe from \"Cloud Computing for Science and Engineering\", Foster and Gannon ( Alternative link to publisher preview chapter ) Using Cloud Computing for Academic Research, Mahmoud Parvizi (draft version). The NIST definition of cloud computing ## Optional Readings - Optional Historical Note Who Coined 'Cloud Computing'? by Antonio Regalado, October 2011, MIT Technology Review
Optional: M. Armbrust et al., \"Above the Clouds: A Berkeley View of Cloud Computing,\" Technical Report UCB/EECS-2009-28, University of California at Berkeley, Electrical Engineering and Computer Sciences, 2009. PDF: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Written only 3 years after the launch of AWS, this is a very insightful discussion of the value of cloud computing. "},{"location":"sessions/02_how_to_cloud/","title":"Session 2: What is the cloud and how does it work? An introduction using storage and virtual machines","text":""},{"location":"sessions/02_how_to_cloud/#about-this-session","title":"About this Session","text":"We are providing materials and activities for this session for you to read and attempt at your own pace. Please attempt these and see how far you can get. Feel free to post on Microsoft Teams if you have any issues, find things that need correcting, or have general questions.
We will host an optional, additional, in-person session to provide help to anyone who wants to attend, Friday September 23 2pm to 3:30pm. Since this is outside of the pre-arranged schedule, anyone who would like help but can't attend at that time should contact us and we will arrange a time for you.
We will discuss all of this material and more during our next regularly scheduled in-person session Friday September 30th.
"},{"location":"sessions/02_how_to_cloud/#overview","title":"Overview","text":" When many people think of \"cloud computing\" they think of computers in the cloud, or virtual machines. Cloud computing companies offer much more than just virtualized hardware, but this is a good place to start. This session is designed to be a hands-on workshop where we walk through creating the resources needed to run a computer in the cloud, logging into this computer, copying data, and using that data in a program. At the end of the session you should have a good introduction to what it means to \"cloud compute.\"
"},{"location":"sessions/02_how_to_cloud/#overview-presentation","title":"Overview Presentation","text":"Cloud Concepts & Virtualization Slides (PDF)
"},{"location":"sessions/02_how_to_cloud/#about-the-azure-portal","title":"About the Azure Portal","text":"We were introduced to the portal in the first session. The following dives into more detail about 'resource groups', which are the core of how Azure is organized. Note that, as we get started, fellows have access to just a single resource group that we've created for you. You can't create your own but you can create as many resources as you need inside this single resource group.
Top-down description of how Azure is organized From Session 1 Using the Azure Portal : tutorial and video "},{"location":"sessions/02_how_to_cloud/#optional-follow-ons","title":"Optional Follow-ons:","text":"Azure Storage
The Activity above had you create a 'storage account' with no background.
You will see there are different types of storage, but all types must be inside a \"storage account\" and this \"storage account\" must be inside a resource group.
We will re-visit concepts and usage of cloud storage in detail, as it's a core aspect of cloud computing.
"},{"location":"sessions/02_how_to_cloud/#virtual-machines","title":"Virtual Machines","text":"We introduced \"virtualization\" during our introduction. For IT this means flexibly creating multiple resources on one piece of hardware using software. The main use case is many virtual computers (or servers) on one large computer hardware. This was created prior to cloud, but when you create your own computer in the cloud, it's based on this technology. To a user it may seem very similar, but to the systems IT engineer, it's very different. However these readings may help give you an
"},{"location":"sessions/02_how_to_cloud/#readings","title":"Readings:","text":" Chapter 4: Computing as a Service from \"Cloud Computing for Science and Engineering\", Ian Foster and Dennis B. Gannon, September 2017 What is a Virtual Machine (VM)? Introduction from Microsoft What is a Virtual Server? YouTube Video from IBM describing how companies (including MSU!) use virtualization to run multiple computers on one server to optimize the use of space in a data center. What's the difference between cloud and virtualization? from Red Hat, a Linux Operating system company "},{"location":"sessions/02_how_to_cloud/#activity-create-a-virtual-machine-with-azure","title":"Activity: create a virtual machine with Azure","text":"Create (and delete) a Virtual Machine with the Azure Portal for both Windows and Linux.
"},{"location":"sessions/02_how_to_cloud/#discussion-why-create-a-vm","title":"Discussion: Why create a VM?","text":"What is a VM good for? The activity above does not discuss why you'd create a VM and connect with remote desktop, only that you can do it. We will discuss that at our next session. Can you think of possible use cases for your research, or other types of research, for a remote computer that could be very powerful or very small?
"},{"location":"sessions/03_cloud_storage/","title":"Session 3: Cloud Storage","text":""},{"location":"sessions/03_cloud_storage/#introduction","title":"Introduction","text":"Central to using cloud for nearly all services is storing data. Cloud storage is quite different from what most of us are used to, such as saving a file to your disk, USB removable media, or even our HPC. During the previous workshop we created a VM but didn't use cloud storage; we simply created a VM \"virtual disk\" that is attached to the VM just like your hard drive is attached to your own computer. However there are disadvantages to this : 1. the main OS disk is typically deleted when the VM is deleted, although you can create a 'durable' disk to share 1. the data on the main OS disk is tied to that Virtual Machine and hence that operating system, that is, it's typically inaccessible from other cloud services 1. it is limited in size and scope The largest of virtual disks are around 1 TB. Azure Cloud storage accounts are limited to 5 TB and you may have multiple storage accounts. 1. You can only move data to/from a virtual or shared disk storage using a virtual machine 1. Most importantly virtual disks are very expensive compared to cloud storage
Cloud storage was engineered to save millions of files for millions of users and will take some changes to your approach to understanding how it works.
"},{"location":"sessions/03_cloud_storage/#activities","title":"Activities","text":" Download and install the Azure Cloud Storage Explorer See the \"Download now\" button at the top of that page. You may review the content of the page
We did this in the first session, but if you want to work through it again, complete the exercises in Creating Azure Cloud Storage Accounts to create and use storage.
Exercise: Azure Storage Pricing
"},{"location":"sessions/03_cloud_storage/#readings","title":"Readings","text":" Azure Cloud Storage for Researchers (Slides)
Not a bad, high-level introduction : Edureka Azure Storage Tutorial (there are several pop-ups and ads, but it has a good level of information)
Storage as a Service from \"Cloud Computing for Science and Engineering\" Azure Documentation: Introduction to the core Azure Storage services Table of Azure Storage Product Offerings Optional: this is long (It says 46 minutes but it will probably take less time) but a good basic introduction to Azure storage: Azure Training: Explore Azure Storage services ( free training from Microsoft Learn) optional Understanding block blobs, append blobs, and page blobs
Introduction to Azure managed disks This has more technical background than necessary but could be very helpful.
"},{"location":"sessions/03_cloud_storage/#post-session-discussion-points","title":"Post-session discussion points","text":"There are several options when creating a storage account. For example, what is the difference LRS vs GRS? Is the documentation describing these clear or confusing? What conditions might you consider LRS vs GRS? Is it worth the cost?
How would you share data with colleagues outside of MSU using cloud storage? Where did you find the information for how to do that (Microsoft, Azure, Blog post, other)? Let's say need to share 5gb of data. After doing the pricing exercise above just for storage, what are the costs for each upload and download of 5gb? Does it make a difference if it's Blob or File storage?
"},{"location":"sessions/03_cloud_storage/#optional-activities","title":"Optional Activities:","text":"The following two activities walk through attaching Azure files to a VM so you can use it just like any other disk. This is only one method for moving data to/from cloud storage to your VM, but it does not require changing your program code.
For Windows Users: Using File Storage with Windows VM
Microsoft Tutorial: Create an SMB Azure file share and connect it to a Windows VM using the Azure portal
Notes: - The tutorial has you create a storage account, but you can re-use the one you've already created (and change the names), or follow the tutorial and create another one. - Not all versions of Windows can use this. For much more detail, see the Azure documentation page \"Mount SMB Azure file share on Windows\"
For Linux Users: Mounting File Storage with Linux VMs using NFS
Microsoft Tutorial: Create an NFS Azure file share and mount it on a Linux VM using the Azure portal
How to mount Azure Files on Linux using SMB
Notes: - SMB (invented by Microsoft for Windows) and NFS (invented by Sun Microsystems for Unix) are competing methods for attaching network storage. Both were created for on-premise servers, but Azure Files storage brings this to the cloud. - This tutorial uses the command line, and requires an ssh connection to the VM you create. - Knowledge of Linux systems (mount points, fstab, etc) is required
Optional: Python And Blob Storage
This describes a different method for moving files to/from cloud storage: using code. This does not require you to 'mount' the storage to your VM.
For Intermediate Python users, and if you have time and interest, consider this tutorial from Azure: Quickstart: Manage blobs with Python v12 SDK
Requirements:
knowledge of Python use the blob storage account you created in the exercise above or create a new one familiarity with Azure portal Python installed on your computer (suggest Python 3.6 at minimum) familiarity with the terminal and command line Optional: Using Managed Disks with Linux
Azure Learning Tutorial : Add and size disks in Azure virtual machines
Notes: - Uses the Azure Command line interface which we have not discussed. For
"},{"location":"topics/","title":"Short Topics for the Cloud Computing Fellowship","text":"These topics are introduced in the sessions in the syllabus. This is an index of all the topics here to help you find them outside of lessons. They are not in any particular order, but aggregated here in an effort to help you find them.
The Computing in Cloud Computing Aspects/Nature of Cloud Computing How to Cloud with Azure (pdf, slide format) Azure Organization Learning how to learn about cloud Using Tags in Azure to organize, identify and find resources Azure Costs Basics Azure Cloud Storage for Researchers (slides format) Introduction to cloud interfaces (web, REST API, Python, command line, Javascript, etc) "},{"location":"topics/azure_cloud_cost_basics/","title":"Intro to Cloud Costs on Azure","text":"You've heard us say that nearly everything in Azure has a cost, but how can you tell how much?
First, a cautionary tale: Google Cloud Charged Me $1000 For This Mistake by Kunal Vaidya on Medium. tl;dr: he forgot to turn off a service even though he was no longer using it. The good news is that Google does grant one-time forgiveness if you can prove you are using the service to learn about it (e.g. you are a student).
"},{"location":"topics/azure_cloud_cost_basics/#video-walk-through-of-azure-cost-analysis","title":"Video Walk-through of Azure Cost Analysis","text":"The following video walks through how to use the cost analysis features of the Azure portal for your resource group. Two notes: 1) it helps to understand Azure Organization first, and 2) the video is from a few years ago, so the screens may look a little different.
Short video (3:30) Demonstrating Azure Portal Cost Analysis, on MSU MediaSpace (log-in required)
"},{"location":"topics/azure_cloud_cost_basics/#details-about-costs-in-azure","title":"Details about Costs in Azure","text":"The content below assumes you have knowledge of how to use the Azure Portal, basic cloud operations, what a virtual machine is. See the links and materials for session 01 for the necessary background.
"},{"location":"topics/azure_cloud_cost_basics/#1-pricing-pages","title":"1. Pricing Pages.","text":"All cloud vendors have pricing pages that describe how they meter and charge for services. For Azure this is https://azure.microsoft.com/en-us/pricing/#product-pricing
However I usually find the page I need quickly by simply googling azure <service name> pricing
For example, I wanted to see how much a static IP address costs in Azure, so googling 'azure static ip pricing' took me to https://azure.microsoft.com/en-us/pricing/details/ip-addresses/
Some of these pages are straightforward, but others, like the one above, assume additional background knowledge. For example, what does \"classic\" vs \"ARM\" even mean? There is a link at the top of the page but this may take time to read and understand. I'll tell you that we will never use 'classic' and will only use 'resource manager (ARM)', so look at the ARM prices.
This kind of background info is very common for services.
"},{"location":"topics/azure_cloud_cost_basics/#2-build-something-and-check-the-cost","title":"2. Build something and check the cost","text":"The other option is the empircal method: build something, use it, review the costs, and estimate.
At the resource group in the protal ( see Azure Organization), there is a link on the left-side menu, near the bottom labelled \"Cost Analysis\" - click that
This is a live report of your current costs, with the ability to filter by time period, resource type, tags, and other things.
Near the middle are rounded buttons controlling the view you see. At the right side of these is a button \"Add Filter\" which you can click to show costs only for some resources. For example if you click that and select \"Service Name\" and then \"virtual machines\" you will see the virtual machine costs for the current month.
A powerful filtering technique is to use tagging in Azure, which is akin to adding meta-data to resources. See the Cloud Glossary
In many of the filtering mechanisms in Azure (including costs), the tag names (keys) you use are listed in the options for filtering.
Carefully select the date range for which you want an estimate, especially if your trial run started a few days ago in the previous month as the default is a monthly estimate. Use a custom date range for the time period that makes sense for the costs you want to observe.
Example Azure Cost Analysis Screen, filtered by Tag.
"},{"location":"topics/azure_cloud_cost_basics/#3-pricing-calculators","title":"3. Pricing Calculators","text":"All the cloud companies have pricing calculators and they may be good for very rough estimates but I always multiple by 1.2 as I'm sure I missed some crucial resource that I didn't know I needed or didn't know costs money.
For Azure it's https://azure.microsoft.com/en-us/pricing/calculator/
"},{"location":"topics/azure_cloud_cost_basics/#summary-and-other-notes","title":"Summary and other notes","text":"Combining these three methods is how we can estimate costs.
Notes:
Pricing often depends on the location or region you select. Most regions in the US are the same price.
Data transfer costs are really hard to estimate. Transfer into the cloud (ingress) is often free but transfer out of the cloud (egress) usually has a charge. This is because companies with web products (e.g. websites, web stores, image sites, etc) make money when customers view their pages (more customers => more costs => but more revenue). However note that MSU has a deal with Azure and data transfer from Azure to MSU is (mostly) free. One way to mitigate data transfer costs in research is to transfer large data inputs into Azure, but only take out the smaller output (results, summaries).
"},{"location":"topics/azure_cloud_cost_basics/#azure-pricing-resources","title":"Azure Pricing Resources","text":"Quickstart: Explore and analyze costs with cost analysis
Video from John Saville on cost estimation including the pricing calculator: Master the Azure Pricing Calculator Jun 17, 2021
"},{"location":"topics/azure_organization/","title":"Azure Organization","text":"This is a brief description of how Azure cloud services are organized for those just getting started with Azure. It's my own take on this topic written with researchers in mind. However it should not replace Azure official documentation. The link below has a great summary of how it's setup. However you may ignore all the other sections in the \"Azure setup guide\" as this is geared for IT professionals adoption cloud for their own organization
Microsoft Azure Documentation: Organize your Azure resources effectively
Azure is organized by directories of user accounts and subscriptions. All resources must be created in exactly one \"subscription\", which is a method for billing and for setting permissions. Your organization's \"directory\" is where your user account lives, but you may have access to multiple subscriptions with one user account. MSU created a \"Cloud Computing Fellowship\" subscription for all activities and resources for this fellowship, and we added your MSU directory accounts to this subscription.
Cloud computing components are known as \"resources,\" which AWS defines as \"an entity you can work with.\" Anything you can create using a cloud interface is a \"resource.\"
To help with organization, resources in Azure belong to a resource group. Resource groups can collect resources by project, and a project may have hundreds of resources or just a few. There is no restriction; it is up to you to organize resources in the way that works for you. For example, a lab could have a resource group for each member, or perhaps a resource group for each project, and members collaborate on those projects.
It's also possible to restrict access to resource groups, e.g. a resource group for a project may only allow those who are working on the project access to that resource group. Azure has other organizational tools such as management groups across subscriptions, complex identity management and role-based access control (RBAC) that we won't cover here.
However, this is mostly for organization and resources may be accessed from one resource group to another, and even across subscriptions. Applying this organization scheme requires practice and sometimes vigilance.
For most campuses, researchers will want to have their IT department create the subscriptions and billing as they often can get discounted prices or fee waivers. When your research group is ready to pay for services here at MSU, see the link to the \"cloud services request form\" on https://tech.msu.edu/network/cloud-services/
Summary of top-down Azure Organization:
Directory : (MSU account). All accounts must come from a directory (but an account can be in multiple directories). Management groups : (we won't use these; they are for admins to manage multiple subscriptions). Subscription : tied to a billing account, and where all resources are created. Resource Group : organizational tool for resources. Think of it as a \"folder\" in your file system. Resource : any cloud entity you may work with (e.g. create, configure, destroy). Finally, it is possible to log in to the Azure portal (e.g. with your MSU account) and not have a valid subscription, and so not be able to create or access any resources. If you have never used Azure before, you may be asked to create a free trial. If you need to use Azure (e.g. for training) and do not have access to an MSU subscription, you may want to use a non-MSU email address and create your own account.
Azure \"tags\" are added to resources (including resource groups) and are a way to identify and locate resources by search, as in many other services. They are optional, but using a tagging scheme is highly recommended to help organize your resources and for cost analysis. You can use any keys and any values you find useful.
"},{"location":"topics/azure_organization/#azure-locations-or-regions","title":"Azure Locations or Regions","text":"Subscriptions are for accounting only and don't represent concrete cloud resources. However cloud resource must reside in computer somewhere, and hence have a location. Locations for cloud providers for can be thought of inside one of their massive data centers. In Azure, \"region\" and \"location\" are used interchangably (some interfaces use 'location, some use 'region')
Resources and Resource groups must be assigned a location when you create them. considerations are 1) does the location actually provide the services you need (not all locations have all cutting edge products) and 2) is the location close to you to reduce time it takes for data to cross the internet to/from you and finally 3) is there some restriction based on your country of origin.
Most of the time, simply choose the default which is East US which almost always has the latest features. For some advantage for data transfer, choose (US North Central US). However as a rule select a location/region and use that across all of your resources so that, for example, your data files in storage are close to (in the same data center as) a computer you may create.
Regions become very important for companies that offer services around the world and want to reduce the connection time for their customers. It's also possible to have back-ups of resources in different region to protect against natural disasters.
"},{"location":"topics/azure_tags/","title":"Azure tags","text":""},{"location":"topics/azure_tags/#using-tags-to-organize-resources-in-azure","title":"Using Tags to organize resources in Azure","text":"Tags are notes to yourself about the resource, use them for metadata.
As the number of cloud resources blossom (e.g. cloud sprawl) it can be important to find related resources quickly. The azure portal has a way to see resource within and across resource groups using different filtering methods. One of those is the with resource meta-data, and you can add meta data using 'tags.'
In my group we always have a tag with the key \"created by\" and value the netid of the creator. This may be redundant here becuase all the resources you create will be a in resource group with your NetID already in it, but add this for practice.
You may consider using a tag like \"project\" with value for the project if either 1) a project may have multiple resource goups or 2) a resource group would have multiple projects.
For now you have only one resource group, but tags are also used to find things across different resource groups, e.g. if by project name.
Tags can be added and removed at will from resources without altering the resource, so add as many tags as you want when starting to see how they may work.
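The filtering idea can be sketched in a few lines of Python. This is only an illustration using plain dictionaries, not the Azure SDK; the resource names, resource group names, NetID and tag values below are all hypothetical.

```python
# Hypothetical inventory: each resource has a name, a resource group and tags.
resources = [
    {'name': 'vm-test', 'group': 'rg-netid1', 'tags': {'created by': 'netid1', 'project': 'genes'}},
    {'name': 'vm-test-disk', 'group': 'rg-netid1', 'tags': {'created by': 'netid1', 'project': 'genes'}},
    {'name': 'storage1', 'group': 'rg-netid1', 'tags': {'created by': 'netid1', 'project': 'maps'}},
    {'name': 'vm-shared', 'group': 'rg-lab', 'tags': {'project': 'genes'}},
]


def filter_by_tag(items, key, value):
    '''Return the resources whose tag `key` equals `value`, across all groups.'''
    return [item for item in items if item['tags'].get(key) == value]


# Find everything belonging to one project, regardless of resource group.
for resource in filter_by_tag(resources, 'project', 'genes'):
    print(resource['group'], resource['name'])
```

Because the match is on the tag rather than the name or the resource group, the same lookup works across groups, which is the behaviour the portal filters offer.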
"},{"location":"topics/azure_tags/#example-usage","title":"Example usage:","text":"When creating resources using the wizard, many resources are created at once. For example creating a virtual machine may create 12 resources. Adding a tagl to ID those resources together can really help to delet them.
use the Portal to create a test virtual machine (VM), which creates 12 resources add a unique tag to those during the VM creation process, e.g. tag \"id\" = \"test VM Oct 1\" when you later need to delete the VM becuase you are done with it, or it wasn't what you needed, you can filter resources in your group on this this so you can select those 12 resources, and not any others, without having to hunt for them by name. "},{"location":"topics/intro_aspects_of_cloud_computing/","title":"Nature of Cloud Computing","text":""},{"location":"topics/intro_aspects_of_cloud_computing/#some-motivation-at-amazoncom","title":"Some Motivation at Amazon.com","text":" Massive IT infrastructure supports the Amazon store and company They wanted to sell shopping application as a service to a company like Target who didn't want to r-un their own store. T This required the software developers to have lots of flexible infrastructure (servers) to run on. They found team to build a service (with software) could spend 70% of their time setting up the 'back end' They called all the infrastructure needed to run a massive dot-com \"muck\" and saw this as a secondary supporting role to application development. What they wanted in days actually took months. "},{"location":"topics/intro_aspects_of_cloud_computing/#eureka-moment-for-amazon-we-could-sell-it","title":"Eureka moment for Amazon: we could sell it","text":" Amazon automated their IT department so teams could order and provision the servers they needed on demand beyond just virtualization (\"everything was an API\") They got really good at running very large data centers for many customers as cheaply as possible and on-demand for Amazon.com and other stores and services. They realized that their innovations would help any IT organization and especially internet start-ups like themselves, and that they could sell it. 
Their customers were other IT departments Blog Post from 2006: \"We Build Muck, So You Don\u2019t Have To\" "},{"location":"topics/intro_aspects_of_cloud_computing/#nist-defintion-of-cloud","title":"NIST defintion of cloud","text":"Government offices interested in purchasing cloud computing needed a definition of it to differentiate from other kinds of computing, hence... the NIST definition of cloud computing essential characteristics
On-demand self-service. Measured service: pay for what you get. Broad network access: accessible from the internet Rapid elasticity: no limits from a customer perspective. This word was invented by AWS Resource pooling: single resources serve many customers. "},{"location":"topics/intro_aspects_of_cloud_computing/#what-is-cloud-computing-cloud-concepts-vs-cloud-providers","title":"What is Cloud Computing? Cloud concepts vs Cloud Providers","text":" Three major cloud providers are in a constant arms race, literally (Azure vs. Amazon competed for a $10B defense contract): Azure, Amazon Web Services and Google Cloud Platform
Offerings are very similar so all are great choices
Other options: smaller companies and open source options (used by Indiana University Jetstream HPC and the Osiris project from MSU, UMich, Wayne State and IU), and Cyverse for running jobs. "},{"location":"topics/intro_aspects_of_cloud_computing/#benefits-of-cloud-computing-for-research","title":"Benefits of Cloud Computing for Research","text":" Customized Computing: can create customized resources only when you need them Elastic/On-demand: can run ad-hoc computations on those on-demand resources Instant service: Reproducible: a computation can be re-run as needed, meaning cloud resources can be easily re-created to re-run your computations. Cost effective: unlike commercial applications, more users does not mean more revenue. Budgets are fixed and the pay-as-you-go model requires vigilance to not over-spend. Others? Restatement of goals of this Cloud Computing Fellowship:
Learn which types of computing resources are beneficial to your research Learn how to use Cloud to create those resources Use the services packaged by cloud companies to discover new resources "},{"location":"topics/intro_aspects_of_cloud_computing/#using-workflow-and-computational-thinking","title":"Using workflow and computational thinking","text":" Karl Popper stated that \"non-reproducible single occurrences are of no significance to science\" ( K Popper, \"The Logic of Scientific Discovery\", English translation from Routledge, London, 1992, p. 66.) and this is a significant issue for research based on computing. To enhance reproducibility in your own work, consider documenting all the steps needed to create the environment to run your computation. For many on-premise academic systems (e.g. the MSU HPCC), we depend upon the system administrators to create that environment, but we may install and configure all the software we need to run our code. Workflow thinking can apply to the scientific domain itself (e.g. \"Principles for data analysis workflows\" https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008770 ) and to the provisioning of the cloud computing environment. That is, we may use a workflow system for creating all the cloud stuff we need, and then a different workflow system that runs on that cloud stuff. One example is that we may create an HPC system on Azure using templates and then launch the Slurm scheduler on that HPC to run our jobs. (Note that the complexity of running your own HPC is beyond the scope of this fellowship and is used as an example only.)
A major advantage to using workflows or code for provisioning your cloud computing components is that you can turn them off and delete them when you are done, and restart when needed.
Our first uses of cloud will use forms to create resources, but we encourage you to automate where possible.
"},{"location":"topics/intro_aspects_of_cloud_computing/#about-cloud-security","title":"About Cloud Security","text":"Security and risk management are important issues even for researchers whose data are open - If your computer is a server, your responsibility just increased 100X: these are prime targets. Consider each component of a server to be a point of vulnerability. - Finding a readable list of security recommendations for cloud computing is a challenge for all the reasons outlined above. Our textbook has a nice chapter outlining cloud security - We will cover methods to reduce security risks but it's important to consider the risk of hacking from the beginning
Attackers may use the services you create to launch attacks on other services, leaving you liable.
The \"Shared responsibility\" model for cloud computing takes a model of computing components and shows how much of each component's security the user is responsible for. Microsoft Model of Shared Responsibility for Cloud Computing
We will come back to this model as we gain deeper understanding of research computing on the cloud.
"},{"location":"topics/intro_aspects_of_cloud_computing/#hpcc-vs-cloud","title":"HPCC vs Cloud","text":" Dr. Parvizi's white paper outlines the challenges of adapting HPC workflows to cloud computing. The HPC is amazing effective at running all kinds of systems at very list cost, if any, to MSU researchers, but not all are the best fit.
Many systems not designed for HPC can be adjusted to run in that environment. However, just like many workflows are difficult to port from HPC to cloud, some cloud workflows are difficult run on HPC (but never say never). Especially windows-based software.
"},{"location":"topics/intro_aspects_of_cloud_computing/#acknowledging-bias-in-access-to-cloud-computing-across-research-cultures","title":"Acknowledging bias in access to cloud computing across research cultures","text":"It's widely recognized that AI is frequently bias. For example, Azure Voice recognition did not work for a female researcher who developed voice-controlled surgery, so
However I believe there is also inherent bias in the user interfaces, design and definitions in the engineering of technology across many axes of diversity (gender, culture, background, training, creativity, etc). System Engineering is it's own discipline and Cloud computing is arcane so our goal is to reduce conceptual barriers to using this technology while you work with us.
"},{"location":"topics/intro_aspects_of_cloud_computing/#about-cloud-costs","title":"About Cloud Costs","text":" Cost management is a major hurdle for adopting CC, so we will talk about costs extensively (Almost) everything you do in Azure has a cost Costs often acrue over time, wether the resource is in use or not Deleting resources when are not using is a great way to reduce cost We want to encourage you to experiment! Using a very powerful machine for an hour may cost only $0.50 Just be aware that creating something and leaving it on will deplete your budget Solution: \"Budget Alerts\" Case Study: Computation of a machine learning model based on gene networks for inferring gene association ( https://www.geneplexus.net): a single (virtual) machine to run the ML such that users would not have to wait too long would be $650/month. However, if the computational power is provisioned only when needed, it's 5 cents/job.
"},{"location":"topics/intro_aspects_of_cloud_computing/#value-proposition-of-cloud-computing","title":"Value Proposition of Cloud Computing","text":" Costs are more than just dollars for services. Consider [Total Cost] = ( $ + Time + Risk )
[Total Time] = ( development time + wait time + compute time )
Security risks are rarely insignificant, so factor that into cost. In the Service level spectrum, the higher level \"platform\" services may have higher monetary costs but often reduce time and risk "},{"location":"topics/intro_to_cloud_interfaces/","title":"Interfacing with Cloud Services","text":"Cloud Services are by design DIY or on-demand and hence need a programming interface to create cloud resources. This is only possible because inside the data center, computer configuration can be done completely with code, also known as \"Infrastructure as Code\" (IaC). Amazon's insight was that they could slap a website on top of that, add a system for tracking (metering) usage, and sell it.
At their base, all of the cloud companies use a web interface, a so-called REST API. Knowing the details of REST is not important, but it's often the basis for all of the other styles of interfaces.
Here is an example web api URL for weather forecast, with parameters for coordinates, units and format of output
https://www.7timer.info/bin/astro.php?lon=113.2&lat=23.1&ac=0&unit=metric&output=json&tzshift=0
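The same URL can be assembled from its parts in Python, which makes the role of each parameter easier to see. This sketch uses only the standard library and does not contact the service; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = 'https://www.7timer.info/bin/astro.php'
params = {
    'lon': 113.2,        # longitude
    'lat': 23.1,         # latitude
    'ac': 0,
    'unit': 'metric',    # units for the forecast
    'output': 'json',    # format of the response
    'tzshift': 0,
}

# Join the endpoint and the encoded parameters into one request URL.
url = f'{base_url}?{urlencode(params)}'
print(url)

# Going the other way: split a URL back into its query parameters.
print(parse_qs(urlsplit(url).query))
```

Command line tools and language libraries for cloud services do essentially this on your behalf, which is why you rarely need to write such URLs by hand.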
Very few researchers would ever use the REST API directly; instead they would use the web interface or, even better, the command line or a programming language interface, which achieves the same goal with less work.
In Azure, everything you could possibly create is called a \"resource:\" a machine, a data service, a single network address. The system to work with Azure resources is the \"Azure Resource Manager\" or ARM and the primary interface for the Resource Manager is their web (REST) api. You may see references to resources in documentation and that means any web doo-dad.
"},{"location":"topics/intro_to_cloud_interfaces/#summary-of-cloud-interfaces","title":"Summary of Cloud Interfaces","text":"This summary is focused on Microsoft Azure, but the other cloud companies have similar concepts. In addition to this guide, Chapter 1 of our text \"Cloud Computing for Science and Engineering\" has an excellent description and examples of these interfaces with examples from AWS. See the section of that chapter titled \"Accessing a cloud service\" in https://s3.us-east-2.amazonaws.com/a-book/Orienting.html
"},{"location":"topics/intro_to_cloud_interfaces/#graphical-web-interface","title":"Graphical Web Interface","text":"Most people want a graphical user interface, and for azure that's the \"Portal\" or https://portal.azure.com. For Google cloud it's the \"console\" and for AWS it's also called the console. See below for an introduction to using the portal. Note that the Azure portal and Google console both have web-based terminals that allow you to use the CLI directly in the web interface.
"},{"location":"topics/intro_to_cloud_interfaces/#desktop-applications","title":"Desktop Applications","text":"Azure provides some desktop applications for working with a few of the widely used cloud services :
Azure Storage Explorer: https://azure.microsoft.com/en-us/features/storage-explorer/ Can create cloud storage and upload/download data. We will use that for our session on Storage Azure Data Studio: https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is-azure-data-studio?view=sql-server-ver15 Can connect to and work with data systems (such as databases ) that are on your computer, on a system on campus, or hosted in Azure "},{"location":"topics/intro_to_cloud_interfaces/#command-line","title":"Command Line","text":"For those not familiar with the command line at all, see https://www.digitalocean.com/community/tutorials/an-introduction-to-the-linux-terminal for Linux and for Windows PowerShell see https://programminghistorian.org/en/lessons/intro-to-powershell
The command line interface is a great way to interact with cloud services because it's imperative and all options are specified in a single command. With the web interface, you may have to hunt through the user interface to find the checkbox for an option, but on the command line each option is written out in the command itself.
Azure has two command line interfaces: the \"CLI\", which is based on Linux and works in any Linux or Mac terminal (or shell script), and the \"PowerShell\" interface for Windows PowerShell users. PowerShell has been ported to Linux and Mac, and the Linux shell and Azure CLI can also be used on Windows, so both are operating system independent; in practice, Windows users use PowerShell and everyone else uses the CLI. Your choice depends on the kinds of other systems you'll be working with. For example, the MSU HPC uses the Linux command shell, but Windows servers and other Windows services like SQL Server work well with PowerShell.
"},{"location":"topics/intro_to_cloud_interfaces/#sdk-software-developer-kit","title":"SDK : Software Developer Kit","text":"A \"software developer kit\" is simply a collection of utilities, libraries/packages and documentation for a specific language to work with a specific service. All the cloud vendors have SDKs, and they all have SDKs for Python. SDK simply means you can create, delete, interact with cloud services from your program.
Why leave Python or R if you don't have to?
"},{"location":"topics/intro_to_cloud_interfaces/#python-sdk","title":"Python SDK","text":"All cloud vendors have SDKs to work with Python. After installing the SDK, you import the libraries and issue commands to create resources, then use those cloud resources to do work via client libraries (either Azure libraries or others). Azure has extensive documentation for using Python: https://docs.microsoft.com/en-us/azure/developer/python/?view=azure-python
Example Azure code to create cloud storage, compared with how you would see the resources in the azure portal, and similar commands using the CLI : https://docs.microsoft.com/en-us/azure/developer/python/azure-sdk-example-storage?tabs=cmd
Note that Azure also has a service, \"Azure Functions\", that runs Python code; it is not the same thing as the SDK. Functions are 'serverless' resources (similar to AWS Lambda), which we will learn about later in the course.
Both AWS and Google Cloud have Python SDKs, and other vendors probably do as well.
"},{"location":"topics/intro_to_cloud_interfaces/#rest","title":"REST","text":"Knowing the details of REST is not important but it's the basis for all of the other style of interfaces.
Here is an example web api URL for weather forecast, with parameters for coordinates, units and format of output
https://www.7timer.info/bin/astro.php?lon=113.2&lat=23.1&ac=0&unit=metric&output=json&tzshift=0
The parameters to the weather data fetch program are lon, lat, ac, unit, output=json, tzshift, and they are embedded in the URL itself.
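To make the idea of parameters embedded in the URL concrete, here is a minimal sketch using only Python's standard library. It takes the example URL above apart into host, path and parameters without contacting the service; the function name split_request is ours, chosen for illustration.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# The example weather URL from the text, split over two lines for readability.
URL = ('https://www.7timer.info/bin/astro.php'
       '?lon=113.2&lat=23.1&ac=0&unit=metric&output=json&tzshift=0')


def split_request(url):
    # Return the host, the path, and a dict of query parameters.
    parts = urlsplit(url)
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return parts.netloc, parts.path, params


host, path, params = split_request(URL)
print(host)    # the computer that receives the request
print(path)    # the program on that computer
print(params)  # the inputs to that program

# Going the other way: build a query string from a dict of parameters.
query = urlencode(params)
```

A script that calls a web API usually works in this second direction: it keeps the parameters in a dictionary and lets a library assemble the URL.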
This is called a \"request,\" and using a web API often requires sending parameters not just in the URL, but as an attachment or in the 'body' of the request. Browsers don't have an automatic way of doing that, so we use scripts (for example with the Python Requests library) or special programs for testing web APIs that can send parameters and data in the request body.
This is a good explanation of REST and part 2 describes the details.
https://medium.com/extend/what-is-rest-a-simple-explanation-for-beginners-part-1-introduction-b4a072f8740f
The Azure REST API is an interface to the Azure Resource Manager via the web. Requests can get information about your resources or create new resources, just like the portal, the command line and the SDKs. Those other interfaces typically translate to the REST API. Knowing about it may help you diagnose why your method for interfacing with Azure is not working, but it is not necessary to learn. For examples and more detail, see https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/deploy-rest
Few of us would ever use the Azure REST API directly; most use the web interface or, even better, the command line or a programming language interface, which achieves the same goal with less work.
"},{"location":"topics/intro_to_cloud_interfaces/#r","title":"R","text":"Unlike the other vendors, Microsoft maintains an SDK for R users which allows you to create cloud services directly from RStudio. See their github pages https://github.com/Azure/AzureR and excellent documentation throughout the packages.
"},{"location":"topics/intro_to_cloud_interfaces/#cloud-company-templating-frameworks","title":"Cloud company templating frameworks","text":"In addition to the \"SDKs\" for existing languages, cloud companies often have their own frameworks for using code to build (provision) infrastructure. For Azure, these are \"ARM Templates\" and for AWS it's CloudFormation.
"},{"location":"topics/intro_to_cloud_interfaces/#azure-arm-templates","title":"Azure: ARM templates","text":"Azure has a system for submitting a template, essentially a configuration file, to the Azure Resource Manager (ARM) that dictates which cloud resources are to be created. For Azure these are JSON-formatted files that are \"declarative\" (rather than procedural or imperative like Python). The best way to understand these is to explore the many that Microsoft posts on github, and to try them. If you do, be mindful to delete any resources you create so as not to be charged for them.
- Overview of ARM templates: https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/overview\n- Quick start ARM templates (github): https://github.com/Azure/AzureStack-QuickStart-Templates\n
You may see references to \"Bicep\" templates. Bicep is a simplified ARM templating language that may be easier to write, debug and maintain than the JSON format of ARM templates.
"},{"location":"topics/intro_to_cloud_interfaces/#aws-cloud-formation","title":"AWS: Cloud Formation","text":"AWS also has a templating language, called CloudFormation, that is similar to Azure Resource Manager templates. If you are using AWS for your project, and want to automate the creation and deployment of resources, this may be a good option.
AWS Documentation:
What is AWS CloudFormation? How does AWS CloudFormation work? "},{"location":"topics/intro_to_cloud_interfaces/#third-party-programming-with-terraform","title":"Third-party programming with Terraform","text":"There are other ways to 'program the cloud' from companies outside of the big three. One widely used framework is \"Terraform\" from Hashicorp, which is not affiliated with any cloud company. The advantage of Terraform is that it's declarative: you specify what you want, unlike, say, the Python or command line interface, where you have to create items with commands one at a time.
Terraform is used by cloud professionals because it's designed to keep the resources you've created running and allow you to modify them in place. If you find you are using scripting to build resources (which is great!) but your scripts are becoming cumbersome to maintain and your cloud architecture is complex, consider using Terraform.
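The difference between declarative and imperative can be sketched in a few lines of Python. This is a toy illustration only, not Terraform's actual algorithm or syntax: the function plan and the resource names are made up. You describe the resources you want, the tool compares that with what exists, and it works out the commands for you.

```python
def plan(desired, current):
    # Compare the wanted resources with the existing ones and
    # return the list of actions needed to make them match.
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in current:
            actions.append(('create', name))
        elif current[name] != spec:
            actions.append(('update', name))
    for name in sorted(current):
        if name not in desired:
            actions.append(('delete', name))
    return actions


# Hypothetical resources: what we want versus what is running now.
desired = {'storage': {'tier': 'hot'}, 'vm': {'size': 'small'}}
current = {'vm': {'size': 'large'}, 'old-db': {'tier': 'basic'}}

steps = plan(desired, current)
for action, name in steps:
    print(action, name)
```

With a script or the command line you would write those create, update and delete commands yourself, and keep track of what already exists.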
- Terraform: https://www.terraform.io\n- Can work with any vendor including Azure\n- Often more readable than ARM templates, Syntax remarkably simple \n- Focus on maintaining consistent systems ( declarative) \n- Does not cover all services, but can fall back to ARM templates when necessary\n
"},{"location":"topics/intro_to_cloud_interfaces/#building-cloud-from-cloud","title":"Building Cloud from Cloud","text":"This may not be an 'interface' but is operationally similar. It's possible to use some of the above interfaces on existing cloud services, e.g. creating new cloud resources automatically from existing cloud resources. Your cloud architecture may need different types of resources, or parameterized resources only as needed (e.g. depending on data inputs, a web-gateway for cloud on demand).
For example Azure Logic Apps can create resources when they are run (e.g. provision and start a computer) and a logic app can be triggered by events such as when a new file is created, or using a web api (e.g. REST POST command that sends data and parameters). This adds significant complexity and is mainly valuable for event-based systems, but it opens up using the cloud as one big programmable computer.
"},{"location":"topics/intro_to_cloud_interfaces/#references","title":"References","text":"See our references page for curated Azure links. For AWS, see
https://aws.amazon.com/tools/ about the AWS CLI: https://aws.amazon.com/cli/ Demo Using Python Notebook with AWS: https://s3.us-east-2.amazonaws.com/a-book/s3.html "},{"location":"topics/learning_how_to_learn_about_cloud/","title":"Learning how to learn about cloud","text":""},{"location":"topics/learning_how_to_learn_about_cloud/#guidelines-for-researchers","title":"Guidelines for Researchers","text":"You may have looked at the various websites and poked around the web, and found it's just not clear at all how cloud computing may be helpful to you, even though it all sounds great. The challenge for researchers learning about cloud is that most cloud documentation isn't written for you.
Challenges for researchers learning:
Cloud training and documentation are mostly written for IT professionals like system admins and architects, software developers, business people, and agency managers. Researchers tend to be a little of all of those things.
Learning cloud requires an understanding of the concepts and glossary of IT infrastructure: cloud services are based on a model of IT, so training materials often have an embedded conceptual model of computing.
Goals of researchers are often different from those of IT professionals responsible for building systems used by hundreds of people or for business purposes. That can make it difficult to decipher which kind of cloud service will work best for your use case. As Dr. Parvizi writes (link to pdf), cloud is very different from using traditional research-oriented technology like workstations or HPC. There are hundreds of services to choose from, but we find many researchers reach for the conceptually straightforward path of creating cloud computers and installing what they need. Our goal for this fellowship is to provide context and background, and help you explore some of the so-called \"cloud native\" technologies like \"serverless\" systems that let you run your scripts without dealing with operating system installs. "},{"location":"topics/learning_how_to_learn_about_cloud/#what-documentation-is-available-for-researchers","title":"What documentation is available for researchers?","text":"There are general, conceptual introductions and discussions for academics.
https://cloud4scieng.org/ Book and website from Ian Foster (U. Chicago) and Dennis Gannon (IU) , the text used for this fellowship. https://cloudmaven.github.io/documentation/ from the eScience institute, University of Washington. Unmaintained. source code https://cloudbank-project.github.io/cb-resources/ successor to the cloudmaven? Cloudbank training videos "},{"location":"topics/learning_how_to_learn_about_cloud/#learning-how-to-learn-about-cloud-caveats-and-help","title":"Learning how to learn about cloud: Caveats and help","text":"As part of this fellowship, our goal is to help you translate documentation written for the systems and developer perspectives into a research perspective.
"},{"location":"topics/the_computing_in_cloud_computing/","title":"Helping to Understand the \"computing\" in cloud computing","text":"You come to us with a unique set of experiences with computing, with more or less experience depending on your previous needs. A challenge we have seen, for the many years we've been helping people, is understanding the context of computing in their research to understand the tools they have available.
In fact, most documentation for cloud computing assumes you know the world of computing. An introduction to cloud computing from Microsoft lists this prerequisite: \"Basic familiarity with IT terms and concepts.\" It turns out 'basic' can mean a lot of things.
A core goal of the MSU Cloud Computing Fellowship is to help you connect cloud computing to your research in a meaningful way
Our original question: - How can cloud computing benefit your research?
Let's re-frame the question for this discussion: - Which kind of computing could help my research? - Can I use that kind of computing in the cloud? \\ That is, could cloud computing enable me to use computing I otherwise couldn't?
You may already have an idea of what this is, and experience with computing, but many who come to us know it's valuable and are ready to learn why.
"},{"location":"topics/the_computing_in_cloud_computing/#what-is-computing-minimal-vocabulary","title":"What is computing? Minimal Vocabulary","text":"Cloud computing was invented for, and is marketed to, IT systems administrators, software developers, and IT/technology managers. See the history of AWS. It was not designed with researchers in mind, and most training and documentation is written for those audiences. Note, however, that cloud computing is general enough that it is often marketed to researchers or 'for research.'
The primary function of cloud computing is to provide \"infrastructure\" aka the \"back-end\" or back room of a company's IT department, so we are going to learn about that. In fact, cloud computing is frequently defined, named, and sold based on abstractions of physical components of computers and IT infrastructure. Hence learning more about IT infrastructure, or \"computing\", may help you understand the context in which cloud computing is engineered. This can help you determine what you may need from cloud computing to get your research done.
Could you purchase your own infrastructure (computers, networks, disks, etc) and run it \"on-premise\" and get the same benefit as cloud computing? Or have your institution do that? Sometimes yes! The MSU HPCC is a great example when on-premise is more beneficial and cost-effective than cloud computing.
"},{"location":"topics/the_computing_in_cloud_computing/#about-computing-major-components-of-computer","title":"About Computing: Major components of computer","text":"Of course you know what is in a computer. The goal is to come to common understanding, and to frame for extension to cloud, and to find the cloud services that mimic these features.
User software (scripts, user code, etc) Base Software (programs to run scripts such as Python, Rstudio, Stata, Fluent, etc etc and/or libraries to compile code such as the gcc compiler, etc) Operating System (needed to make the computer functional) Input/Ouput (I/O, infrastructure to get data in and out of a computer, primarily network connections but also USB) Storage - external ( attached or via network or other I/O ) Storage - local disk Central Processor (CPU) & Memory (RAM) Computer Architecture (model type) Network Where is the data in this abstraction of computer infrastructure? Answer: everywhere
If you hadn't thought or known about the components of a computer, that's no mistake. Most people don't know the details of how their car operates, how to change their oil, or the difference between a carburetor and a turbocharger.
"},{"location":"topics/the_computing_in_cloud_computing/#stack-model-of-computing","title":"\"Stack Model\" of computing.","text":"Just as in Science and the humanities, we need a model and terminology to talk about a subject. A standard IT model of a computer is a 'stack' model, where each upper layer depends upon the layers below. Most models of cloud computers build upon this simple model.
User Interface/Connection Software Operating System Computer Hardware: CPU & RAM Data Storage Network"},{"location":"topics/the_computing_in_cloud_computing/#about-computing-what-is-a-server","title":"About Computing: What is a server?","text":"Cloud computing started with, and frequently talks about \"servers,\" so we should define that.
A server is any computer running software that listens for, and responds to, messages. For a server to be useful it should be connected to a network, but it doesn't need to be. Some terms: - The 'server' is actually the software, not the hardware. You can run a server on your laptop. - The computer that runs the software is the 'host'. - A 'client' is software that sends the message, and receives and interprets the response. - The protocol is the method by which you exchange messages. Now it is almost exclusively web (http), but there are many others. - The form the input message can take, and the form of the message that is returned, is known as the API of the server. It's the interface that you have to work with. - Port: a computer may run many servers for internal and external use. Unix devised a system of numbered 'ports' (numbered 0 to 65535, i.e. 64K of them), and when running a server you must tell the server which port to listen on for messages. Users of most software never have to know or think about ports.
The 'client/server' model, invented in the 60s, is so successful that we use servers in our daily lives and don't think about it (except when the server is down). This model of computing is important because it is the basis of cloud computing.
We often think of a server as a box, but in the model above the server is in the software layer, and each of the layers below provides services for that software to exchange messages with another computer. If you can abstract, virtualize or automate the layers below, it becomes much easier to provision servers than to purchase, install and configure physical hardware.
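The terms above can be seen in a small Python sketch that uses only the standard library. It is an illustration under assumptions: the names HelloHandler and ask_once are ours. Your own computer is the host, a few lines of software are the server, the protocol is http, and the client is a second piece of software that sends one message and reads the response. The port is picked by the operating system, so the sketch does not collide with anything already running.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    # The server side: answer every GET message with a line of text.
    def do_GET(self):
        body = ('you asked for ' + self.path).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def ask_once(path):
    # Port 0 asks the operating system for any free port on this host.
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client side: send one request and read the response.
        conn = HTTPConnection('127.0.0.1', port, timeout=5)
        conn.request('GET', path)
        response = conn.getresponse()
        result = (response.status, response.read().decode('utf-8'))
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, text = ask_once('/hello?name=fellow')
print(status, text)
```

Because the server listens only on 127.0.0.1, nothing outside your own computer can reach it, which is the same situation as running a web server on a laptop that nobody else can connect to.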
"},{"location":"topics/the_computing_in_cloud_computing/#example-server","title":"Example Server","text":"A Web server is a well known, easy to use, and very useful server to run. The terms above translate as follows:
server is any machine (including your laptop) running a program that listens for http messages on port 80. client is a web browser message:URL which includes address, url paths, and additional parameters response: several headers including the status of the request (that we rarely see) and ultimately the contents of the web page client interprets the code and renders the page. an alternate client could be a script, or the curl
utility https://www.amazon.com/dp/B09VXBNTJ1/ref=sr_1_93?brr=1
What is the host in this URL? What is the message? We could spend a week talking about web servers, protocols and a year about programming web server. The important thing is that there is a host computer, the 'web server' software on the host listening for requests, and the client(s) connecting to it to retrieve files.
"},{"location":"topics/the_computing_in_cloud_computing/#other-types-of-servers","title":"Other Types of Servers","text":" Database Client: special database client (not web browser) sends data commands as messages, response is tabular outputs File Servers Share files. We use Cloud file sync services, but Collaboration Email, calendaring etc Enterprise Data Systems for loading, cataloging, transforming business data Security Firewalls, Proxy, network traffic management Monitoring system health data collection, accessible via another web server Web-based services For example D2L. Many of these do not use web-based protocols or connections. They define their own protocols, either as a public standard (e.g. email) or proprietary standard (database)
"},{"location":"topics/the_computing_in_cloud_computing/#servers-and-networks","title":"Servers and Networks","text":"Networking Requirements to access a server:
the server must be on the same network as you to receive your message I can run a web server right on my laptop, but you couldn't reach it. the network is me talking to myself the more accessible the network, the more vulnerable, so partitioning is used servers that accept messages from the Internet are a major security risk network failure stops all work for everyone designing efficient, robust, and secure networks is a major resource drain Why do I think this is important? not only can you make a server (web, data, cluster, etc.) with cloud, but everything you interact with in cloud is a server. You will see many services dedicated to networking in the cloud.
On our campus, the network is managed by the institution, and it is configured to block all incoming traffic to prevent anyone from running a server, which is a security risk.
"},{"location":"topics/the_computing_in_cloud_computing/#too-much-hardware-virtualization-to-the-rescue","title":"Too much hardware? Virtualization to the rescue","text":"If you run a big IT Department that serves 1000s of people, you need a lot of servers. Each server can only handle a certain amount of 'traffic.' Hence there are many methods for connecting multiple servers to act as one big server. Each physical machine requires 1) installation 10) maintenance.
IT Departments 'serve' large user communities with large amounts of infrastructure. Techniques were invented to separate the 'server' or 'network' from the hardware. Virtualization: single box with a layer of software to share among different software. Many servers could be created and managed with software on a single piece of hardware. Virtualization was a necessary conceptual and technological innovation to pave the way for cloud computing and is widely used both on-premise and in the cloud. Networks and other services followed suit: create single big computers that use 'virtualization' software to emulate the functioning of a service, such that the clients don't know they are working with an abstraction. Running different wires to connect different things is labor intensive. "}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Syllabus","text":" from John Constable, Cloud Studies, 19th Century English painter
"},{"location":"#msu-cloud-computing-fellowship","title":"MSU Cloud Computing Fellowship","text":""},{"location":"#program-summary","title":"Program Summary","text":"The program runs Fall semester through Winter/Spring semester.
Fall semester is dedicated to learning how the cloud works. In-person sessions are approximately bi-weekly (see schedule below). A session includes preparatory readings and activities to orient you to the topic, followed by an in-person meeting on Friday to review the materials, seminar, provide a venue for discussion, and hands-on activities
Winter/Spring semester is for building a project using cloud computing culminating in a symposium where you present what you've learned and built. Winter/spring sessions are bi-weekly for presentations by the fellows on their project status, discussion on success and challenges, presentations by cloud practitioners, and for general help.
The culmination of the fellowship is a project resulting in a write-up and presentation during the spring symposium, typically held late-April or early-May.
"},{"location":"#textbook","title":"Textbook","text":"We will occasionally link to the following book:
\"Cloud Computing for Science and Engineering\", Ian Foster and Dennis B. Gannon, September 2017
MIT Press website Book Website : Cloud4SciEng.org The book website does provide open access to individual chapters.
"},{"location":"#meeting-location","title":"Meeting location","text":"MSU STEM Teaching and Learning Facility 642 Red Cedar Rd Michigan State University. East Lansing, MI 48824
Fall 2023: room 3201 STEM room 3201 is on the North West corner on the 3rd floor, across from the North elevator Winter 2024: room 1201 STEM We plan to hold all sessions in-person.
"},{"location":"#fall-2023-schedule","title":"Fall 2023 Schedule","text":"Each approximately 2-week session consists of - preparatory activities and materials (topics, links, tutorials) prior to meeting - an in-person session for review, activities, and discussion on the friday of week 1 - follow up for week 2
Introduction Requirements: September 4-8 Complete items in the Welcome email sent by Dr. Parvizi post to teams to say hello Meeting September 8, 3pm STEM 3201: introduction to the cloud fellowship intro to Cloud/Computing and Azure Assignments:
How to Cloud
Meeting September 22, 3pm
Azure Organization Creating and Using Virtual Machines Cloud Storage
Meeting October 6, 3pm Assignment/Exercise: analyzing weather data in the cloud
Databases and Data Analytics Systems on the Cloud for research
Meeting October 20, 3pm Exercises: Using SQL database for research data Big Data Systems and the cloud
Meeting November 3, 3pm: Exercise: Using R and Python on a databricks cluster Serverless Cloud Computing
Meeting November 17, 3pm Review Project Requirements and Specification Assignment: email 1-2 sentences describing a project you may undertake by December 6 Azure AI Services
Meeting December 1, 3pm: Discussion: Fellowship Projects Demonstration of AI Services Exercise Hands-on creating responder using Python API Assignment: project proposal due January 8, 2024 "},{"location":"#winterspring-2023-schedule","title":"Winter/Spring 2023 Schedule","text":"The second half is dedicated for fellows to complete a cloud computing project based on research interests culminating in a presentation at a symposium in late April
Fellows will attend bi-weekly meetings where groups of fellows will present the goals and status of their cloud computing projects for feedback and discussion.
All meetings are in the MSU STEM Building, room 1201, alternate Fridays 3pm to 4:30pm
Instructors are available by appointments, and typically during the alternate fridays to answer any questions you have about cloud, projects, or applying cloud computing technology to your research
Turn in Project Proposal Monday, January 8th Post Written Project Proposal to MS Teams folder prior to 5:00 pm Additionally survey of fellows to determine symposium dates will be distributed
Schedule meeting with Instructors to review proposals. This is on-going during January to ensure we have time to meet with all fellows one-on-one.
Cloud Computing Seminar TBD January 12th
Project Proposal Presentations Fellows will present their proposals to the fellowship, up to 6 per session, followed by questions and feedback from colleagues
January 26, February 9th, February 23
Project status presentations Fellows will present the status of their projects, describing challenges and successes, and receive questions, feedback, support and help from the fellowship ** March 8, March 22, April 5**
Project Final Reports April 12 A writeup of the results and lessons from applying cloud computing technology
Symposium Preparation
The date and time of the symposium will be determined January 24 Fellows must turn in Symposium Talk Title & Abstract 3d prior to symposium "},{"location":"#msu-cloud-computing-fellowship-symposium","title":"MSU Cloud Computing Fellowship Symposium","text":"Fellows will present the outcomes, successes, challenges and lessons learned at a symposium held on MSU campus late April, 2024. The date and time will be determined in January 2024 with input from the fellows. Fellows are strongly encouraged to invite their advisors, mentors and colleagues.
"},{"location":"#communications","title":"Communications","text":"Fellows are encouraged to contact us with questions or if they are ever stuck on an activity we've assigned. In addition to email, we are utilizing Microsoft Teams at MSU (Fellows receive a link in the welcome email). Please feel free to reach out on the MS Teams channel sent to participants at the beginning of the program. Mentioning one of us e.g. @billspat or @parvizm will help get our attention. Additionally you may email us at any time.
The goal of the fellowship is to foster discussion. We encourage you to add your successes or challenges to any discussion or question in Teams.
If you need interactive, on-going help it may be better to schedule a help session with a fellowship coordinator; and we are happy to meet individually for additional support. This may be especially effective when fellows are developing their projects.
We also save time during our synchronous meetings for group discussions, so please bring any concerns, difficulties, or successes to our sessions!
If you are not a participant but have questions about the program, see the Contact page for how to get in touch with us.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License
"},{"location":"about/","title":"About The MSU Cloud Computing Fellowship","text":"The MSU Cloud Computing Fellowship is a cross-disciplinary program produced by MSU\u2019s Institute of Cyber-Enabled Research (ICER) and MSU IT Services for invited MSU doctoral students and postdoctoral researchers. As a part of this program, fellows will participate in a series of workshops during the fall semester to:
Determine the aspects of your research that can be accomplished with cloud computing; Incorporate cloud-based systems into your research application or workflow; and Understand the strengths and limitations of commercial cloud computing with the goal of improving research yield and minimizing cost, and to develop a workflow that utilizes that knowledge. "},{"location":"about/#background","title":"Background","text":"MSU doctoral students and postdoctoral researchers are invited to apply in the summer and approximately 18 are selected each year. The program started in 2019. If you are an MSU graduate student or post-doc and interested in participating next year, please check back in the Summer of 2022 for announcements on the invitation to participate, or request to join the MSU ICER mailing list
"},{"location":"about/#citing-the-msu-icer-cloud-computing-fellowship-in-research-publications","title":"Citing the MSU ICER Cloud Computing Fellowship in Research Publications","text":"We encourage cloud fellows to acknowledge the fellowship in publications arising from computational work performed during your fellowship project. Please let us know that you have referenced the fellowship, and we will link to your publication on the ICER publication site, which will further increase the visibility of your work. A sample statement can be:
\"This work was supported in part through Michigan State University\u2019s Institute for Cyber-Enabled Research Cloud Computing Fellowship, with computational resources and services provided by Information Technology Services and the Office of Research and Innovation at Michigan State University.\u201d
"},{"location":"about/#cloud-computing-fellowship-organizers","title":"Cloud Computing Fellowship Organizers","text":"Dr. Brian O'Shea Professor and Director, MSU ICER
Role: Program Lead, ICER
Dr. Brian O'Shea is a computational and theoretical astrophysicist studying cosmological structure formation, including galaxy formation and the behavior of the hot, diffuse plasma in the intergalactic medium and within galaxy clusters. He is also a co-author of the Enzo AMR code, an expert in high performance computing, and an advocate for open-source computing and open-source science. He received his B.S. in Engineering Physics at the University of Illinois in Urbana-Champaign (UIUC) in 2000, and his PhD in physics from UIUC in 2005 (with 2002-2005 being spent as a graduate student in residence at the Laboratory for Computational Astrophysics at UC San Diego and in the Theoretical Astrophysics Group at Los Alamos National Laboratory). Following that, he was a Director's Postdoctoral Fellow at Los Alamos National Laboratory, with a joint appointment between the Theoretical Astrophysics Group and the Applied Physics Division. Since 2008, he has been a member of the faculty at Michigan State University, with a joint appointment between the Department of Computational Mathematics, Science and Engineering (2015-present), the Department of Physics and Astronomy (2008-present), and the National Superconducting Cyclotron Laboratory (2014-present). From 2008-2015, Dr. O'Shea was a member of Lyman Briggs College. He has authored or co-authored over 75 peer-reviewed journal articles in astrophysics, computer science, and education research journals, and has received a variety of awards for his teaching and public outreach efforts. In 2016, he became a Fellow of the American Physical Society, and in 2019 he became the director of MSU's Institute for Cyber-Enabled Research.
Patrick Bills Research Software Engineer, ICER Role: Co-Instructor
Pat Bills's research background is in data systems for ecology (MS Entomology, MSU). He has experience in database design, R, Python, and web application programming. Pat has worked in research IT for over 25 years for departments and labs across MSU, including for MSU ICER as a high performance computing research consultant and trainer, for MSU Enterprise services as the technical lead of the data science team, and currently as a research software engineer again for ICER.
Like many, he has built and worked with on-campus Linux systems for many years, including the MSU HPC. Pat started his cloud journey in 2017 during a workshop at the HPC conference where he saw Ian Foster (our textbook author) present his vision of research on the cloud. Since then he has used cloud services from Google, Amazon, and Azure.
Dr. Mahmoud Parvizi Research Consultant, MSU ICER Role: Co-Instructor
Mahmoud earned his PhD in physics from Vanderbilt University with research in high-energy theory in the context of early universe cosmology as well as computational astrophysics. In addition, Mahmoud earned an MBA with a concentration in finance from the University of Michigan - Flint. Mahmoud was formerly a postdoctoral research associate in the Department of Physics and Astronomy at Michigan State University with a focus on machine learning applications of cloud-computing workflows, and is currently a research consultant for the MSU Institute for Cyber-Enabled Research (ICER). He participated as a cloud fellow in 2019 and served as co-instructor of the Cloud Computing Fellowship in 2020.
Mahmoud\u2019s diverse research interests include mathematical and theoretical physics, data-intensive astrophysics, machine learning for precision health, and cloud-computing platforms for academic research. His expertise includes 1) quantum field theory in curved/non-stationary spacetimes; 2) finite temperature quantum field theory and open quantum systems; 3) automated and end-to-end intelligent data pipelines for signal processing using compressed sensing and applied harmonic analysis; 4) machine learning and cloud-computing applications for precision health.
Sponsored by ICER, the MSU Office of Research and Innovation (ORI), and MSU IT Services Research Cyberinfrastructure (RCI)
"},{"location":"about/#previous-cloud-fellows","title":"Previous Cloud Fellows","text":"2019-2020
MSU Cloud Computing Fellows Summary of the first cohort of MSU Cloud Computing Fellows
2020-2021
Introducing the 2020 Cloud Fellows 20-21 Cloud Computing Fellowship Culminates in Impressive Symposium
2021-2022
Introducing the 2021 MSU Cloud Computing Fellows
2022-2023
Introducing the 2022 MSU Cloud Computing Fellows
4th Annual Cloud Computing Fellows Symposium
"},{"location":"cloud_glossary/","title":"Glossary of Cloud Terms","text":""},{"location":"cloud_glossary/#why","title":"Why?","text":"Researchers using the cloud must know a little about a lot of information technology to get computational work done in their domain specialty. Most cloud glossaries are for systems administrators, not the rest of us. This glossary is much more brief than Wikipedia and hopefully also provides the context a researcher needs to find what you need to use cloud services in your work. Do you have an item to add? Please contact us!
"},{"location":"cloud_glossary/#other-glossaries","title":"Other Glossaries","text":"https://www.cloudbank.org/cloud-terms
"},{"location":"cloud_glossary/#the-glossary","title":"The Glossary","text":""},{"location":"cloud_glossary/#arm-cpu","title":"Arm CPU","text":"CPU from \"Advanced RISC Machines, ltd. While historically most computers used Intel CPUs, ARM provides an alternative CPU that is becoming more popular and present as an option in HPC and Cloud Virtual Machine options. The vast majority of software written for Intel computers is compatible with ARM. Some computational work is sensitive to CPU choice, and CPU choice can affect cost and speed of excecution, so it may be important to understand the implications of this choice of CPU.
"},{"location":"cloud_glossary/#arm-template","title":"ARM Template","text":"A specification file listing all of the cloud resources and configuration settings tha that the Azure Resource Manager can use to create resources for you when you submit it a certain way. Templates are a great shortcut and automation feature but difficult to edit. For details see Azure Documentation: What are ARM templates?
"},{"location":"cloud_glossary/#azure-resource-manager-arm","title":"Azure Resource Manager (ARM)","text":"see Resource Manager
Blob Storage: Azure calls its object cloud storage \"Blobs\". It is similar to Amazon Web Services 'S3' and Google Cloud Storage buckets. See Azure Documentation: Introduction to Azure Blob storage. While it's possible to 'mount' blob storage to Linux VMs using 'blob fuse' or similar packages, it may not work as you expect, so in practice Azure Files is a better solution for that. See File Storage.
"},{"location":"cloud_glossary/#client-server","title":"Client-Server","text":"Client/Server model of computing is something we use everyday but perhaps dont' use this term. See https://techterms.com/definition/client-server_model You are used to using maybe a dozen clients everyday (phone apps, web browser, ssh to connect to a remote linux, Remote Desktop client to connect to remote desktop server, etc). Cloud computing provides all the infrastructure needed to create servers quickly and easily.
"},{"location":"cloud_glossary/#cloud-shell","title":"Cloud Shell","text":"Cloud computing providers usually have a service where you can run command line (CLI) or terminal commands in a web browser 'shell' This is helpful as the libraries and utilities are pre-installed. See https://docs.microsoft.com/en-us/azure/cloud-shell/overview and Azure Interfaces introduction.
"},{"location":"cloud_glossary/#containers","title":"Containers","text":"Or Docker Containers (not all containers need to be Docker the vast majority of container system use Docker). For R users, see https://colinfay.me/docker-r-reproducibility/ For Python users, there is https://www.netguru.com/blog/python-docker-tutorial although you could read either.
Linux Containers is a term for a collection of methods and technologies that allows a multiple isolated systems to be run on one Linux computer. This is differnet from virtual machines in that a VM host provides abstract or virtualized hardware so each VM requires it's own portion of memory and CPU cores whereas containers share the main part of Linux (the kernel), memory and CPU more dynamically. The primary comercial company for containers is \"Docker\" so Docker is sometimes used synonymously with 'container' but it is just one form.
In addition to being more efficient than VMs, most container systems have a system and scripting language for building containers. The means onecan provision an entire system from code. Containers are widely use to package and distribute complex research software systems for example Bioinformatics workflow system \"Cromwell.\" This way reseearches can download and use a pre-installed system without the trouble of getting all of the pre-requistes (dependencies) installed on their machine.
"},{"location":"cloud_glossary/#cpu","title":"CPU","text":"Central Processing Unit, the main 'chip' of a computer, and a core component when specifying a Virtual Machine 'size'
"},{"location":"cloud_glossary/#devops","title":"DevOps","text":"This has many definitions but for researchers the shortcut is using code to make IT infrastructure. Helping developers (like you) do Ops (like sysadmins) with code. see IaC.
"},{"location":"cloud_glossary/#docker","title":"Docker","text":"Docker is the most prevalent form of \"Containers\", e.g Docker is to containers as google is to search. See containers above for details. Note that Docker is many things as once: a method and format for Linux containers, a program for working with container ( e.g. docker build...
), a Company, and that's company's hub or repository for storing and access free containers (or your own). Cloud companies also have \"hubs\" or repositories for storing your own Docker containers.
"},{"location":"cloud_glossary/#file-storage-azure","title":"File Storage (Azure)","text":"Also called \"Azure Files.\" Azure cloud storage that is more traditional file sharing, and that can be connected (mounted) to computers and other services using the SMB protocal, making it similar experience to departmental shared fileservers. See https://azure.microsoft.com/en-us/services/storage/files/ and compare with Blob Storage
"},{"location":"cloud_glossary/#firewall","title":"Firewall","text":"A common concept in networking, firewall software on a computer's networking components limits which kind of traffic can come in or out, and restricts which computer internet addresses can connect. Best practices suggest closing all connections via the firewall, only opening those connections for services you need, and only to those users (e.g. your own computer) you need to. Azure additionally has an option to \"allow connections from Azure networks\" so that you can freely connect from the portal, 'cloud shell', or connect from on azure service to another. The implication is that you trust all Azure services.
"},{"location":"cloud_glossary/#gpu","title":"GPU","text":"From Wikipedia: https://en.wikipedia.org/wiki/Graphics_processing_unit GPUs can be very helpful for some code written to use them, especially many machine learning libraries, and Virtual Machines may be provisioned with GPUs.
"},{"location":"cloud_glossary/#infrastructure-as-code-iac","title":"Infrastructure as Code (IaC)","text":"In stead of using a GUI, or manual steps to create cloud computing, cloud resources may be created using scripts that interact with the cloud provider's api, and additional scripts can configure individual resources (such as to install software on a VM or configure a database). Doing this kind of \"provisioning\" with scripts makes it reproducible and debuggable which is at the heart of the Workflow or DevOps mentality.
"},{"location":"cloud_glossary/#ip-address","title":"IP Address","text":"a unique string of characters that identifies each computer using the Internet Protocol to communicate over a network. Your computer will have a different IP address depending on where you are located (home, work, field). In addition, a home wifi router will assign a 'local' ip address for inside your home, but your 'public' internet IP address will be different. To find your own IP address, simply google \"what is my ip.\" All Azure services (VMs, data systems, etc) are assigned IP addresses via networking. see https://docs.microsoft.com/en-us/azure/virtual-network/public-ip-addresses
"},{"location":"cloud_glossary/#object-storage","title":"Object Storage","text":"From NetApp \"What is object storage?: \"...also known as object-based storage, is a strategy that manages and manipulates data storage as distinct units, called objects. These objects are kept in a single storehouse and are not ingrained in files inside other folders. Instead, object storage combines the pieces of data that make up a file, adds all its relevant metadata to that file, and attaches a custom identifier.\" Blob storage is object storage. Objects (e.g. files) are retrieved from a large system via their identifier, not their name. Amazon S3 and Google Cloud storage are also object stores.
"},{"location":"cloud_glossary/#on-prem","title":"On-prem","text":"\"On Premise\" refers to technology (computers, disks, networking, etc) that are on your institutions computer centers or in your own lab. Note that for some researchers, \"on-prem\" can still mean remove (e.g. our HPC is only accessible remotely, so it may not be obvious that it's on premise to users).
"},{"location":"cloud_glossary/#resource","title":"Resource","text":"For AWS and Azure, a resource is an entity that you can work with. The means something you can created, edit or delete via their cloud interface. Could be a computer (virtual machine), a whole cluster (azure batch pool), or some tiny network setting (IP address). Resoures almost always cost money. Resources are listed in your standard dashboard.
"},{"location":"cloud_glossary/#resource-group","title":"Resource Group","text":"Organizational scheme unique to Azure. Nearly all resources must be part of a group and the resource group must be selected (or created ) when creating other resources. Resource groups could be used for specific projects, for 'personal' resources used for multiple projects (or for azure things like cloud shell).
"},{"location":"cloud_glossary/#resource-manager","title":"Resource Manager","text":"Azure calls the system they use to interface between you and cloud resources the \"Azure Resource Manager\" or ARM. There used to be a different way to interact with Azure resources, hence this has a specific name and is referred to in Microsoft documentation.
"},{"location":"cloud_glossary/#serverless","title":"Serverless","text":"This buzz-word applies to many different cloud services, primarily those that the cloud company manages for you, usually referring to cloud functions (AWS Lamba) and sometimes others in the \"Platform As A Service\" service model. The origin is that, if you run virtual machines with operating systems and software install, your are maintaining servers to support that software. If the cloud service does not require you to provision and maintain a server, it is often marketed as \"serverless\" (e.g. recent marketing of Azure Files as \"Serverless file shares\" where on-premise File Sharing requires staff to manage and maintain Windows File Servers.
"},{"location":"cloud_glossary/#service-models","title":"Service Models","text":"This is related to the \"... as a service\" (..aaS) phrases defined in the NIST document which included \"Infrastructure\", \"Platform\" and \"Softare\" as a service (IaaS, PaaS and SaaS). It's a conceptual organization of cloud services based on the stack model of computating with the infrastructure (network, hardware, CPU, etc) at the bottom and Software on the top. See The NIST Definition of Cloud Computing
"},{"location":"cloud_glossary/#service-level-agreement-sla","title":"Service Level Agreement (SLA)","text":"Level of service you expect from a vendor, laying out the metrics by which service is measured, as well as remedies or penalties should agreed-on service levels not be achieved. In Cloud this is often spells out 'uptime,' which is percent of time the system is not down, e.g. 99.99%, and guarantees against data loss and availability. For most research, uptime is not important as we are our own customer and can tolerate some downtime.
"},{"location":"cloud_glossary/#services","title":"Services","text":"Cloud \"services\" are often bundles of resources pulled together for coordinate function. Cloud companies offer hundreds of often closely overlapping services.
"},{"location":"cloud_glossary/#tags","title":"Tags","text":"AWS and Azure allow you add meta data to resource in the form of tags (e.g. hashtags, etc) which are keys and values. When you create a resource you can add a tag indicating the project it is for e.g. \"project\" = \"dna-methylation\" To add more detail if your DNA methylation has multiple aspects or experiments, add more tags like \"experiment\" = \"Fall 2021\"
For workgroups it's stronlgy suggested you add a \"created_by\" = your netid because it's often difficult in Azure to determine who created a resource if it needs to be turned off or deleted.
Use tags to organize your Azure resources and management hierarchy
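Because tags are just keys and values, finding everything that belongs to a project is a simple filter. The sketch below uses plain Python dictionaries rather than a cloud API; the resource names, the netid values, and the find_by_tag function are hypothetical.

```python
# Hypothetical resources, each with a dictionary of tags.
resources = [
    {'name': 'vm-analysis', 'tags': {'project': 'dna-methylation', 'created_by': 'netid1'}},
    {'name': 'storage-raw', 'tags': {'project': 'dna-methylation', 'experiment': 'Fall 2021'}},
    {'name': 'vm-scratch', 'tags': {'created_by': 'netid2'}},
]


def find_by_tag(items, key, value):
    '''Return the names of resources whose tag key has the given value.'''
    return [item['name'] for item in items if item['tags'].get(key) == value]


print(find_by_tag(resources, 'project', 'dna-methylation'))
print(find_by_tag(resources, 'created_by', 'netid2'))
```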
"},{"location":"cloud_glossary/#tensor-processing-unit-tpu","title":"Tensor Processing Unit (TPU)","text":"Google Tensor Processing Unit is specialized computer chip similar to GPUs, used by deep learning libraries such as TensorFlow ( which leads to the question of \"what is a tensor\" and that depends on who you ask but similar to matrix.
"},{"location":"cloud_glossary/#virtual-machine","title":"Virtual Machine","text":"(aka VM) Creating a simulated computer hardware using software, to be able run a guest operating system inside a host system, such that the guest thinks it's running on an actual computer.
"},{"location":"contact/","title":"Contacting Us","text":"If you are a Cloud Computing Fellowship participant this year (or past participant!), please contact the instructors Pat Bills or Mahmoud Parvizi with any issues or questions related to the material or activities.
The session meetings are designed to have plenty of time for questions, troubleshooting and discussion. We will also schedule office hours prior to meeting times to help with pre-meeting activities.
If you have general questions about the MSU Cloud Computing Fellowship, please contact Brian O'Shea
If you will be an MSU graduate student or post-doc in the next Fall, and are interested in participating, please check back in the Summer for announcements for invitation to participate. The request for applications is announced on the MSU ICER mailing list and several other mailing lists around campus. We encourage anyone with an active research program that could benefit from cloud computing to apply
If you are an MSU Researcher interested in using cloud for your research, please contact IT Services or MSU ICER via our ticketing systems and describe your needs.
"},{"location":"projects/","title":"Projects","text":""},{"location":"projects/#cloud-computing-fellowship-projects-2022-2023","title":"Cloud Computing Fellowship Projects 2022-2023","text":"The primary activity of the Cloud Computing Fellowship is to support the fellows to create and present a cloud-computing-based project working with research data. During Fall semseter the fellowship provides materials and help to learn core cloud concepts and activies, and Winter/Spring semester is devoted to project development.
"},{"location":"projects/#time-line-and-due-dates","title":"Time-line and Due dates","text":"Fellows deliver a proposal for their projects in early January 2024, and present that proposal to their colleagues in the fellowship. See the schedule for due dates. In Winter 2024 more detail will be provided on this site.
"},{"location":"projects/#questions-answers-and-other-notes","title":"Questions, Answers and other Notes","text":"Q. Do I have to use my own data for my project or can I use data from the web or other public data?
A. You can bring any data that you may use for your research, or that demonstrates cloud processes you may use in your research.
Q. Could I work on a problem outside of my research for my project?
A. Yes. We encourage fellows to consider some small aspect of their own research to apply to their projects, but not all research can be readily adapted for cloud computing to contribute, especially with very limited time and budget. If the project is related, even tangentially, to your current research project, and you feel your chosen project will advance your career or knowledge of cloud for later application, then by all means please pursue and present what you've learned.
Q. Do I have to use programming in my project?
A. Most of the examples provided in the fellowship talk about processing data with scripts such as R or Python and many researchers are using these for data analysis, but it's not required for a successful project. You could install a program on a powerful virtual machine and show how to use that software along with cloud storage to tackle a large data set (for example). Secondly there are many forms of cloud computing that are not traditional such as data systems which may use a GUI or a language like SQL.
One important aspect of a successful project is \"workflow thinking\" or how could you design your process so that you could do it 100 times or with some form of automation. That often requires programming but there are cloud systems that don't require programming (e.g. Azure Data Factory). Accumulating and organizing data is a huge part of successful research and using cloud tools to facilitate that and documenting the process, advantages and costs would be a successful project.
Q. Do I have to use Virtual Machine as part of my project?
A. No, you don't, and in fact we encourage you to look for other services in the cloud to work with your data or your research processes.
Q. Do I have to use services that we've covered in the sessions?
A. Cloud companies provide many amazing services, and you are not limited to what we've talked about in the sessions. In addition we don't require you to use \"computation\" based services alone. If you are interested in using some other service, please contact us and we may find useful resources or connect you with a colleague who has used the service in mind.
Q. Are there constraints on the things I want to do with my project? Can I do whatever I want?
A. Our goal is to facilitate your education and advancing your research program as it relates to cloud computing, and that is a very broad goal. If you use the fellowship to develop only a small system to show what's possible or not possible, even on public data, that uses cloud computing, that is an acceptable project.
Q. I want to make a web site or application for my project, can I use a VM? how do I do that?
A. This is a common request and the cloud was invented in part to run web applications. However web application design is a huge subject and the programming involved is almost as complex as any programming or data work you've done for your research. We tend to discourage projects focused on web applications because of the work involved to both 1) create the infrastructure for a website (web server, storage, databases, possibly docker containers, etc) and 2) the web application itself (Python/PHP other language, HTML, Javascript, Style Sheets, etc).
Azure has services for hosting websites but don't attempt this for your project unless you have previous experience making websites or web applications, or if you are up for the big challenge of learning webdev along with cloud computing because the research you are showing off is mostly complete. Secondly web services must be on-line 24/7 and the cost may accumulate quickly.
Finally, cybersecurity is a major issue for websites, which present an open door to anyone on the Internet. Keeping your site secure is a major challenge, so during development please turn it off when you are not using it, and consider that web applications are hacked routinely.
However, if you are ready to devote the time and this is a goal for you and your advisor, please come speak with us, as we have experience creating research web applications and we will support you.
"},{"location":"exercises/azure_portal_walkthrough/","title":"Exercise: Azure Portal Walk-through and Storage account creation","text":"from MSU Cloud Computing Fellowship Session 1
"},{"location":"exercises/azure_portal_walkthrough/#about","title":"About","text":"This is an exercise and introduction to the web interface to manage Microsoft Azure cloud services. Prior to doing this exercise, please read Azure Organization For more background on how azure is structured.
For definition of terms used in this walkthrough , refer to our Cloud Glossary including \"resource\", \"azure resource manager\" and \"resource group\" or our list of cloud references for introduction to cloud computing.
For this activity we'll be using the web interface which Azure calls the \"Portal\" but that is only one of several ways to interface with Azure that we will learn about. Many of the activities you can accomplish in the portal you can accomplish with the other (command line or code) interfaces.
Azure's own overview of the Portal is here: https://docs.microsoft.com/en-us/azure/azure-portal/azure-portal-overview Please refer to that as well as this material.
There is a corresponding video that we've made that includes information about the portal and about creating a storage account.
"},{"location":"exercises/azure_portal_walkthrough/#orientation-to-the-azure-portal","title":"Orientation to the Azure Portal","text":"The link above is to a video that walks through the description and tutorial steps below, hosted on MSU MediaSpace ( requires MSU Log-in). Note this video also walks through creating a storage account.
This assumes you have an Azure account and a valid subscription. For the purposes of this introduction, we assume that your account currently does not have ability to create a new subscription, resource group,
Log in to https://portal.azure.com with your MSU Netid. If you are a current member of the fellowship and you have difficulty logging in, please contact us right away.
Orientation: dashboard view. The Azure portal first presents a \"dashboard\" which is organized into panels that each show some aspect of your cloud account. You may alter the panels on this dashboard to show the services and aspects of Azure that are most important to you. For information on how to create or customize your dashboard, see \"Create a dashboard in the Azure portal.\" In the standard, default version of the dashboard the first panel is a list of resources. If you have not created any resources yet you won't see anything. We will explore resources later in this introduction. The standard dashboard panes are a list of your current resources (which may be in multiple resource groups), an advertisement with a link to learn about some new Azure service, and more links to create things that Azure has decided are most important to you. We will focus on the \"All Resources\" pane. If you click on anything here you can almost always use the back button to get back to the dashboard, or use the menu (described below).
Top bar menu: the top menu (three horizontal bars) holds links to many of the things that are also on the main dashboard. The \"home\" view is not the same as the dashboard but is a list of links to things Azure guesses you may want to create, and a list of all of your resources. If you click \"resource groups\" in this list, you should see only one resource group (if any) unless you've been added to others or to a different subscription.
Search bar: in the middle of the top of the screen is a white box in which you can type search terms, including the kind of resource you want to see or create, or part of the name of a specific resource you've created. This is what I use to create and find resources most of the time (and rarely use the links provided); more on that later.
Shortcut buttons: the next few icons are short cuts to other functionality in the portal that we will cover in the future. Most are not critical.
A note about portal navigation: when you click anything in the portal, it creates a new window without reloading the browser and with an X at the top right. This mimics a \"close window\" function, and you can use the X to return to the dashboard, or you may simply use the menu and go where you need to.
Notice that like most things there are 4-5 ways to get to anywhere.
"},{"location":"exercises/azure_portal_walkthrough/#bonus-what-can-you-do-here","title":"Bonus: What can you do here?","text":"The primary purpose of using the portal and your resource group is to create things, and manage and monitor those things. For the purpose of this activity - since you don't really have anything - we can simply look at the 'activity log' in the left side-panel near the top. - this opens a new table of columns Operation name, Status, Time, Time stamp, etc that is probably empty for you. - Tables of information like this in the portal have filters at the top. The default activity is just for the previous 6 hours. If you click on the Find that filter called \"timespan\" and select 1 week (or longer) you can see when I created the resource group and the budget.
"},{"location":"exercises/azure_portal_walkthrough/#next-steps-create-a-storage-account","title":"Next Steps: Create a Storage Account","text":"For a good follow up exercise, see Creating a Storage Account with the Portal
"},{"location":"exercises/azure_portal_walkthrough/#about-portal-resource-pages","title":"About Portal \"Resource\" Pages","text":"Most cloud resources in the portal have a list of categories on the left side, and pages for each category in the center. The first page is the \"Overview\" which has the resource group, subscription, and other info important for that resource. this followed by the \"Activity Log\" showing how the resource has been used. Each of the following items on the left side is a new page of additional options to alter how the resource is configured. For example if you click the \"tags\" section you see the tags you added (if any) and can modify or add new tags.
Some of the options are not available on the forms when you create the resource, or the names of the options on these resource pages do not match the forms when you created the resources. In that case you may have to use two steps to configure the resource as you like, or better consider using a programmatic interface
Again we did not discuss any of the characteristics of cloud storage or how to use it but you should now have enough familiarity with the azure portal to follow other tutorials to create and use storage or other resources.
"},{"location":"exercises/azure_vm_walkthrough/","title":"Exercise: Creating and Connecting to a Virtual Machine (VM) for both Windows and Linux","text":""},{"location":"exercises/azure_vm_walkthrough/#about","title":"About","text":"This is an exercise and introduction to creating Virtual Machines (VMs) and related resources using the Azure Portal.
There are two nearly identical activities, and you only need complete one of them:
creating a Windows virtual machine and connecting with a graphic interface (GUI), namely Remote Desktop (RDP), to demonstrate how you may use full graphic software (like RStudio, Matlab, etc.) on a cloud computer
creating a Linux virtual machine and connecting with the command line to demonstrate how you may use a terminal interface (or scripting) on a cloud computer.
We will use a pre-configured virtual machine with software already installed for both versions. When creating a VM you can use an Azure template and there are many of these. The Data Science Virtual Machine (DSVM) from Azure has R, Python and many data science and statistical libraries available. For more information about the Azure DSVM see https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/ and for the list of tools installed, see https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/tools-included Azure has a new product called \"Azure Machine Learning\" that we may cover in a future session.
"},{"location":"exercises/azure_vm_walkthrough/#requirements-for-both-activities","title":"Requirements for both activities","text":"You need an Azure account with an active subscription, and a resource group of your own to work in. Fellows have these things provided.
This exercise assumes you understand how to use the Azure Portal, which is covered in the Azure Portal Walkthrough. In addition it's helpful to know what a virtual machine is but it's not crucial to complete the exercise. For more information on VMs see the readings session 2
It's helpful to have basic understanding of the \"Client-Server\" model of computing as the VM we create will be running servers (remote desktop server for Windows, and ssh command line server for Linux)
Finally, this exercise involves many layers of concepts related to IT infrastructure, and we are happy to provide clarification as needed.
"},{"location":"exercises/azure_vm_walkthrough/#optional-video-walk-through","title":"Optional Video Walk-through","text":"There was a previous exercise that created a Windows Virtual Machine only (from 2021). The following video is based on the exercise. Watching and following the video is not necessary to complete the exercise on this page, and it does not cover linux. However if you find videos more helpful, or would like to see in detail how it works, please take advantage of this walk-through:
Link to Video for the Windows version of the exercise, on mediaspace.msu.edu, which requires an MSU log-in and is only available to participants in the MSU Cloud Computing Fellowship
The materials from 2021 that the video follows are here; only use those if you need to, otherwise please continue below.
"},{"location":"exercises/azure_vm_walkthrough/#creating-a-windows-virtual-machine","title":"Creating a Windows Virtual Machine","text":"This section is based on Windows, and is recommended for everyone as it is the easiest way to connect to a remote machine. For an equivalent exercise based on Linux, scroll down. If at any point (for example, while you are exploring) you can't seem to get the configuration correct (or there is a validation error you can't fix), starting over will not create any resources or incur charges. Go back to step 1 below.
"},{"location":"exercises/azure_vm_walkthrough/#requirements-for-windows-vms","title":"Requirements for Windows VMs","text":"To connect to a Windows VM desktop, we recommend that you use the Microsoft Remote Desktop client.
MacOS : install the Microsoft Remote Desktop Client, only available on the App Store: https://apps.apple.com/app/microsoft-remote-desktop/id1295203466?mt=12 Linux users install http://xrdp.org/ Windows Users should have the remote desktop client, but to ensure you do: In the search box on the taskbar, type Remote Desktop, and then select Remote Desktop Connection. "},{"location":"exercises/azure_vm_walkthrough/#1-selecting-the-resource-template","title":"1. Selecting the Resource Template","text":"In the Azure Portal open the top left menu, and click \"+ Create a resource\" option (the first option)
In the create resource search box, type \"data science virtual machine\" and press enter to search. It will present you with some of the suggested options as you type but please search.
In the options select Data Science Virtual Machine - Windows 2022 (preview) (note: there is also a 2019 version, but they seem very similar and both will work for this exercise)
Click \"Create\" ( note: do not click the \"start with a pre-set configuration\" option )
"},{"location":"exercises/azure_vm_walkthrough/#2-configure-the-vm-using-the-azure-portal","title":"2. Configure the VM using the Azure Portal","text":"The resource creation forms work as described in the Azure Portal but since we used a pre-set configuration some of the values will be completed.
"},{"location":"exercises/azure_vm_walkthrough/#basics","title":"Basics","text":" Subscription should be \"MSU Cloud Computing Fellowship\"
Resource Group should be your CF resource group (the one with your netid). The default is to create a new resource group, but participants aren't allowed to create their own group, so you must select the resource group with your netid.
Virtual machine name Name: You could name it anything that is unique in the region you choose, but to help keep track of your resources, I strongly suggest using a name that includes your netid and the purpose of this VM: dsvm-netid-exercise2
one option is to combine the project or activity (e.g. ), your net id, and some description of what you are doing. In the name above, replace \"netid\" with your own MSU netid. - Note that different resources have different naming restrictions. For example, for VMs the rules are \"can be almost anything, but Azure resource names cannot contain special characters \\/\"\"[]:|<>+=;,?*@&, whitespace, or begin with '_' or end with '.' or '-' \" - Note if you have an existing VM with this name, add a number 2 or other suffix. We will delete this VM and create something more suitable in the future.
Region Select any region or use the default. In the future, when creating resources (like VMs) that access your storage account, you should use the same region as the storage account. For this exercise it doesn't matter as we will be deleting these resources. For me, the default was \"(US) Central US\"
Availability Options Leave the default (\"Availability Zone\")
Availability Zone Leave the default (\"Zones 1\")
Security Type You must change this value to the \"Standard\" security type. The \"Trusted\" security option is for servers or production machines.
Image should be \"Data Science Virtual Machine - windows...\" if this is changed you may have to re-enter some info again. VM Architecture Leave as the default x64 (Intel compatible) Azure Spot Instance leave unchecked. Size You can leave the size that is currently selected. NOTES: This is how you select the specifications for CPU and memory. The size you choose for this exercise doesn't matter for the outcome, but it will show prices which may be interesting. If you click this drop-down menu you may see some other sizes and prices. The Monthly price assumes 24 hour/day operation. Your price to experiment will often be less than $1.00
Administrator Account Just like you need to log-in to your own computer, you must create a user account for the VM. Select a User name and account that you will easily remember, because you will need it to log-in to the new VM.
username : use any user name you will easily remember. I use my netid so I can always remember. password : this must be a complex password, but use something you can remember, or copy/paste from another program. Do not use your MSU password or any other passwords you use. Licensing Unlike Linux, Windows requires a license, and this option is for organizations with an arrangement with Azure. Leave this box unchecked and Azure will add the extra charges (a few cents per hour) for the use of Windows. If you use a Windows VM for your research, you may be able to use an MSU license. "},{"location":"exercises/azure_vm_walkthrough/#disks","title":"Disks","text":"Leave these as the defaults.
I want to point out one option, \"Delete with VM.\" We will talk about this in the future, but for now it's like purchasing the computer and the disk inside separately. In this case we are just testing so we will delete everything after we are done, but in practice there are reasons for keeping the disk around after you delete the VM, so you may sometimes want to uncheck this box.
"},{"location":"exercises/azure_vm_walkthrough/#networking","title":"Networking","text":"You must create a 'virtual network' for your VM to be connected to (note: historically Azure created this for you). Click \"create new\" to open the new network form.
Create Virtual network
enter a Name : suggest adding \"-vnet\" to your proposed VM name. For me this was \"dsvm1-billspat-ccf23-vnet\" leave all the rest of the settings as-is with the defaults click the [OK] button at the bottom of this form. "},{"location":"exercises/azure_vm_walkthrough/#other-settings","title":"Other Settings","text":"For this exercise we'll be using the default values for almost all the pages except for the Basics page. However you are encouraged to look through these options to see what is involved in creating a virtual machine. The Azure VM documentation covers many of them. For example a VM requires several networking components. The good news is that Azure will name and create these for you, which you will see.
"},{"location":"exercises/azure_vm_walkthrough/#tags","title":"Tags","text":"For this exercise, using tags will be essential for identifying which components go to which VM. If you need more information see session 2 page for readings about tags. On the tags section, do the following:
Click \"tags\" in the top row of options (just before 'review and create') In the first row, For Name, type activity
and for the Value type session2 vm
or similar unique value. Optionally, in an additional row, create another tag with Name created by
and for Value put your netid. This kind of tag can be essential when you are sharing cloud accounts with other members of your work group, so that others in your group may identify who created the resources. "},{"location":"exercises/azure_vm_walkthrough/#review-and-create","title":"Review and Create","text":" click \"review and create\" at the bottom of the screen. If there are errors the form name will have a red dot next to it. Go back to that form and see what may be the issue.
If the Validation passed, it will display the approximate hourly cost to use this VM. Mine says 0.1920 USD/hr
Click \"Create\" and the deployment will start. It will take at most 15 minutes.
Now, please skip down to the Viewing VM Resources section below
"},{"location":"exercises/azure_vm_walkthrough/#optional-creating-a-linux-virtual-machine","title":"Optional: Creating a Linux Virtual Machine","text":"This section is nearly identical to the Windows section above, but uses Ubuntu Linux, and does not use a graphical interface (although with some work this is possible).
"},{"location":"exercises/azure_vm_walkthrough/#requirements","title":"Requirements","text":"To connect to Linux you need a terminal or command line interface with ssh
client software. If you have used the MSU HPC, this is the same method for connection.
On Mac, the Terminal.app has ssh On modern versions of Windows, the cmd.exe command prompt has an ssh
command built in Linux desktop/laptops come with an ssh client "},{"location":"exercises/azure_vm_walkthrough/#creating-a-linux-virtual-machine","title":"Creating a Linux Virtual Machine","text":"If at any point, or if you are exploring, you can't seem to get the configuration correct (or there is a validation error you can't fix), starting over will not create any resources or incur charges. Go back to step 1 below.
"},{"location":"exercises/azure_vm_walkthrough/#1-selecting-the-resource-template_1","title":"1. Selecting the Resource Template","text":"In the Azure Portal open the top left menu, and click \"+ Create a resource\" option (the first option)
In the create resource search box, type \"data science virtual machine\"
In the options select Data Science Virtual Machine - Ubuntu 20.04
"},{"location":"exercises/azure_vm_walkthrough/#2-configure-the-vm-using-the-azure-portal_1","title":"2. Configure the VM using the Azure Portal","text":"The resource creation forms work as described in the Azure Portal but since we used a pre-set configuration some of the values will be completed.
"},{"location":"exercises/azure_vm_walkthrough/#basics_1","title":"Basics","text":" The Subscription should be \"Cloud Computing Fellowship\" and resource group should be your CF resource group (with your netid).
Virtual machine name Name: must be unique in the region. I suggest using your netid to name it, and adding abbreviations for what you are creating and for which activity. For example dsvm1-netid-ccf23 Use your actual NetID, for example \"dsvm1-billspat-ccf23\"
Note that different resources have different naming restrictions. For example, for VMs the rules are \"can be almost anything, but Azure resource names cannot contain special characters \\/\"\"[]:|<>+=;,?*@&, whitespace, or begin with '_' or end with '.' or '-' \"
Note if you have an existing VM with this name, add a number 2 or other suffix. We will delete this VM and create something more suitable in the future.
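The naming rules quoted above can be checked before you type a name into the portal. The following is a minimal sketch in Python, not an official Azure validator: the function name name_problems is made up for illustration, it only covers the restrictions quoted in this exercise, and Azure may apply further limits (for example on length) that are not checked here.

```python
# Characters the quoted rule lists as forbidden in resource names.
# chr(92) is the backslash and chr(34) is the double quote.
FORBIDDEN = set('/[]:|<>+=;,?*@&') | {chr(92), chr(34)}


def name_problems(name):
    # Return a list of human-readable problems; an empty list means
    # the name passes the rules quoted in this exercise.
    if not name:
        return ['name is empty']
    problems = []
    bad = sorted(c for c in set(name) if c in FORBIDDEN)
    if bad:
        problems.append('forbidden characters: ' + ' '.join(bad))
    if any(c.isspace() for c in name):
        problems.append('contains whitespace')
    if name.startswith('_'):
        problems.append('begins with an underscore')
    if name.endswith(('.', '-')):
        problems.append('ends with a period or hyphen')
    return problems


print(name_problems('dsvm1-billspat-ccf23'))
print(name_problems('_my vm-'))
```

The first call reports no problems, while the second name is flagged for whitespace, the leading underscore and the trailing hyphen.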
Region You may select \"(US) North Central US\" or any other US-based region. Availability Options select \"No infrastructure Redundancy required\"; the other availability options are for critical infrastructure that needs to withstand a serious outage (e.g. if a hurricane affects a data center). You may also see an \"availability zone\" option appear (perhaps with an error message \"The value must not be empty\"). Selecting \"No infrastructure Redundancy required\" under availability options will remove the \"availability zone\" field and error message. Security Type Leave as 'standard' Image should be \"Data Science Virtual Machine - Ubuntu..\" if this is changed you may have to select it again from the list. Any Linux image is fine for this tutorial as VM Architecture leave as x64 (Intel processor compatible) Run with Azure Spot discount leave unchecked. Size You can leave the size that is currently selected, which is based on the pre-set configuration from the previous step. This is how you select the specifications for CPU and memory. The size you choose for this exercise doesn't matter for the outcome, but it will show prices which may be interesting. If you click this drop-down menu you may see some other sizes and prices. The Monthly price assumes 24 hour/day operation. Your price to experiment will often be less than $1.00 Click \"see all sizes\" if you are feeling adventurous -- there are maybe 100 options. (click the [x]
in upper right to close the size selector window) Administrator Account Just like you need to log-in to your own computer, you must create a user account for the VM. Authentication Type For the purpose of this exercise, select \"password\". SSH keys are strongly recommended, but to keep this simple we will use a password. UserName Select a user name and account that you will easily remember, because you will need it to log-in to the new VM. You can use your MSU NetID for your username so it's easy to remember.
password : something you can remember, but complex enough to be secure. Do not use your MSU password or any other passwords you use.
"},{"location":"exercises/azure_vm_walkthrough/#disks_1","title":"Disks","text":"You can leave the defaults for this page.
"},{"location":"exercises/azure_vm_walkthrough/#networking_1","title":"Networking","text":"You must create a 'virtual network' for your VM to be connected to (note: historically Azure created this for you). Click \"create new\" to open the new network form.
Create Virtual network
enter a Name : suggest adding \"-vnet\" to your proposed VM name. For me this was \"dsvm1-billspat-ccf23-vnet\" leave all the rest of the settings as-is with the defaults click the [OK] button at the bottom of this form. "},{"location":"exercises/azure_vm_walkthrough/#other-options","title":"other options","text":"For this exercise we'll be using the default values for almost all the pages, except for 'Basics' , 'Networking' and . However you are encouraged to look through these options to see what is involved in creating a virtual machine. The Azure VM documentation covers many of them. For example a VM requires several networking components. The good news is that Azure will name and create these for you, which you will see.
"},{"location":"exercises/azure_vm_walkthrough/#tags_1","title":"Tags","text":"Using the Azure portal to create VM creates several resources (up to 12). Using tags will be essential for identifying which components go to which VM. This is the metadata associated with these resources. I suggest using a tag like \"activity\" to indicate which of our activities was used to create these resources.
Click \"tags\" in the top row of options (just before 'review and create') In the first row, For Name, type activity
and for the Value type session2
click \"review and create\" "},{"location":"exercises/azure_vm_walkthrough/#review-and-create_1","title":"Review and Create","text":"If there are errors the form name will have a red dot next to it. Go back to that form and see what may be the issue.
If the Validation passed, it will display the approximate hourly cost to use this Linux VM. Mine says 0.0730 USD/hr
Click \"Create\" and the deployment will start. It will take at most 15 minutes.
Linux Users continue to the next section
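The hourly prices shown on the validation page can be turned into a rough monthly figure, using the same 24 hour/day assumption the portal uses for its monthly price. This is only an illustrative sketch: the function name monthly_cost and the 30-day month are assumptions, and the two rates are the example values quoted on this page, which will differ for other sizes and over time.

```python
HOURS_PER_DAY = 24


def monthly_cost(hourly_rate, days=30):
    # Cost in USD if the VM is left running for every hour of the month.
    return hourly_rate * HOURS_PER_DAY * days


# Example hourly rates quoted in this exercise (USD per hour).
rates = {'Windows DSVM': 0.1920, 'Linux DSVM': 0.0730}

for label, rate in rates.items():
    print(label, round(monthly_cost(rate), 2), 'USD per 30-day month')
```

An hour or two of experimenting costs only a small fraction of these monthly totals, which is why stopping or deleting the VM when you are done matters more than the size you pick for this exercise.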
"},{"location":"exercises/azure_vm_walkthrough/#viewing-vm-resources-in-your-resource-group-windows-and-linux","title":"Viewing VM Resources in your Resource group (Windows and Linux)","text":"You have a few options now. You can wait for the deployment to complete in the portal. When it's ready, the Azure portal will display a message and a link to \"go to resource.\"
However you can also go to the page that lists the items in your resource group to find and explore while the deployment is in progress.
Open your resource group in the portal: click the portal menu on the top left, and select \"resource groups\" From the list, select your CF21 group. When the deployment is finished, you should see several new resources They will have the same name prefix \"CF21netid-dsvm\" but may have a suffix indicating the kind of resource (e.g. CF21-netid-dsvm1-ip) The second column is the \"type\" which helps identify what they are
Select the item with type \"virtual machine\" and click on the name to open its resource page (for example, cf21-billspat-dsvmtest item in the screenshot above) "},{"location":"exercises/azure_vm_walkthrough/#the-vm-resource-page","title":"The VM Resource Page","text":"To see the details for your virtual machine, click the VM in your resource group if you haven't already.
Note that the Azure portal will show a few errors/warnings if the deployment is not complete. You may see a warning that the 'agent' in the VM is not working, but you can ignore it. It will go away when the VM configuration is complete.
There are many details here but some immediate things to notice:
in the top row are buttons to connect, start, restart and stop the VM. in the top, \"essentials\" section the \"status\" should be \"running.\" on the right side is the assigned IP address which you need to connect. If you are connecting with RDP, then the RDP file has this address in it so you don't need to remember it. However this is the IP address you can use to connect directly from your Remote Desktop client or the SSH client. For now you just need to know that this IP address is here on the main VM page. Note that, if you click the link on the address, it will take you to a new resource page just for the IP address (which is a distinct resource assigned to this VM resource) "},{"location":"exercises/azure_vm_walkthrough/#connecting","title":"Connecting","text":""},{"location":"exercises/azure_vm_walkthrough/#connecting-to-a-windows-vm-using-remote-desktop-protocol-rdp-client","title":"Connecting to a Windows VM using Remote Desktop Protocol (RDP) client","text":"You may connect to this VM running the Windows operating system with either a graphical desktop, a command line connection, or both.
Every VM created in Azure has an \"IP Address\" or internet address, and we use this to connect to it.
The following Azure documentation describes how to connect to a Windows VM: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/connect-logon
Here are more detailed instructions:
There is a 'connect' link on the left side in the \"Settings\" section of the left menu.
The connect pane looks something like this:
RDP (Remote Desktop Protocol) is a Microsoft method for connecting to the graphical desktop. On Mac/Linux it requires additional software (mentioned at the beginning of this page).
Step 1: In the Azure portal:
click \"connect\" on the left side menu if you haven't already in the \"native rdp\" box, click \"select\" optional: if the machine is still deploying or turned off, you may get a warning that the machine is stopped. click start VM. a new pane displays, that may look like this: it may take a few seconds for Azure to configure the VM to use RDP, with the message that \"Azure is configuring...\" When it's working on it, you will see it say \"validating\" in a gray box. Some users found that it never finished! However the VM is still available for a connection. You may wait for the gray \"validating\" button to change to \"configured\" but if it does not appear to be completing, please move on to the next step anyway. click the \"download RDP file\" button and save the .rdp
file anywhere on your computer where you can find it again Step 2:
after it's downloaded, find the .rdp
file and double click to open it, which should start your remote desktop software. Mac users must have installed the Microsoft Remote Desktop client app ignore any security or error messages, click \"connect\" Alternatively you may also open your RDP software, create a new connection, and paste in the IP address that is listed on the resource page for the VM in the Azure portal.
Here is what the Windows screen may look like: you may see a warning about the certificate of the remote computer.
This is because we are using a temporary certificate, but the connection is secure. Click \"Yes\"
Step 3. Enter the Username and password you used when configuring the VM in the \"Basics\" section above. For some versions of Windows, you need to click \"More choices\" in the Windows Security menu, otherwise the default is often your Microsoft or your laptop account. If the user account you entered does not work, you may have to put your user account in domain\\username form, and in this case, the domain is the name of the virtual machine and it is entered as vmname\\username, with a back-slash in-between, and with the same password. Starting up the VM Once you connect for the first time, the Windows VM will provision the VM user account and will install things during and after start-up. Feel free to close any windows. Once the installations are finished, you may use the machine as you would any other Windows computer. You can start Jupyter notebooks to work with Python. Previous versions of the Azure Data Science Virtual Machine had RStudio installed, but the latest version only seems to have the base R interface.
We will cover how to transfer code and files to a VM in a later session. If you are comfortable with using the command line, you can use git clone...
to download code to run.
Explore to see what is already pre-installed on this VM. If you start with a standard version of Windows, you will have to install your own software.
When you are finished with your remote session you may simply close the remote window (leaving the VM running). See below for how to turn it off and delete it.
Optional: Connect to the Windows DSVM with ssh
NOTE In 2023 the 'connect' option in the Azure portal has a button beneath the RDP section that says \"more ways to connect.\" Inside this is a \"native ssh\" section. This only has instructions for how to connect with SSH; there is no special file to download like RDP.
This windows machine has an SSH Server running, and the security settings from the pre-configured version allow connections from SSH. If you are familiar with ssh and the command line, you may start the CMD.EXE on your windows computer, or the Mac Terminal, and enter ssh <username>@<ipaddress>
Where the username is the user you put for your VM when you created it, and the Public IP address is listed on the VM Resource page.
This is similar to how you connect to the MSU HPC, if you are HPC user.
You will be asked to add the host to your list of known hosts, and to enter the password you used when you created the VM.
When you log-in you will be connected to the Windows command prompt (e.g. C:\\Users\\username>)
To Exit, type exit
at the command prompt.
Next Steps: For information on turning off the VM and for eventually deleting the VM, scroll down below the Linux section as these operations are the same in the Azure portal for Linux or Windows virtual machines.
"},{"location":"exercises/azure_vm_walkthrough/#connecting-to-a-linux-vm-using-ssh","title":"Connecting to a Linux VM using SSH","text":"We will connect and use this remote VM running the Linux operating system with a command line connection. It is possible to use a graphical connection but requires additional setup beyond the scope of the short exercise.
In addition this assumes you have some familiarity with using the command line and starting your terminal program.
There is a 'connect' link above the 'essentials' list, and a connect link on the left side - they both go to the same place.
Connect with SSH
this is the standard method of connecting with ssh, but we've included as much detail as possible for those who are new to using ssh.
On the main \"overview\" page of the VM resource, find the \"Public IP Address\" on the top right side. Copy this IP address to the clipboard, or make a note of it. Mine was 20.98.28.63. Note that these VMs also have an internal IP address that starts with 10.x.x.x that will not work for connecting from your laptop. Use the Public IP address. Not all VMs have a public IP address but this one will. Also make a note of the User ID and password you used to create the VM above. Side note: in the \"connect\" form of the VM resource pages, it describes how to use an ssh key, even though we did not create an ssh key when we created a VM. If you did not create an ssh key, you do not need to follow these instructions.
on your desktop/laptop, start your terminal program on MacOS/Linux or cmd.exe
if you are using Windows.
Enter the command as displayed, which is something like ssh vmusername@vmipaddress
In my case, my command is ssh patbills@20.98.28.63
If this is the first time connection, you'll get the standard ssh warning \"The authenticity of host '20.98.28.63 (20.98.28.63)' can't be established.\"
simply type \"yes\" and press Enter. Enter the password you used when configuring the VM in the \"Basics\" section above. (note that ssh does not show any key movement or * when you type a password) it takes a while to connect for the first time as the VM configures software and prepares your user account You may use the machine as you would any other Linux computer. For more information about what software is installed, see We will cover how to transfer code and files to a VM in a later session.
When you are finished with your remote session you may simply close the remote window (leaving the VM running). See below for how to turn it off and delete it.
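The connection steps above depend on using the public address rather than the internal one. As a small illustration, the sketch below uses Python's standard ipaddress module to check an address and build the ssh command. The function name ssh_command is hypothetical, and the user name and address are the example values from this page; substitute your own.

```python
import ipaddress


def ssh_command(username, ip):
    # Parse the address; this raises ValueError if it is not a valid IP address.
    addr = ipaddress.ip_address(ip)
    # Private ranges such as 10.x.x.x are only reachable from inside the
    # virtual network, not from your laptop.
    if addr.is_private:
        raise ValueError(ip + ' is a private address; use the Public IP address')
    return 'ssh ' + username + '@' + ip


print(ssh_command('patbills', '20.98.28.63'))
```

Passing an internal address such as 10.0.0.4 raises an error instead of returning a command, which mirrors the advice to copy the Public IP address from the VM overview page.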
"},{"location":"exercises/azure_vm_walkthrough/#starting-and-stopping-the-vm-both-windows-and-linux","title":"Starting and Stopping the VM (both Windows and Linux)","text":"There are three ways to \"stop\" or turn off a VM. 1. when connected to it, e.g. in the remote desktop, use Windows to turn it off. The VM is then \"stopped.\" In a Linux ssh session you may use a command like sudo shutdown -h now
The operating system is then shut off, and hence the VM is not running, but it is still \"allocated.\" When you turn it back on, it will come on immediately. 2. Use the Azure portal to \"stop\" the VM, which shuts down the operating system (gracefully if possible) and 'deallocates' the VM. Restarting the VM appears to be the same process, but Azure must allocate resources first to run it, then power it up. This is cheaper than the first method in the long run. 3. Delete it.
"},{"location":"exercises/azure_vm_walkthrough/#stopping-deallocating-the-vm-with-the-portal","title":"Stopping (deallocating) the VM with the Portal:","text":" Go to the resource page for the VM, if you are not already there. If you are just entering the portal, find your resource group, find the VM in your resource group (identified as a VM in the \"type\" column of the list of resources), and click to open the resource page. The Status field near the top of this screen will indicate running or stopped. Find the Start and Stop buttons near the top of this screen and click \"stop\" if the machine is running. There is a warning about losing your IP address, with a check box to reserve it. If you plan on deleting the VM now, click \"ok\" If you plan on restarting the VM and reconnecting, first check the box \"reserve the IP\" then click OK The default is to use a \"dynamic\" address which is assigned every time you turn on the VM When using a dynamic address, you must copy/paste the IP address, or re-download the RDP connection file every time you restart the machine The solution is to use a \"Static IP\", either when you create the VM or by assigning one after the VM is created; checking the box does so. You can also convert to a static IP with the portal, but it is not a straightforward process, see https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-static-private-ip-arm-pportal Pricing for a static IP is here: https://azure.microsoft.com/en-us/pricing/details/ip-addresses/ which as of now is $0.0036/hour, charged even if the VM is turned off. At 24 hours a day that is approx $2.60 to $2.70/month, depending on the length of the month. It's a good idea to leave VMs in a \"stopped (deallocated)\" state if you are not using them for computations or providing a service, just as you would turn off or put your laptop to sleep. The main reason for this is for security. 
"},{"location":"exercises/azure_vm_walkthrough/#deleting-the-resources-both-windows-and-linux","title":"Deleting the Resources (both Windows and Linux)","text":" Open the Resource group as above When creating resources using the template as we did above, the resources associated with this VM will all start with the same prefix, so they are easy to identify. Select them with checkboxes, and click the \"Delete\" button which is on the top right of the screen (not the \"delete resource group\" button) If it's not obvious which resources are all included, you may also use the \"tag\" you created to filter what is listed and only show those with the same \"tag.\" For more information see https://docs.microsoft.com/en-us/azure/azure-portal/manage-filter-resource-views . If you add a filter on the tag, then you may select all the items that are shown, and delete those. After selecting, confirm the deletion by typing \"yes\" Creating resources just to delete them may seem wasteful; however, we will cover how to save a \"snapshot\" and/or \"image\" of your VM's disk so that you may re-use any work to install and configure software without incurring charges.
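Filtering by tag, as described above, amounts to keeping only the resources whose tag matches a chosen name and value. The sketch below shows that idea on a made-up inventory in plain Python; it does not call Azure, the function name with_tag is hypothetical, and the resource names and types are invented for illustration.

```python
def with_tag(resources, name, value):
    # Keep only the resources whose tags contain the given name/value pair.
    return [r for r in resources if r.get('tags', {}).get(name) == value]


# Made-up example inventory; real names and types come from your resource group.
resources = [
    {'name': 'dsvm1-netid-ccf23', 'type': 'Virtual machine',
     'tags': {'activity': 'session2'}},
    {'name': 'dsvm1-netid-ccf23-ip', 'type': 'Public IP address',
     'tags': {'activity': 'session2'}},
    {'name': 'netidstorage', 'type': 'Storage account',
     'tags': {'activity': 'session1'}},
]

for r in with_tag(resources, 'activity', 'session2'):
    print(r['name'], '-', r['type'])
```

Only the two resources tagged for this activity are listed, which is the set you would then select and delete in the portal.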
"},{"location":"exercises/azure_vm_walkthrough/#more-references","title":"More References","text":"Azure has very abbreviated versions of this exercise if you would like another perspective. They assume you can create your own resource group (which you don't have the ability to do currently in the fellowship)
https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview#next-steps
Data Science Use Case Tutorials from Azure:
Windows: This tutorial uses products that Azure no longer supports, and for Windows users they really push their \"Azure Machine Learning\" product. However the Windows DSVM offers a really fast way to get access to a Windows desktop graphical interface https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/vm-do-ten-things
Linux: https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/linux-dsvm-walkthrough
If you follow these, just remember to delete the resources you create when you are done exploring
Return to the Session 2 page
"},{"location":"exercises/azure_windows_vm_walkthrough/","title":"Exercise: Creating a Windows Virtual Machine (VM)","text":"This is a previous version of a Windows-only VM walkthrough from 2020, kept for historical reasons
Please use our updated version that covers both Windows and Linux
Link to Video for this exercise on mediaspace.msu.edu (requires log-in)
"},{"location":"exercises/azure_windows_vm_walkthrough/#about","title":"About","text":"This is an exercise and introduction to creating Virtual Machines (VMs) and related resources using the Azure Portal. This exercise assumes you understand how to use the Azure Portal, which is covered in the Azure Portal Walkthrough. In addition it's helpful to know what a virtual machine is but it's not crucial to complete the exercise. For more information on VMs see session 2).
We will use a pre-configured virtual machine with software already installed. When creating a VM you can use an Azure template and there are many of these. The Data Science Virtual Machine (DSVM) from Azure has R, Python and many data science and statistical libraries available. For more information about the Azure DSVM see https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/
"},{"location":"exercises/azure_windows_vm_walkthrough/#requirements","title":"Requirements","text":"You need an account in azure with an active subscription, and a resource group of your own to work in. Fellows have these things provided.
"},{"location":"exercises/azure_windows_vm_walkthrough/#creating-and-connecting-to-a-windows-virtual-machine","title":"Creating and Connecting to a Windows Virtual Machine","text":""},{"location":"exercises/azure_windows_vm_walkthrough/#requirements_1","title":"Requirements","text":"To connect to a Windows VM desktop, it's recommend you use the Microsoft Remote Desktop client.
MacOS: install the Microsoft Remote Desktop Client, only available on the App Store: https://apps.apple.com/app/microsoft-remote-desktop/id1295203466?mt=12
Linux users: install http://xrdp.org/
Windows users: ensure you have the client. In the search box on the taskbar, type Remote Desktop, and then select Remote Desktop Connection. "},{"location":"exercises/azure_windows_vm_walkthrough/#creating-a-windows-virtual-machine","title":"Creating a Windows Virtual Machine","text":"If at any point, including while you are exploring, you can't get the configuration correct (or there is a validation error you can't fix), starting over will not create any resources or incur charges. Go back to step 1 below.
"},{"location":"exercises/azure_windows_vm_walkthrough/#1-selecting-the-resource-template","title":"1. Selecting the Resource Template","text":"In the Azure Portal open the top left menu, and click \"+ Create a resource\" option (the first option)
In the create resource search box, type \"data science virtual machine\"
In the options select Data Science Virtual Machine - Windows 2019
The \"Plans\" section has a description of the template if you would like to know more.
Click the \"start with a pre-set configuration\" option.
"},{"location":"exercises/azure_windows_vm_walkthrough/#2-select-the-pre-set-configuration","title":"2. select the pre-set configuration","text":"These configurations help to select your VM size based on your activity. We will use the default options and click \"Continue to create a VM\"
The options do not affect the outcome of the exercise, so feel free to explore each option at this step
Click \"Continue to create a VM\"
"},{"location":"exercises/azure_windows_vm_walkthrough/#3-configure-the-vm-using-the-azure-portal","title":"3. Configure the VM using the Azure Portal","text":"The resource creation forms work as described in the Azure Portal but since we used a pre-set configuration some of the values will be completed.
"},{"location":"exercises/azure_windows_vm_walkthrough/#basics","title":"Basics","text":" The Subscription should be \"Cloud Computing Fellowship\" and resource group should be your CF resource group (with your netid). As we create additional resource groups for this
Virtual machine name Name: CF21-netid-dsvmtest One option is to combine the project (e.g. the fellowship), your net id, and some description of what you are doing. In the name above, replace \"netid\" with your own MSU netid.
Note that different resources have different naming restrictions. For example, for VMs the rules are \"can be almost anything, but Azure resource names cannot contain special characters \\/\"\"[]:|<>+=;,?*@&, whitespace, or begin with '_' or end with '.' or '-' \"
Note if you have an existing VM with this name, add a number 2 or other suffix. We will delete this VM and create something more suitable in the future.
1. Region Select \"(US) North Central US\"
1. Availability Options select \"No infrastructure Redundancy required\". Redundancy is for critical infrastructure that needs to withstand a serious outage (e.g. if a hurricane affects a data center). You may also see an \"availability zone\" option appear (perhaps with an error message \"The value must not be empty\"). Selecting \"No infrastructure Redundancy required\" in the availability zone will remove the \"availability zone\" field and error message.
1. Image should be \"Data Science Virtual Machine - windows...\" if this is change you may
1. Azure Spot Instance leave unchecked.
1. Size You can leave the size that is currently selected. This is how you select the specifications for CPU and memory. The size you choose for this exercise doesn't matter for the outcome, but it will show prices which may be interesting. If you click this drop-down menu you may see some other sizes and prices. The Monthly price assumes 24 hour/day operation. Your price to experiment will often be less than $1.00
1. Administrator Account Just like you need to log in to your own computer, you must create a user account for the VM. Select a user name and password that you will easily remember, because you will need them to log in to the new VM.
* username : use any user name you will easily remember, perhaps your netid
* password : something you can remember, but complex enough to be secure. Do not use your MSU password or any other passwords you use
1. Licensing Unlike Linux, Windows requires a license, and this option is for organizations with an arrangement with Azure. Leave this box unchecked.
"},{"location":"exercises/azure_windows_vm_walkthrough/#disks-and-other-settings","title":"Disks and Other Settings","text":"For this exercise we'll be using the default values for almost all the pages except for Basics page. However you are encouraged to look through these options to see what is involved in creating a virtual machine. The Azure VM documentation covers many of them. For example a VM requires several networking components. The good news is that Azure will name and create these for you, which will see.
"},{"location":"exercises/azure_windows_vm_walkthrough/#tags","title":"Tags","text":"Using tags will be essential for identifying which components go to which VM. This is the metadata associated with these resources. I suggest using a tag like \"activity\" to indicate which of our activities was used to create these resources.
Click \"tags\" in the top row of options (just before 'review and create') In the first row, For Name, type activity
and for the Value type session2
click \"review and create\" "},{"location":"exercises/azure_windows_vm_walkthrough/#review-and-create","title":"Review and Create","text":"If there are errors the form name will have a red dot next to it. Go back to that form and see what may be the issue.
If the Validation passed, it will display the approximate hourly cost to use this VM. Mine says 0.1920 USD/hr
Click \"Create\" and the deployment will start. It will take at most 15 minutes.
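To put the hourly figure in context, here is a minimal Python sketch of the arithmetic. It assumes the 0.1920 USD/hr rate quoted above, a 730-hour month (a common billing approximation, assumed here rather than taken from the Azure form), and a hypothetical two-hour exercise; the function name is made up for illustration.

```python
# Rough cost estimate for a VM billed by the hour.
# Assumptions: the 0.1920 USD/hr rate shown above and a 730-hour month.

HOURS_PER_MONTH = 730  # assumed average month length in hours


def vm_cost(hourly_rate_usd, hours):
    '''Return the cost in USD of running a VM for the given number of hours.'''
    return round(hourly_rate_usd * hours, 2)


if __name__ == '__main__':
    rate = 0.1920
    print('Two-hour exercise:', vm_cost(rate, 2), 'USD')
    print('Running all month:', vm_cost(rate, HOURS_PER_MONTH), 'USD')
```

The short-run figure is why experimenting usually costs well under a dollar, while a VM left running all month costs far more.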
"},{"location":"exercises/azure_windows_vm_walkthrough/#4-the-resources","title":"4. The Resources","text":"While the deployment is in progress you may explore the operation details or click any of the resources that have been created.
Open your resource group in the portal: click the portal menu on the top left, and select \"resource groups\". From the list, select your CF21 group. When the deployment is finished, you should see several new resources. They will have the same name prefix \"CF21netid-dsvm\" but may have a suffix indicating the kind of resource (e.g. CF21-netid-dsvm1-ip). The second column is the \"type\" which helps identify what they are.
Select the item with type \"virtual machine\" and click on the name to open its resource page (for example, cf21-billspat-dsvmtest item in the screenshot above) "},{"location":"exercises/azure_windows_vm_walkthrough/#5-the-vm-resource-page","title":"5. The VM Resource Page","text":"To see the details for your virtual machine, click the VM in your resource group if you haven't already.
There are many details here but some immediate things to notice:
In the top row are buttons to connect, start, restart and stop the VM.
In the top \"essentials\" section, the \"status\" should be \"running.\"
On the right side is the assigned IP address, which you need to connect. Highlight and copy this address. If you click the link on the address, it will take you to a new resource page just for the IP address (which is a distinct resource assigned to this VM resource) "},{"location":"exercises/azure_windows_vm_walkthrough/#6-connecting","title":"6. Connecting","text":"You may connect to this VM running the Windows operating system with either a graphical desktop, a command line connection, or both.
The following Azure documentation describes how to connect to a Windows VM: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/connect-logon
Here are more detailed instructions:
There is a 'connect' link above the 'essentials' list, and a connect link on the left side - they both go to the same place.
RDP (Remote Desktop Protocol) is a Microsoft method for connecting to the graphical desktop. On Mac/Linux it requires additional software (mentioned at the beginning of this page).
Click \"connect\" and select \"RDP\" if it isn't already selected. Click the \"download RDP file\" button and save the .rdp file anywhere on your computer where you can find it again. After it has downloaded (and, for Mac users, once you have installed the RDP client), double-click the .rdp file to open your remote desktop software.
On Windows, if any security or error messages appear, click \"connect\"
Alternatively, you may open your RDP software without downloading the RDP file, and paste in the IP address that is listed on the resource page for the VM
When you connect, if the VM is not running, you will get an error message. Here is what the Windows screen looks like:
This warning appears because we are using a temporary certificate, but the connection is secure. Click \"Yes\"
Enter the username and password you used when configuring the VM in the \"Basics\" section above. You may be able to simply enter the user name and password directly. If not, in the Windows Security window, select More choices and then Use a different account. Enter the credentials for an account on the virtual machine and then select OK. If the user account you entered does not work, you may have to put your user account in domain\\username form. In this case the domain is the name of the virtual machine, so it is entered as vmname\\username, with a backslash in between, and with the same password. Once you connect, you may see Windows starting up and installing things. Feel free to close any windows. Once the installations are finished, you may use the machine as you would any other Windows computer. If you type RStudio in the search box, you may launch an RStudio session on this remote computer. It also has Python, many Python libraries and Jupyter notebook.
We will cover how to transfer code and files to a VM in a later session.
When you are finished with your remote session you may simply close the remote window (leaving the VM running). See below for how to turn it off and delete it.
Optional: Connect to the Windows DSVM with ssh
This Windows machine has an SSH server running, and the security settings from the pre-configured version allow connections over SSH. If you are familiar with ssh and the command line, you may start CMD.EXE on your Windows computer, or the Mac Terminal, and enter ssh <username>@<ipaddress>
Where the username is the user you put for your VM when you created it, and the Public IP address is listed on the VM Resource page.
This is similar to how you connect to the MSU HPC, if you are HPC user.
You will be asked to add the host to your list of known hosts, and to enter the password you used when you created the VM.
When you log in you will be connected to the Windows command prompt (e.g. C:\\Users\\username>)
To Exit, type exit
at the command prompt.
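As a small illustration of the ssh step, the following Python sketch assembles the command from a username and the public IP address and checks that the address is well formed before you paste it into a terminal. The username and the address are placeholders (203.0.113.10 is a documentation-only address), and the helper name is hypothetical.

```python
# Build the ssh command for the Windows DSVM.
# The username and IP address below are placeholders, not real values.
import ipaddress


def ssh_command(username, ip):
    '''Return the ssh argument list after checking that ip parses as an IP address.'''
    ipaddress.ip_address(ip)  # raises ValueError if ip is not a valid address
    return ['ssh', username + '@' + ip]


if __name__ == '__main__':
    # Replace both values with the ones from your VM resource page.
    print(' '.join(ssh_command('sparty', '203.0.113.10')))
```

A mistyped address fails here with a ValueError instead of a slow connection timeout.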
"},{"location":"exercises/azure_windows_vm_walkthrough/#7-starting-and-stopping-the-vm","title":"7. Starting and Stopping the VM","text":"There are three ways to \"stop\" or turn off a VM. 1. when connected to it, e.g. in the remote desktop, use Windows to turn it off. The VM is then \"stopped.\" The VM is not running, but it is still \"allocated.\" When you turn it back on, it will come on immediately. 1. Use the Azure portal to \"stop\" the VM which shuts down Windows (gracefully if possible) and 'deallocates' the VM. Restarted the VM appears to be the same process, but Azure must allocate resources first to run it, then power it up. This is cheaper then the first method in the long run 1. Delete it.
Stopping (deallocating) the VM with the Portal:
Go to the resource page for the VM, if you are not already there. If you are just entering the portal, find your resource group, find the VM in your resource group (identified as a VM in the \"type\" column of the list of resources), and click to open the resource page.
The Status field near the top of this screen will indicate running or stopped.
Find the Start and Stop buttons near the top of this screen and click \"stop\" if the machine is running.
There is a warning about losing your IP address, with a check box to reserve it. If you plan on deleting the VM now, click \"ok\". If you plan on restarting the VM and reconnecting, first check the box \"reserve the IP\" then click OK.
The default is to use a \"dynamic\" address, which is assigned every time you turn on the VM. When using a dynamic address, you must copy/paste the IP address, or re-download the RDP connection file, every time you restart the machine. The solution is to use a \"Static IP\", either when you create the VM or by assigning one after the VM is created; checking the box does so. You can also convert to a static IP with the portal, but it is not a straightforward process, see https://docs.microsoft.com/en-us/azure/virtual-network/virtual-networks-static-private-ip-arm-pportal
Pricing for a static IP is here: https://azure.microsoft.com/en-us/pricing/details/ip-addresses/ which as of now is $0.0036/hour, charged even if the VM is turned off. That is approx $2.70/month.
It's a good idea to leave VMs in a \"stopped (deallocated)\" state if you are not using them for computations or providing a service, just as you would turn off your laptop or put it to sleep. The main reason for this is security. "},{"location":"exercises/azure_windows_vm_walkthrough/#8-deleting-the-resources","title":"8. Deleting the Resources","text":" Open the Resource group as above When creating resources using the template as we did above, the resources associated with this VM will all start with the same prefix, so they are easy to identify. Select them with checkboxes, and click the \"Delete\" button which is on the top right of the screen (not the \"delete resource group\" button) If it's not obvious which resources are included, you may also use the \"tag\" you created to filter what is listed and only show those with the same \"tag.\" For more information see https://docs.microsoft.com/en-us/azure/azure-portal/manage-filter-resource-views . If you add a filter on the tag, then you may select all the items that are shown, and delete those. After selecting, confirm the deletion by typing \"yes\" "},{"location":"exercises/exercise_budget_alert/","title":"MSU Cloud Computing Fellowship: Costs and Budgets with Microsoft Azure","text":"(Almost) everything you do in Azure has a cost, and costs for resources often accrue over time, whether the resource is in use or not. This is a short exercise to receive an email when you have spent a certain amount of money. This can be valuable if you are experimenting and forget to delete a resource that you no longer need.
For this to work, you must first have a 'budget' in your resource group. We created a budget for 2022 for all fellowship participants that you can use for creating alerts.
If you have not yet, please go through the \"Intro to the Azure portal\" exercise for more context about what we are doing.
"},{"location":"exercises/exercise_budget_alert/#background","title":"Background","text":"See the \"costs\" section in the topics for details.
In Azure you can set a 'budget' for a single resource (like a virtual machine), your whole resource group, or we could set one for the whole fellowship. However, setting a budget by itself doesn't stop you from spending or invoke any action.
Once you set a budget, or maximum dollar amount you'd like to spend, you then need to add either actions or alerts that fire when some threshold within that budget is reached.
We have set budgets for your resource group in the fellowship. However, you now need to set an alert to send you an email when you reach a spending amount. You can set multiple alerts. For example, we will set an alert for when you reach a certain threshold.
For details about this service, see the Azure Cost Management + Billing documentation. This specific exercise works with budgets and assumes there is one in your resource group. If you do not have a budget on your account, or if you'd like to create a new kind of budget, please contact us and we will assist you. However, if you are comfortable with Azure concepts, see this advanced Azure Budget Tutorial:
https://learn.microsoft.com/en-us/azure/cost-management-billing/costs/tutorial-acm-create-budgets
"},{"location":"exercises/exercise_budget_alert/#steps-to-add-a-cost-alert-to-an-existing-budget-your-resource-group","title":"Steps to add a \"cost alert\" to an existing budget your resource group.","text":"Find the Premade Budget
Log into https://portal.azure.com
You should see a single resource group, or be put into one automatically. Open your resource group if it is not already open.
The left sidebar has properties for the resource group. In the left sidebar, select \"budgets\" (scroll down)
You should see a single budget named with your netid, like this: \"ccf23_sparty_budget\"
Click on that budget
Click the 'edit budget' link near the top left
Review the information
Add an 'alert' to that budget
In the edit budget form, under alert condition, set type = Actual and enter 50 percent
Under action group, leave it as 'none' (alerts are different from actions)
In email, put your preferred email address (I don't know if gmail etc will work)
Add a second email to inform the instructors for the cloud fellowship: billspat@msu.edu
Select your preferred language, if it's available (the default is US English)
Click 'Save'
You may add additional alerts if you want to be reminded at different thresholds of spending, e.g. 25%, 50%, 80%. One advantage of setting a low threshold like 20% of your budget is that it helps you learn how much things cost, and alerts you if there are resources you've created but didn't realize still existed or were costing anything.
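To see what the alert percentages mean in dollars, here is a minimal Python sketch. The 100 USD budget is a made-up figure for illustration (check the real amount on your budget page), and the function name is hypothetical; the percentages are the ones suggested above.

```python
# Convert alert percentages into dollar amounts for a budget.
# The 100 USD budget is a made-up figure; check your own budget in the portal.


def alert_amounts(budget_usd, percents):
    '''Map each alert percentage to the spending level (USD) that triggers it.'''
    return {p: round(budget_usd * p / 100, 2) for p in percents}


if __name__ == '__main__':
    for percent, amount in alert_amounts(100, [25, 50, 80]).items():
        print(percent, 'percent alert fires at', amount, 'USD of actual spending')
```

Because the alert type is Actual, each alert fires only after that much has already been spent, so a low first threshold gives the earliest warning.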
I hope these instructions were clear, but again, if you have any questions please contact us using email or MS Teams.
"},{"location":"exercises/exercise_create_storage_account/","title":"Creating a \"Storage Account\" with the Azure Portal","text":"(From: Session 1 - Introduction)
This is a good activity to explore the Azure portal by creating a new resource. Storage accounts do not accrue much cost until you fill them up with data. Please review the exercise Azure Portal Walk-through if you haven't.
We have not talked about Cloud storage, however you don't need to know about Cloud storage to complete this tutorial. This is simply an exercise to see how you would create something using the Azure portal, and Cloud storage is a benign (and very inexpensive) resource to use as an example.
Note that a \"storage account\" is not the same as the \"disk\" you will see when you create a virtual machine. We will discuss the difference in detail in the session on storage.
"},{"location":"exercises/exercise_create_storage_account/#requirements","title":"Requirements:","text":" An Azure Account with valid subscription A Resource group All members of the current Cloud Computing Fellowship cohort have these things.
"},{"location":"exercises/exercise_create_storage_account/#creating-a-storage-account-step-by-step","title":"Creating a storage account step-by-step.","text":""},{"location":"exercises/exercise_create_storage_account/#first-step-accessing-a-storage-account-template","title":"First Step: Accessing a Storage Account Template.","text":" Log-in to the Azure portal if you have not already. (https://portal.azure.com) Click the menu (top left, three horizontal bars). Select Home from the menu. (This is to ensure we all have the same view) Select Create a Resource in the upper left screen under Azure Services. Yes we could have click \"storage accounts\" instead but we want to demonstrate how to use the next screen... Note: The current screen is where you can create almost any service Azure offers, and additional services created by third-parties or companies that are not Microsoft. When you are starting, ensure you are creating a service from Microsoft (we'll show you how in the next step) In the lower search bar (labeled Search services and marketplace), type Storage account Note that \"storage\" alone lists many other kinds of resources. You will see a list of several services. Select the first one labeled Storage account (icon looks like a green spreadsheet). Note: The description of the service will say the provider, which should be Microsoft, if not go back using the back button and search for storage account again. Click Create under Storage account. "},{"location":"exercises/exercise_create_storage_account/#second-step-setting-up-the-storage-account","title":"Second Step: Setting up the Storage Account.","text":"Note: The Azure resource creation screens mostly work like this: there are so many settings Azure has split these up into groups which are listed horizontally across the top. You may work though these by clicking each group, OR finish a screen, and click \"Next..\" button on the bottom of the form. 
At any time you may click \"Review and Create\" and if you've missed some crucial setting, Azure will not let you create the resource without fixing it. We will go page-by-page for these settings
Basics:
Subscription: Cloud Computing Fellowship Resource Group: Select your resource group provided to you. Storage Account Name:
Some resources have restrictions on naming. Next to storage account is an \"i\" in a circle that has more information. Storage account names must be unique across all of Azure, and only numbers and lowercase letters are allowed. I don't know if Non-US letters are allowed (e.g.\u7bb1)
Use your MSU ID (NetID) when you name things, to help me keep track and also to help find a name that is unique. So, replace \"NETID\" with your MSU NetID here: \"stNETIDccf22\" e.g. stbillspatccf22
If you are repeating this tutorial, simply add a \"2\" or \"b\" e.g. \"stbillspatccf22b\". We can delete these experiments later.
Region (Location): Change this location to US Central. Click in here to see the options. In practice, pick the region that is closest to you or where your data will be moving to (e.g. North Central US for MSU) but there are other considerations.
Performance: Standard Redundancy: change from GeoRedundant to \"Locally Redundant\" (LRS). We won't see a difference, and LRS is cheaper. Beneath that, leave the \"make read access....\" box checked. Click next...Advanced Advanced:
Leave all of these settings as-is. Click next... Networking: Leave all of these settings as-is. Click next... Data Protection: Leave all of these settings as is. These settings allow you to recover files up to 7 days after deleting or over-writing. click next... Encryption: Leave all of these settings as is. click next... Tags: Tags are optional but eventually highly recommended. For now you can leave them blank. Review and create review gives you a chance to double check your settings before committing click Create "},{"location":"exercises/exercise_create_storage_account/#third-step-deploying-the-storage-account","title":"Third Step: Deploying the Storage Account","text":" Deployment Azure calls the process of creating cloud resources a \"deployment.\" This term comes from the software engineering process of first \"building\" an application or utility (or \"compiling\" which is often not necessary for scripting languages like Python or R) and then moving that application onto the IT servers that make it available. On your own computer you download software that is already \"built\" (e.g. MS Word) and installing it is a form of deployment. Deployment takes a while as the Azure Resource Manager takes your order and runs the code to generate the cloud resource you've described. You may leave this page and the deployment will continue in the background. Finish and Review
When the deployment is complete, in the top bar of the Azure portal you'll see a number badge on the \"Notification\" icon indicating the number of messages you have (probably just 1). Click on the Notifications icon to show this message. the message should be something like: Deployment succeeded Deployment 'resourcename_12345678901234' to resource group 'group name' was successful. \"Go to Resource\" button will open the Portal page with options for the resource \"Pin to Dashboard\" will create a new tile that is a shortcut to this resource on your dashboard for easy access. If you want to experiment with dashboard arranging then it's ok to click this and easy to remove later from your Portal Dashboard (it will be added to the bottom) Examine Resource (storage)
We have not talked about how storage works, but the storage resource page is a good example to learn how the Portal is organized. If you didn't already click \"go to resource\", open the top menu and click \"home\". The Portal \"Home\" has a list of \"recent resources\" and this should be at the top. "},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/","title":"Exercise: using the cloud to summarize and visualize data.","text":""},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#overview","title":"Overview","text":"The basic task of this project is to analyze data in the cloud: copying data and code to the cloud, using cloud computing to run a basic script, and saving the output to cloud storage. We provide the data and the code (in R and Python) with a clear description of how to run it.
The goal is to assess whether the structure of this material was sufficient (did we do our jobs?), that you were able to synthesize it, and hence you as a fellow are ready to take on a cloud project.
The goal is not to determine your ability to run code (which you most likely can already do!), use git, use the command line, or be a systems admin, but just to assess what piece of this small puzzle we may need to reinforce. All steps should be able to be completed without having to write any code at all, except to run the program. We hope this unified exercise helps fill any gaps in practical and potentially practical understanding of how computing in the cloud works. Or, even better, that it's so easy that it seems like busy work.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#process","title":"Process","text":"We are here to help along the way, and happy to answer any and all questions. The goal is not to present a step-by-step tutorial but to provide guidelines for how you should approach the problem. If you have issues, it would be very helpful to us if you reviewed the course materials to determine whether we've provided the information, or links to the information, so we know if we need to augment these materials. However, we will always answer your questions as they come up.
If you review this and find it very easy, you want to use something other than a VM to do calculations, or have code and data of your own you'd like to run, that is great! The goal is to help you accomplish a computation in a way that you may use in your project.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#output","title":"Output","text":"We ask that you prepare a short, informal description of the resources you used, how you used them to move data and execute code, and the costs associated with those resources. In addition any technical challenges, lack of clear documentation, or any other issues that needed to be overcome to complete this will be helpful to us.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#data","title":"Data","text":"The data is a simple CSV file of approximately 450,000 weather observations near the MSU campus. Details about the data file and it's origin are documented in the code site linked below. In addition a direct link for downloading the suggested data set will be sent to the fellows in email. While the data is in the public domain, for each download there is a small cost. Hence we are not posting the URL on this public site to prevent bots from repeatedly downloading the file.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#code","title":"Code","text":"The code we suggest you run is available on Github: https://github.com/msucloudfellowship/msu_ccf_miniproject There is a Python and an R version. The data is not in the github repository, but you should have recieved a link to download it, and there are instructions and code for downloading the data from the source for Lansing or other weather stations.
"},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#task-details","title":"Task Details","text":"We expect you to create the following elements. If you already have some of these cloud resources, of course it's more efficient to re-use those but we want to get a cost element for all aspects, so we recommend creating a new resources (e..g. a new storage account) for this mini project.
You can use the Azure portal to accomplish many if not all of these tasks, except to run your actual program:
create cloud storage (account, etc)
copy data into storage
create and start a Virtual Machine (VM) that can run this code. The instructions refer to the Azure data science virtual machine, which we discussed in the session \"how to cloud\". You may also use container services (e.g. Azure Container Instance) to run this code if you like. Hint: consider using tags to uniquely identify the resources you are creating for this project, to easily identify all resources used for 1) cost analysis 2) deleting
connect and log in to the VM, get the scripts onto the machine, and install software as needed
copy the data from storage to the virtual machine disk, either by attaching the storage to the compute service and accessing it that way, or by otherwise copying the data (hint: the DSVM comes with the Azure storage explorer installed)
run the script while pointing to the data file location; this will output images of plots (PDF or PNG formatted)
save output files to cloud storage
turn off and delete resources related to the VM
determine total costs. See the topic on costs. If you complete this in less than a day, the costs for these resources will not be immediately visible in the Azure cost analysis tool. Potentially wait until the next day to view the costs in the Azure portal. This analysis was very small, so the costs will be very, very small.
use the outputs from the cost analysis to add a list of resources and costs to your report. As mentioned above, if you use unique tags when creating the virtual machine it will be easier to identify costs specific to this activity "},{"location":"exercises/exercise_using_the_cloud_to_summarize_and_visualize_data/#due-dates","title":"Due dates","text":"The due date will be discussed in the email but they are flexible.
"},{"location":"exercises/exercise_windows_filestorage/","title":"Exercise: Using File Storage with Windows VM","text":""},{"location":"exercises/exercise_windows_filestorage/#overview","title":"Overview","text":" Not all versions of Windows can use this. For much more detail, see the Azure documentation page \"Mount SMB Azure file share on Windows\" "},{"location":"exercises/exercise_windows_filestorage/#using-file-storage-with-windows-vm-step-by-step","title":"Using File Storage with Windows VM step-by-step.","text":""},{"location":"exercises/exercise_windows_filestorage/#first-step-create-a-storage-account","title":"First Step: Create a Storage Account","text":" We have already set up storage accounts in a previous tutorial. Refer back to the tutorial (Creating Azure Cloud Storage Accounts) if you still need to create one, however the one you have currently should work for this tutorial. "},{"location":"exercises/exercise_windows_filestorage/#second-step-create-an-azure-file-share","title":"Second Step: Create an Azure File Share","text":" Go to your Storage Account
On the homepage, there should be a list of \"Resources\" in the middle of the page. Click the one with Type listed as Storage Account
Select File Shares
On the left side of the screen there is a menu. Under the \"Data Storage\" section is the File Shares button
Add a File Share
Towards the top of the screen click the + File Share button
File Share Properties
Basics
Name the File Share qsfileshare
Keep Tier as Transaction Optimized
Click Review + Create
Create a new txt file titled qsTestFile on your local machine
Go to your file folder, and in any directory of your choice - Right Click and select a new .txt file
With your file share open in Azure, click Upload (on the top middle section of the screen)
Upload your created txt file
Select Browse your Files and navigate to the directory you chose earlier, then attach your txt file "},{"location":"exercises/exercise_windows_filestorage/#third-step-deploying-a-vm","title":"Third Step: Deploying a VM","text":"We've created the storage account and the file share with a file in it. We now need to deploy a VM.
Create the Resource
Expand the left side menu and click Create a Resource
Under \"Popular Azure services\" select Virtual machine
Setting the VM Properties
Basics
Resource Group: Select the Cloud Computing Fellowship Resource group
Virtual machine name: qsVM
Security Type: Standard
Image: Windows Server 2019 Datacenter - x64 Gen2
Set your Username and Password to something you will remember for logging in to the VM
Select Inbound Ports: HTTP and RDP (3389)
Select Review and Create
Select Create
When deployment is done, select Go to Resource "},{"location":"exercises/exercise_windows_filestorage/#fourth-step-connect-to-your-vm","title":"Fourth Step: Connect to Your VM","text":" Select Connect on the VM properties page
Click Select on the Native RDP File
Download the RDP File
On the right side menu select the Download RDP File under section 3
Open the VM on your local machine
Open the downloaded RDP file
Select Connect on the pop-up
Enter the username and password that you created in the VM setup (if you are on a Windows machine you may need to click \"More Choices\" before logging in)
You may get a certificate warning; you can ignore it "},{"location":"exercises/exercise_windows_filestorage/#fifth-step-map-the-azure-file-share-to-a-windows-drive","title":"Fifth Step: Map the Azure File Share to a Windows Drive","text":" In the Azure portal, navigate to your qsfileshare and select Connect
Click Show Script in the right-hand menu pop-up
This will display a script in the same menu. Copy and paste this script into Notepad.
Go back to your VM
Open Windows PowerShell
Paste in the script you copied
Press Enter
You will see \"Credential added successfully\" when it works "},{"location":"exercises/exercise_windows_filestorage/#sixth-step-working-with-snapshots","title":"Sixth Step: Working with Snapshots","text":" Create a Share Snapshot
Adding a snapshot in the Azure Portal
In the Azure Portal, navigate to the file share
Select Snapshots (located in the left hand side menu)
Select + Add a Snapshot and click Ok
In your VM, open qsTestFile.txt and type \"This file has been modified\". Save and close the file.
Create another snapshot (repeat the snapshot steps above)
Browse a Share Snapshot
On your file share, select Snapshots
Select the first Snapshot in the list
Select qsTestFile.txt
Restore from a Snapshot
Ensure you're in the file share Snapshot tab
Right click qsTestFile.txt
Select Restore
Select Overwrite Original File and click Ok
Open the file in the VM. It should be restored and have no text in it.
Delete a Share Snapshot
On your file share Snapshot list, select the last snapshot in the list
Select Delete
Use a Share Snapshot in Windows
You can view snapshots from your mounted Azure file share by using the Previous Versions tab
In your VM File Explorer, locate the mounted share. It should be titled qsfileshare and have a boxy symbol
Select qsTestFile.txt and Right Click
Select Properties from the menu
Select Previous Versions - this shows you a list of previous snapshots
Select Open
Restore from a Previous Version
In the same screen we were just in, rather than selecting \"Open\", select \"Restore\" "},{"location":"exercises/exercise_windows_filestorage/#seventh-step-delete-the-resources","title":"Seventh Step: Delete the Resources","text":" Click on your Resource Group Select everything except the storage account you created in Session 1 Select Delete, NOT \"Delete Resource Group\", to delete the resources Go to your storage account and delete the file share as well "},{"location":"exercises/storage_pricing_exercise/","title":"None","text":"Prior to doing this exercise, see the reading and lecture slides for Cloud Storage for definitions of terms.
How large, approximately, is your data? If you are unsure, estimate 100 GB. How much would it cost to keep it in the cloud?
Compare the pricing for Blob, Files and Disk storage for 6 months
Aspects Of Storage:
Redundancy: Always select \"LRS\" as that is almost always sufficient and for con
Storage prices are not the same across regions, but the default (\"East US\") works for this exercise
Consider only the \"Hot\" tier of the different storage tiers (\"Premium\", \"Hot\", \"Cool\", and \"Archive\"). For some high-performance applications Premium is required, but look at the price difference!
Operations, transactions and data transfer costs: charged per 10K operations; really hard to estimate unless you know your workload; very low costs, e.g. reading 10K Blobs costs 1/2 of one cent. I would not bother estimating this cost unless you know you will have very high disk operations
Types of Storage to Compare:
Azure Blob Pricing: https://azure.microsoft.com/en-us/pricing/details/storage/blobs/ select \"Hierarchical namespace\"
Azure Files Pricing: https://azure.microsoft.com/en-us/pricing/details/storage/files/
Managed Disk Pricing : https://azure.microsoft.com/en-us/pricing/details/managed-disks/
Note that these come in different sizes and types. Select the 128 GB size if you are estimating 100 GB of data, and Standard SSD. When you create a disk in the portal, it defaults to a 1 TiB size, which is quite expensive per month "},{"location":"exercises/storage_pricing_exercise/#optional-compare-with-on-premise-storage-costs","title":"Optional: compare with On-premise storage costs","text":"The MSU HPC offers 1 TB of storage with redundant backups and high-speed access for free, with each additional 1 TB for $125/year. Since this is network-attached storage, is it comparable to Azure Files or Azure Blob storage?
If you need 2 TB of storage (1 free + 1 paid), what is the approximate Azure cost for 2000 GB for 12 months, ignoring all operational costs (just storage)?
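The arithmetic for this comparison can be sketched in a few lines of Python. This is only a minimal sketch: the HPC figures (1 TB free, $125 per additional TB per year) come from the text above, while AZURE_PRICE_PER_GB_MONTH is a hypothetical placeholder, not a real Azure price. Replace it with the storage-only price you read from the pricing pages linked in this exercise.

```python
# Minimal sketch: compare a year of on-premises HPC storage with cloud storage.
# AZURE_PRICE_PER_GB_MONTH is a hypothetical placeholder, not a real Azure price.

HPC_FREE_TB = 1                  # from the text: the first 1 TB is free
HPC_PRICE_PER_TB_YEAR = 125.0    # from the text: each additional 1 TB per year
AZURE_PRICE_PER_GB_MONTH = 0.02  # assumption for illustration only


def hpc_cost(total_tb, years=1.0):
    '''Cost of HPC storage: only TB beyond the free allocation are billed.'''
    paid_tb = max(0, total_tb - HPC_FREE_TB)
    return paid_tb * HPC_PRICE_PER_TB_YEAR * years


def cloud_storage_cost(size_gb, months, price_per_gb_month):
    '''Storage-only cost, ignoring operations and data transfer.'''
    return size_gb * months * price_per_gb_month


if __name__ == '__main__':
    print('HPC, 2 TB for 12 months: $%.2f' % hpc_cost(2))
    print('Cloud, 2000 GB for 12 months: $%.2f'
          % cloud_storage_cost(2000, 12, AZURE_PRICE_PER_GB_MONTH))
```

Keeping the price as a parameter makes it easy to repeat the calculation for Blob, Files and Disk storage once you have looked up each price.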
"},{"location":"references/","title":"Cloud Computing References and Links to Azure Documentation","text":""},{"location":"references/#cloud-computing-for-research","title":"Cloud Computing for Research","text":"\"Cloud Computing for Science and Engineering\", Foster and Gannon
Chapter 1: Orienting in the cloud universe ( Alternative link to publisher preview chapter ) Using Cloud Computing for Academic Research, Mahmoud Parvizi, unpublished draft, 2021.
Several additional resources for learning about cloud from Cloudbank, a west-coast consortium to help researchers use cloud computing: https://cloudbank-project.github.io/cb-resources/
Very in-depth case study of cloud for simulations (climate models): Cloud Computing for Climate Modelling: Evaluation, Challenges and Benefits. Montes, D., et al. Computers 2020, 9(2), 52; https://doi.org/10.3390/computers9020052
"},{"location":"references/#general-cloud-computing-interest","title":"General Cloud Computing Interest","text":"Historical Note Who Coined 'Cloud Computing'? by Antonio Regalado, October 2011, MIT Technology Review
Intro to Cloud Computing from Microsoft which is primarily for IT people responsible for spending money and maintaining IT Infrastructure: MS Training Describe cloud computing
"},{"location":"references/#azure-resources","title":"Azure Resources","text":""},{"location":"references/#general-azure-references","title":"General Azure References","text":"Main Azure Documentation : https://docs.microsoft.com/en-us/azure/
List of All Azure Services : https://portal.azure.com/#allservices
Azure Tips and Tricks : https://microsoft.github.io/AzureTipsAndTricks/
Azure Portal \"How to\" series - focused on using the Azure portal to do several different things. This is mostly about the services themselves, not the portal, and many topics do not apply to us (e.g. Azure Arc) but there are some very useful videos : https://youtube.com/playlist?list=PLLasX02E8BPBKgXP4oflOL29TtqTzwhxR
These look like really good intros to Azure, but they require a time investment. The examples are not really research computing examples but may be valuable learning examples. Most of these lessons were taken from other 'learning paths' and are still oriented towards IT professionals
Microsoft Learn: - Azure for Researchers part 1: Introduction to Cloud Computing - Azure for Researchers part 2: Cloud Security and Cost Management
"},{"location":"references/#azure-books-available-to-the-msu-community-via-the-library","title":"Azure Books available to the MSU Community via the Library","text":"Search for Microsoft Azure, ordered by date
Microsoft Azure Functions: Developing Serverless Solutions Trevoir Williams, Packt Publishing 2022
Practical Azure SQL Database for Modern Developers Davide Mauri, Silvano Coriani, Anna Hoffman, Sanjay Mishra, Jovan Popovic Apress 2021.
Planning, Deploying, and Managing the Cloud Julian Soh, Marshall Copeland, Anthony Puca, Micheleen Harris. Apress 2020.
"},{"location":"references/#interface-azure-portal","title":"Interface: Azure Portal","text":"Azure Portal Documentation : https://docs.microsoft.com/en-us/azure/azure-portal/
Microsoft Azure Hierarchy: Organize your Azure resources effectively
Re-organize your portal view by creating a new dashboard (optional) : https://docs.microsoft.com/en-us/azure/azure-portal/azure-portal-dashboards
Azure portal productivity Tips : https://microsoft.github.io/AzureTipsAndTricks/blog/tip329.html#azure-portal-productivity-tips
https://microsoft.github.io/AzureTipsAndTricks/blog/tip329.html
"},{"location":"references/#azure-interface-azure-command-line","title":"Azure Interface: Azure Command Line","text":"Command-line programming of Cloud Services
Azure PowerShell (Windows) https://docs.microsoft.com/en-us/powershell/azure/
Introduction to PowerShell : https://docs.microsoft.com/en-us/powershell/azure/get-started-azureps?view=azps-3.0.0
Azure Command Line Interface (CLI) (MacOS, Linux): https://docs.microsoft.com/en-us/cli/azure
Introduction to Azure CLI https://docs.microsoft.com/en-us/cli/azure/get-started-with-azure-cli?view=azure-cli-latest
Hybrid interface: using the CLI inside the Azure Portal
You can install and use the az CLI program on your own computer, but Azure also offers a way to use the CLI without installing anything: a cloud-based terminal interface called the \"cloud shell.\" For an overview see https://docs.microsoft.com/en-us/azure/cloud-shell/overview and for a quick tutorial on how to use it see the quickstart at https://docs.microsoft.com/en-us/azure/cloud-shell/quickstart . In the quickstart, the first example shows you how to create a resource group using the CLI in the cloud shell. If you don't have permissions to create a new resource group, skip to the next example (\"Create a Linux VM\"), put your own resource group in the command for the -g parameter, and consider using a unique name for the VM.
"},{"location":"references/#azure-storage","title":"Azure Storage","text":"Create a Storage Account:
https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account
Azure Storage Explorer: https://azure.microsoft.com/en-us/features/storage-explorer/
Blob Storage Documentation: https://docs.microsoft.com/en-us/azure/storage/blobs/
Create and Manage a Storage Account: https://docs.microsoft.com/en-us/azure/storage/common/storage-quickstart-create-account
Using the CLI with Storage Reference: https://docs.microsoft.com/en-us/cli/azure/storage/account
Using PowerShell Storage Reference: https://docs.microsoft.com/en-us/powershell/module/azure.storage
Create blob storage with CLI:
https://docs.microsoft.com/en-us/azure/storage/common/storage-azure-cli
Create blob storage with PowerShell:
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-powershell
"},{"location":"references/#compute","title":"Compute","text":"Overview of Compute Options: https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/compute-overview
Choosing an Azure Compute Service (Decision Tree): https://docs.microsoft.com/en-us/azure/architecture/guide/technology-choices/compute-decision-tree
"},{"location":"references/#interface-arm-templates","title":"Interface: ARM templates","text":"Azure Resource Manager Templates are JSON-formatted configuration files that dictate which resources to create.
See also information on 'Bicep', which is Azure's simplified (but still complex) template language to replace the ARM templates
Overview of ARM templates: https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/overview
explore quick start ARM templates (web): https://azure.microsoft.com/en-us/resources/templates/
explore quick start ARM templates (github): https://github.com/Azure/AzureStack-QuickStart-Templates
many of these github repositories include a \"deploy to Azure\" button that will run the template via the portal and create resources. "},{"location":"references/#r-and-azure","title":"R and Azure","text":"https://blog.revolutionanalytics.com/2018/12/azurestor.html
https://cloudblogs.microsoft.com/opensource/2019/07/01/azurer-available-create-manage-monitor-azure-services-r/
https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/r-developers-guide
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/r-packages-supported-by-azure-machine-learning
https://github.com/Azure/AzureContainers
https://github.com/Azure/AzureR
https://github.com/Azure/AzureRMR
"},{"location":"references/#python-and-azure","title":"Python and Azure","text":"https://azure.microsoft.com/en-us/develop/python/
https://docs.microsoft.com/en-us/azure/python/
https://github.com/Azure/azure-sdk-for-python
https://github.com/Azure/azure-storage-python
https://azure.github.io/azure-sdk/releases/latest/all/python.html (Note that pypi.org/project/azure/ is deprecated/obsolete if you find that via google)
"},{"location":"references/#matlab-and-azure","title":"MATLAB and Azure","text":"https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/06/29/running-matlab-on-azure-provision-a-matlab-distributed-computing-server-using-azure-vms/
https://github.com/mathworks-ref-arch/matlab-on-azure
https://www.itcentralstation.com/products/comparisons/mathworks-matlab_vs_microsoft-azure-machine-learning-studio
https://www.mathworks.com/solutions/cloud.html
"},{"location":"references/#microsoft-azure-cosmos-db","title":"Microsoft Azure Cosmos DB","text":"CosmosDB is a very large scale data system that can act like other database systems including SQL, MongoDB (a popular no-sql database), and others. Its advantage is that it can handle extremely large data sets (65 TB) but is easy to get started with. Google and AWS have similar offerings (\"BigQuery\" and \"Aurora\" respectively).
If your data is not large, consider using SQL data systems which are also very widely used (and can be used on your own computer)
Intro: https://docs.microsoft.com/en-us/azure/cosmos-db/introduction
It can be free to use, but you have to turn that on when creating the service for your account: https://docs.microsoft.com/en-us/azure/cosmos-db/free-tier
You can run a notebook inside the database to query data with Python:
Notebook Description: https://docs.microsoft.com/en-us/azure/cosmos-db/cosmosdb-jupyter-notebooks Service announcement: https://azure.microsoft.com/en-us/blog/analyze-and-visualize-your-data-with-azure-cosmos-db-notebooks/ Video: https://www.youtube.com/watch?v=OrnZMkP5Eq4&list=PLLasX02E8BPBKgXP4oflOL29TtqTzwhxR&index=7 "},{"location":"references/#cloud-architecture","title":"Cloud Architecture","text":"This section has resources for intermediate to advanced cloud users who are interested in much more detail than most researchers will ever need; the resources are really geared towards IT staff. However, they may have sections that give insight into how to approach your problem (especially for cloud timing optimization projects).
Microsoft Azure Infrastructure Services for Architects by John Savill, Oct 2019, available from the MSU Library : http://catalog.lib.msu.edu/record=b13538669~S39
Azure has changed since 2019, but the book may still be relevant
"},{"location":"sessions/01_introduction/","title":"Introducing the MSU Cloud Computing Fellowship","text":"You don't have to face the clouds alone"},{"location":"sessions/01_introduction/#welcome","title":"Welcome!","text":"This is the first 'session' of the MSU Cloud Computing Fellowship (CCF) for 2022-2023. For a description of the program and how sessions are organized, see the CCF home page
The goals of this introductory session are to orient you to this program, introduce ourselves to each other, provide some background on cloud computing, set up our technology, and discuss what all of our expectations are.
"},{"location":"sessions/01_introduction/#activities","title":"Activities:","text":"Introduce yourself on Microsoft Teams
You should have all been given access to a Team \"MSU ICER Cloud Computing Fellowship\" via your NetID.
Please log in to Teams (via the web https://teams.microsoft.com/ or using the Teams client) Post a new message in the \"general\" channel just saying \"hello\" and include your name, department and how you prefer to be addressed. If necessary MSU IT has documentation about MS Teams here: https://tech.msu.edu/technology/collaborative-tools/spartan365/ ( the link on that page requires yet another MSU log-in) Confirm Access to Azure Portal
Go to https://portal.azure.com. Log in with your MSU netid and password. Ensure you can access the Azure main web \"portal.\" You don't need to (and shouldn't) create any new resources or work with this website; simply confirm you have access. You may see a list of \"resources\"; we will introduce Azure during our first meeting. "},{"location":"sessions/01_introduction/#introductions","title":"Introductions","text":""},{"location":"sessions/01_introduction/#msu-cloud-computing-fellowship-team","title":"MSU Cloud Computing Fellowship Team","text":" Dr. Brian O'Shea, Director, MSU Institute for Cyber-Enabled Research (ICER), Professor, Physics Dr. Mahmoud Parvizi, Co-Instructor Research Consultant and Software Engineer, Institute for Cyber Enabled Research Participant in first Fellowship cohort Manager of ICER Training Patrick Bills, Co-Instructor Research Software Engineer, Institute for Cyber Enabled Research. Brad Fears, Contributor, MSU IT Services Research Cyber Infrastructure (IT RCI) IT support staff with certification in AWS and Azure Sponsored by ICER, MSU Office of Research and Innovation (ORI), and MSU IT Services Research Cyberinfrastructure (RCI)
"},{"location":"sessions/01_introduction/#participant-introductions-discussion","title":"Participant Introductions & Discussion","text":" About you: your preferred name and pronouns, which degree program or department if faculty. 2 minute research synopsis and methods Previous experience with research computing including cloud computing (if any) Current research computing hurdles, roadblocks, challenges & triumphs Which aspect of cloud computing are you most interested in learning and using to support your research? "},{"location":"sessions/01_introduction/#fellowship-program-overview","title":"Fellowship Program Overview","text":""},{"location":"sessions/01_introduction/#fellowship-goals","title":"Fellowship Goals","text":"Help you get an understanding of:
what is cloud computing? what is cloud computing useful for? when should I use it for my research computing? how can I use it? Understanding of the context of the technology we are learning about. Help you get some practical experience
apply cloud to some aspect of your own research apply cloud to generic/canned research-like problem Fellowship - Learn from and support your fellow researchers
Non-Goals: - cover all aspects of cloud - we don't cover networks for example due to time constraints - prepare you for a cloud computing certification (there are many existing resources for that) - become experts in everything cloud - build a dot-com empire
"},{"location":"sessions/01_introduction/#program-overview","title":"Program Overview","text":"The syllabus is the home page of this website and has a detailed schedule. Keep an eye on the home page for updates!
Fall semester: Workshops (Pat Bills): Schedule and expectations; website structure, session materials and activities, readings; in-person meetings approximately bi-weekly, excluding holidays; our expectations. Winter/Spring semester: Projects (Mahmoud Parvizi): Goals, schedule and expectations; Proposal write-up due early January, and presentations during the semester; Check-points to discuss progress and hurdles On-going help Final presentation during Symposium in late April "},{"location":"sessions/01_introduction/#introduction-to-cloud-computing","title":"Introduction to Cloud Computing","text":" The Computing in Cloud Computing Aspects of Cloud Computing Azure Organization Learning how to learn about cloud "},{"location":"sessions/01_introduction/#hands-on-using-the-azure-portal","title":"Hands-on: Using the Azure Portal","text":" Interacting with Azure using the Portal web interface Setting a Budget Alert Using the Azure Portal "},{"location":"sessions/01_introduction/#questions-and-discussion","title":"Questions and Discussion","text":" What things are at the top of your mind as you begin this program? Which of these topics resonates with your previous experience using computing or cloud computing (if any)? "},{"location":"sessions/01_introduction/#follow-up-activity","title":"Follow up Activity","text":"Please complete the following prior to our next meeting in 2 weeks:
Read about Azure Organization: see the topics above and the in-depth readings below to give you more context as you learn
Complete the exercise Create a Budget Alert so that you may be notified if you spend more money than you plan to.
This first part requires significant learning, and the more you know, the better choices you can make when developing your project.
"},{"location":"sessions/01_introduction/#bonus-activity","title":"Bonus Activity","text":"If you are familiar with the command line, Azure offers a web-based terminal/shell with many applications pre-installed. Once you have a storage account, you can create a special 'cloud shell' account. We will cover various interfaces to the cloud next time.
Overview of the Azure Cloud Shell Start and use the Azure Cloud Shell "},{"location":"sessions/01_introduction/#readings","title":"Readings","text":" Wikipedia article on cloud computing is actually pretty good Chapter 1: Orienting in the cloud universe from \"Cloud Computing for Science and Engineering\", Foster and Gannon ( Alternative link to publisher preview chapter ) Using Cloud Computing for Academic Research, Mahmoud Parvizi (draft version). The NIST definition of cloud computing
Optional Readings
Optional: Historical Note Who Coined 'Cloud Computing'? by Antonio Regalado, October 2011, MIT Technology Review
Optional: M. Armbrust et al., \"Above the Clouds: A Berkeley View of Cloud Computing,\" Technical Report UCB/EECS-2009-28, University of California at Berkeley, Electrical Engineering and Computer Sciences, 2009. PDF: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Written only 3 years after the launch of AWS, this is a very insightful discussion of the value of cloud computing "},{"location":"sessions/02_how_to_cloud/","title":"Session 2: What is the cloud and how does it work? An introduction using storage and virtual machines","text":""},{"location":"sessions/02_how_to_cloud/#about-this-session","title":"About this Session","text":"We are providing materials and activities for this session for you to read and attempt at your own pace. Please attempt these and see how far you can get. Feel free to post on Microsoft Teams if you have any issues, find things that need correcting, or have general questions.
We will host an optional, additional, in-person session to provide help to anyone who wants to attend, Friday September 23, 2pm to 3:30pm. Since this is outside of the pre-arranged schedule, anyone who would like help but can't attend at this time should contact us, and we will arrange a time for you.
We will discuss all of this material and more during our next regularly scheduled in-person session Friday September 30th.
"},{"location":"sessions/02_how_to_cloud/#overview","title":"Overview","text":" When many people think of \"cloud computing\" they think of computers in the cloud, or virtual machines. Cloud computing companies offer much more than just virtualized hardware, but this is a good place to start. This session is designed to be a hands-on workshop where we walk through creating the resources needed to run a computer in the cloud, logging into this computer, copying data and using that data in a program. At the end of the session you should have a good introduction to what it means to \"cloud compute.\"
"},{"location":"sessions/02_how_to_cloud/#overview-presentation","title":"Overview Presentation","text":"Cloud Concepts & Virtualization Slides (PDF)
"},{"location":"sessions/02_how_to_cloud/#about-the-azure-portal","title":"About the Azure Portal","text":"We were introduced to the portal in the first session. The following dives into more detail about 'resource groups', which are the core of how Azure is organized. Note that, as we get started, fellows have access to just a single resource group that we've created for you. You can't create your own, but you can create as many resources as you need inside this single resource group.
Top-down description of how Azure is organized From Session 1 Using the Azure Portal : tutorial and video "},{"location":"sessions/02_how_to_cloud/#optional-follow-ons","title":"Optional Follow-ons:","text":"Azure Storage
The Activity above had you create a 'storage account' with no background.
You will see there are different types of storage, but all types must be inside a \"storage account\" and this \"storage account\" must be inside a resource group.
We will re-visit concepts and usage of cloud storage in detail, as it's a core aspect of cloud computing.
"},{"location":"sessions/02_how_to_cloud/#virtual-machines","title":"Virtual Machines","text":"We introduced \"virtualization\" during our introduction. For IT this means flexibly creating multiple resources on one piece of hardware using software. The main use case is many virtual computers (or servers) on one large piece of computer hardware. This was created prior to cloud, but when you create your own computer in the cloud, it's based on this technology. To a user it may seem very similar, but to the systems IT engineer, it's very different. However these readings may help give you an
"},{"location":"sessions/02_how_to_cloud/#readings","title":"Readings:","text":" Chapter 4: Computing as a Service from \"Cloud Computing for Science and Engineering\", Ian Foster and Dennis B. Gannon, September 2017 What is a Virtual Machine (VM)? Introduction from Microsoft What is a Virtual Server? Youtube Video from IBM describing how companies (including MSU!) use virtualization to run multiple computers on one server to optimize the use of space in a data center. What's the difference between cloud and virtualization? from RedHat, a Linux Operating system company "},{"location":"sessions/02_how_to_cloud/#activity-create-a-virtual-machine-with-azure","title":"Activity: create a virtual machine with Azure","text":"Create (and delete) a Virtual Machine with the Azure Portal for both Windows and Linux.
"},{"location":"sessions/02_how_to_cloud/#discussion-why-create-a-vm","title":"Discussion: Why create a VM?","text":"What is a VM good for? The activity above does not discuss why you'd create a VM and connect with remote desktop, only that you can do it. We will discuss that at our next session. Can you think of possible use cases for your research, or other types of research, for a remote computer that could be very powerful or very small?
"},{"location":"sessions/03_cloud_storage/","title":"Session 3: Cloud Storage","text":""},{"location":"sessions/03_cloud_storage/#introduction","title":"Introduction","text":"Central to using cloud for nearly all services is storing data. Cloud storage is quite different from what most of us are used to when saving a file to a disk, USB removable media, or even our HPC. During the previous workshop we created a VM but didn't use cloud storage; we simply created a VM \"virtual disk\" that is attached to the VM just like your hard drive is attached to your own computer. However there are disadvantages to this:
1. The main OS disk is typically deleted when the VM is deleted, although you can create a 'durable' disk to share
2. The data on the main OS disk is tied to that Virtual Machine and hence that operating system; that is, it's typically inaccessible from other cloud services
3. It is limited in size and scope. The largest of virtual disks are around 1 TB. Azure Cloud storage accounts are limited to 5 TB and you may have multiple storage accounts.
4. You can only move data to/from a virtual or shared disk storage using a virtual machine
5. Most importantly, virtual disks are very expensive compared to cloud storage
Cloud storage was engineered to save millions of files for millions of users, and understanding how it works will take some changes to your approach.
"},{"location":"sessions/03_cloud_storage/#activities","title":"Activities","text":" Download and install the Azure Cloud Storage Explorer See the \"Download now\" button at the top of that page. You may review the content of the page
We did this in the first session, but if you want to work through it again, complete the exercises in Creating Azure Cloud Storage Accounts to create and use storage.
Exercise: Azure Storage Pricing
"},{"location":"sessions/03_cloud_storage/#readings","title":"Readings","text":" Azure Cloud Storage for Researchers (Slides)
Not a bad, high-level introduction: Edureka Azure Storage Tutorial (there are several pop-ups and ads, but it has a good level of information)
Storage as a Service from \"Cloud Computing for Science and Engineering\" Azure Documentation: Introduction to the core Azure Storage services Table of Azure Storage Product Offerings Optional: this is long (It says 46 minutes but it will probably take less time) but a good basic introduction to Azure storage: Azure Training: Explore Azure Storage services ( free training from Microsoft Learn) optional Understanding block blobs, append blobs, and page blobs
Introduction to Azure managed disks. This has more technical background than necessary but could be very helpful.
"},{"location":"sessions/03_cloud_storage/#post-session-discussion-points","title":"Post-session discussion points","text":"There are several options when creating a storage account. For example, what is the difference between LRS and GRS? Is the documentation describing these clear or confusing? Under what conditions might you choose LRS vs GRS? Is it worth the cost?
How would you share data with colleagues outside of MSU using cloud storage? Where did you find the information for how to do that (Microsoft, Azure, blog post, other)? Let's say you need to share 5 GB of data. After doing the pricing exercise above just for storage, what are the costs for each upload and download of 5 GB? Does it make a difference if it's Blob or File storage?
"},{"location":"sessions/03_cloud_storage/#activities_1","title":"Activities:","text":"The following two activities walk through attaching Azure files to a VM so you can use it just like any other disk. This is only one method for moving data to/from cloud storage to your VM, but it does not require changing your program code.
For Windows Users: Using File Storage with Windows VM
Create an SMB Azure file share and connect it to a Windows VM using the Azure portal
For Linux Users: Mounting File Storage with Linux VMs using NFS
Microsoft Tutorial: Create an NFS Azure file share and mount it on a Linux VM using the Azure portal
How to mount Azure Files on Linux using SMB
Notes: - SMB (invented by Microsoft for Windows) and NFS (invented by Sun Microsystems for Unix) are competing methods for attaching network storage. Both were created for on-premise servers, but Azure Files storage brings them to the cloud. - This tutorial uses the command line and requires an ssh connection to the VM you create. - Knowledge of Linux systems (mount points, fstab, etc.) is required.
Optional: Python And Blob Storage
This describes a different method for moving files to/from cloud storage: using code. This does not require you to 'mount' the storage to your VM.
For Intermediate Python users, and if you have time and interest, consider this tutorial from Azure: Quickstart: Manage blobs with Python v12 SDK
Requirements:
knowledge of Python; use the blob storage account you created in the exercise above or create a new one; familiarity with the Azure portal; Python installed on your computer (Python 3.6 or later suggested); familiarity with the terminal and command line. Optional: Using Managed Disks with Linux
Azure Learning Tutorial : Add and size disks in Azure virtual machines
Notes: - Uses the Azure Command line interface which we have not discussed. For
"},{"location":"topics/","title":"Short Topics for the Cloud Computing Fellowship","text":"These topics are introduced in the sessions in the syllabus. This is an index of all the topics here to help you find them outside of lessons. They are not in any particular order, but aggregated here in an effort to help you find them.
The Computing in Cloud Computing Aspects/Nature of Cloud Computing How to Cloud with Azure (pdf, slide format) Azure Organization Learning how to learn about cloud Using Tags in Azure to organize, identify and find resources Azure Costs Basics Azure Cloud Storage for Researchers (slides format) Introduction to cloud interfaces (web, REST API, Python, command line, Javascript, etc) "},{"location":"topics/azure_cloud_cost_basics/","title":"Intro to Cloud Costs on Azure","text":"You've heard us say that nearly everything in Azure has a cost, but how can you tell how much?
First, a cautionary tale: Google Cloud Charged Me $1000 For This Mistake by Kunal Vaidya on Medium. tl;dr: he forgot to turn off a service even though he was no longer using it. The good news is that Google does grant one-time forgiveness if you can prove you are using the service to learn about it (e.g. you are a student).
"},{"location":"topics/azure_cloud_cost_basics/#video-walk-through-of-azure-cost-analysis","title":"Video Walk-through of Azure Cost Analysis","text":"The following video walks through how to use the cost analysis features of the Azure portal for your resource group. 1) It helps to understand Azure Organization, and 2) it is from a few years ago so the screens may look a little different.
Short video (3:30) Demonstrating Azure Portal Cost Analysis, on MSU MediaSpace (log-in required)
"},{"location":"topics/azure_cloud_cost_basics/#details-about-costs-in-azure","title":"Details about Costs in Azure","text":"The content below assumes you know how to use the Azure Portal, understand basic cloud operations, and know what a virtual machine is. See the links and materials for session 01 for the necessary background.
"},{"location":"topics/azure_cloud_cost_basics/#1-pricing-pages","title":"1. Pricing Pages.","text":"All cloud vendors have pricing pages that describe how they meter and charge for services. For Azure this is https://azure.microsoft.com/en-us/pricing/#product-pricing
However, I usually find the page I need quickly by simply googling azure <service name> pricing
For example, I wanted to see how much a static IP address costs in Azure, so googling 'azure static ip pricing' takes me to https://azure.microsoft.com/en-us/pricing/details/ip-addresses/
Some of these pages are straightforward, but others, like the one above, assume additional knowledge. What does this mean in practice? For example, what does \"classic\" vs \"ARM\" even mean? There is a link at the top of the page, but it may take time to read and understand. I'll tell you that we will never use 'classic' and only use 'resource manager (ARM),' so look at the ARM prices.
This kind of background info is very common for services.
"},{"location":"topics/azure_cloud_cost_basics/#2-build-something-and-check-the-cost","title":"2. Build something and check the cost","text":"The other option is the empirical method: build something, use it, review the costs, and estimate.
At the resource group in the portal (see Azure Organization), there is a link on the left-side menu, near the bottom, labelled \"Cost Analysis\"; click that.
This is a live report of your current costs, with the ability to filter by time period, resource type, tags, and other things.
Near the middle are rounded buttons controlling the view you see. At the right side of these is a button \"Add Filter\" which you can click to show costs only for some resources. For example, if you click that and select \"Service Name\" and then \"virtual machines\" you will see the costs for the current month.
A powerful filtering technique is to use tagging in Azure, which is akin to adding meta-data to resources. See the Cloud Glossary
In many of the filtering mechanisms in Azure (including costs), the tag names (keys) you use are listed in the options for filtering.
Carefully select the date range for which you want an estimate, especially if your trial run started a few days ago in the previous month as the default is a monthly estimate. Use a custom date range for the time period that makes sense for the costs you want to observe.
Example Azure Cost Analysis Screen, filtered by Tag.
"},{"location":"topics/azure_cloud_cost_basics/#3-pricing-calculators","title":"3. Pricing Calculators","text":"All the cloud companies have pricing calculators, and they may be good for very rough estimates, but I always multiply by 1.2 as I'm sure I missed some crucial resource that I didn't know I needed or didn't know costs money.
For Azure it's https://azure.microsoft.com/en-us/pricing/calculator/
"},{"location":"topics/azure_cloud_cost_basics/#summary-and-other-notes","title":"Summary and other notes","text":"Combining these three methods is how we can estimate costs.
Notes:
Pricing often depends on the location or region you select. Most regions in the US are the same price.
Data transfer costs are really hard to estimate. Transfer into the cloud (ingress) is often free, but transfer out of the cloud (egress) usually has a charge. This is because companies with web products (e.g. websites, web stores, image sites, etc.) make money when customers view their pages (more customers => more costs, but also more revenue). However, note that MSU has a deal with Azure and data transfer from Azure to MSU is (mostly) free. One way to mitigate data transfer costs in research is to transfer large data inputs into Azure, but only take out the smaller outputs (results, summaries).
"},{"location":"topics/azure_cloud_cost_basics/#azure-pricing-resources","title":"Azure Pricing Resources","text":"Quickstart: Explore and analyze costs with cost analysis
Video from John Saville on cost estimation including the pricing calculator: Master the Azure Pricing Calculator Jun 17, 2021
"},{"location":"topics/azure_organization/","title":"Azure Organization","text":"This is a brief description of how Azure cloud services are organized for those just getting started with Azure. It's my own take on this topic, written with researchers in mind. However, it should not replace the official Azure documentation. The link below has a great summary of how it's set up. However, you may ignore all the other sections in the \"Azure setup guide,\" as it is geared toward IT professionals adopting cloud for their own organization.
Microsoft Azure Documentation: Organize your Azure resources effectively
Azure is organized by directories of user accounts and subscriptions. All resources must be created in exactly one \"subscription,\" which is a method for billing and for setting permissions. Your organization's \"directory\" is where your user account lives, but you may have access to multiple subscriptions with one user account. MSU created a \"Cloud Computing Fellowship\" subscription for all activities and resources for this fellowship, and we added your MSU directory accounts to this subscription.
Cloud computing components are known as \"resources,\" which AWS defines as \"an entity you can work with.\" Anything you can create using a cloud interface is a \"resource.\"
To help with organization, in Azure, resources belong to a resource group. Resource groups can collect resources by project, which could have hundreds of resources or just a few. There is no restriction; it is up to you to organize them in the way that works for you. For example, a lab could have a resource group for each member, or perhaps a resource group for each project, with members collaborating on those projects.
It's also possible to restrict access to resource groups, e.g. a resource group for a project may only allow access to those who are working on the project. Azure has other organizational tools, such as management groups across subscriptions, complex identity management and role-based access control (RBAC), that we won't cover here.
However, this is mostly for organization and resources may be accessed from one resource group to another, and even across subscriptions. Applying this organization scheme requires practice and sometimes vigilance.
For most campuses, researchers will want to have their IT department create the subscriptions and billing as they often can get discounted prices or fee waivers. When your research group is ready to pay for services here at MSU, see the link to the \"cloud services request form\" on https://tech.msu.edu/network/cloud-services/
Summary of top-down Azure Organization:
Directory : (MSU account). All accounts must come from a directory (but an account can be in multiple directories). Management groups : we won't use these (they are for admins to manage multiple subscriptions). Subscription : tied to a billing account, and where all resources are created. Resource Group : organizational tool for resources. Think of it as a \"folder\" in your file system. Resource : any cloud entity you may work with (e.g. create, configure, destroy). Finally, it is possible to log in to the Azure portal (e.g. with your MSU account) and not have a valid subscription, and so not be able to create or access any resources. If you have never used Azure before, you may be asked to create a free trial. If you need to use Azure (e.g. for training) and do not have access to an MSU subscription, you may want to use a non-MSU email address and create your own account.
Azure \"tags\" are added to resources (including resource groups) and are a way to identify and locate resources by search, as in many other services. They are optional, but using a tagging scheme is highly recommended to help organize your resources and for cost analysis. You can use any keys and any values you find useful.
"},{"location":"topics/azure_organization/#azure-locations-or-regions","title":"Azure Locations or Regions","text":"Subscriptions are for accounting only and don't represent concrete cloud resources. However, a cloud resource must reside in a computer somewhere, and hence has a location. Locations for cloud providers can be thought of as being inside one of their massive data centers. In Azure, \"region\" and \"location\" are used interchangeably (some interfaces use 'location', some use 'region').
Resources and resource groups must be assigned a location when you create them. Considerations are 1) does the location actually provide the services you need (not all locations have all cutting-edge products), 2) is the location close to you, to reduce the time it takes for data to cross the internet to/from you, and finally 3) is there some restriction based on your country of origin.
Most of the time, simply choose the default, which is East US and almost always has the latest features. For some advantage for data transfer, choose North Central US. However, as a rule, select a location/region and use it across all of your resources so that, for example, your data files in storage are close to (in the same data center as) a computer you may create.
Regions become very important for companies that offer services around the world and want to reduce the connection time for their customers. It's also possible to have back-ups of resources in a different region to protect against natural disasters.
"},{"location":"topics/azure_tags/","title":"Azure tags","text":""},{"location":"topics/azure_tags/#using-tags-to-organize-resources-in-azure","title":"Using Tags to organize resources in Azure","text":"Tags are notes to yourself about the resource; use them for metadata.
As the number of cloud resources blossoms (i.e. cloud sprawl), it can be important to find related resources quickly. The Azure portal has a way to see resources within and across resource groups using different filtering methods. One of those is with resource meta-data, and you can add meta-data using 'tags.'
In my group we always have a tag with the key \"created by\" and the NetID of the creator as the value. This may be redundant here because all the resources you create will be in a resource group with your NetID already in it, but add this for practice.
You may consider using a tag like \"project\" with the project name as the value if either 1) a project may have multiple resource groups or 2) a resource group may have multiple projects.
For now you have only one resource group, but tags are also used to find things across different resource groups, e.g. by project name.
Tags can be added and removed at will from resources without altering the resource, so add as many tags as you want when starting to see how they may work.
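As a rough illustration of filtering by tag, the sketch below uses plain Python dictionaries as stand-ins for Azure resources. The resource names, the tag values and the `filter_by_tag` helper are all hypothetical; nothing here calls Azure, it only mimics what the portal's tag filter does.

```python
# Hypothetical, in-memory stand-ins for Azure resources (not fetched from Azure).
resources = [
    {'name': 'test-vm', 'type': 'virtualMachine',
     'tags': {'created by': 'netid1', 'project': 'genomics'}},
    {'name': 'test-vm-disk', 'type': 'disk',
     'tags': {'created by': 'netid1', 'project': 'genomics'}},
    {'name': 'shared-storage', 'type': 'storageAccount',
     'tags': {'created by': 'netid2'}},
]


def filter_by_tag(items, key, value):
    '''Return the resources whose tags contain the given key/value pair.'''
    return [r for r in items if r.get('tags', {}).get(key) == value]


# Find everything that belongs to one (made-up) project.
matches = filter_by_tag(resources, 'project', 'genomics')
print([r['name'] for r in matches])
```

Resources without the tag are simply skipped, which is one reason a consistent tagging scheme matters.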
"},{"location":"topics/azure_tags/#example-usage","title":"Example usage:","text":"When creating resources using the wizard, many resources are created at once. For example creating a virtual machine may create 12 resources. Adding a tagl to ID those resources together can really help to delet them.
use the Portal to create a test virtual machine (VM), which creates 12 resources add a unique tag to those during the VM creation process, e.g. tag \"id\" = \"test VM Oct 1\" when you later need to delete the VM becuase you are done with it, or it wasn't what you needed, you can filter resources in your group on this this so you can select those 12 resources, and not any others, without having to hunt for them by name. "},{"location":"topics/intro_aspects_of_cloud_computing/","title":"Nature of Cloud Computing","text":""},{"location":"topics/intro_aspects_of_cloud_computing/#some-motivation-at-amazoncom","title":"Some Motivation at Amazon.com","text":" Massive IT infrastructure supports the Amazon store and company They wanted to sell shopping application as a service to a company like Target who didn't want to r-un their own store. T This required the software developers to have lots of flexible infrastructure (servers) to run on. They found team to build a service (with software) could spend 70% of their time setting up the 'back end' They called all the infrastructure needed to run a massive dot-com \"muck\" and saw this as a secondary supporting role to application development. What they wanted in days actually took months. "},{"location":"topics/intro_aspects_of_cloud_computing/#eureka-moment-for-amazon-we-could-sell-it","title":"Eureka moment for Amazon: we could sell it","text":" Amazon automated their IT department so teams could order and provision the servers they needed on demand beyond just virtualization (\"everything was an API\") They got really good at running very large data centers for many customers as cheaply as possible and on-demand for Amazon.com and other stores and services. They realized that their innovations would help any IT organization and especially internet start-ups like themselves, and that they could sell it. 
Their customers were other IT departments. Blog Post from 2006: \"We Build Muck, So You Don\u2019t Have To\" "},{"location":"topics/intro_aspects_of_cloud_computing/#nist-defintion-of-cloud","title":"NIST definition of cloud","text":"Government offices interested in purchasing cloud computing needed a definition of it to differentiate it from other kinds of computing, hence... the NIST definition of cloud computing essential characteristics:
On-demand self-service. Measured service: pay for what you get. Broad network access: accessible from the internet Rapid elasticity: no limits from a customer perspective. This word was invented by AWS Resource pooling: single resources serve many customers. "},{"location":"topics/intro_aspects_of_cloud_computing/#what-is-cloud-computing-cloud-concepts-vs-cloud-providers","title":"What is Cloud Computing? Cloud concepts vs Cloud Providers","text":" Three major cloud providers are in a constant arms race, literally (Azure vs. Amazon competed for a $10B defense contract): Azure, Amazon Web Services and Google Cloud Platform
Offerings are very similar so all are great choices
Other options: smaller companies and open source options (used by Indiana University Jetstream HPC, and the Osiris project from MSU, UMich, Wayne State and IU). Cyverse for running jobs. "},{"location":"topics/intro_aspects_of_cloud_computing/#benefits-of-cloud-computing-for-research","title":"Benefits of Cloud Computing for Research","text":" Customized Computing: can create customized resources only when you need them Elastic/On-demand: can run ad-hoc computations on those on-demand resources Instant service: Reproducible: a computation can be re-run as needed, meaning cloud resources can be easily re-created to re-run your computations. Cost effective: unlike commercial applications, more users does not mean more revenue. Budgets are fixed and the pay-as-you-go model requires vigilance to not over-spend. Others? Restatement of goals of this Cloud Computing Fellowship:
Learn which types of computing resources are beneficial to your research Learn how to use Cloud to create those resources Use the services packaged by cloud companies to discover new resources "},{"location":"topics/intro_aspects_of_cloud_computing/#using-workflow-and-computational-thinking","title":"Using workflow and computational thinking","text":" Karl Popper stated that \"non-reproducible single occurrences are of no significance to science\" ( K Popper, \"The Logic of Scientific Discovery\", English translation from Routledge, London, 1992, p. 66.) and this is a significant issue for research based on computing. To enhance reproducibility in your own work, consider documenting all the steps needed to create the environment to run your computation. For many on-premise academic systems (e.g. the MSU HPCC), we depend upon the system administrators to create that environment, but we may install and configure all the software we need to run our code. Workflow thinking can apply to the scientific domain itself (e.g. \"Principles for data analysis workflows\" https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008770 ) and to the provisioning of the cloud computing environment. That is, we may use a workflow system for creating all the cloud stuff we need, and then a different workflow system that runs on that cloud stuff. One example is we may create an HPC system on Azure using templates and then launch the Slurm scheduler on that HPC to run our jobs. (Note that the complexity of running your own HPC is beyond the scope of this fellowship; it is used as an example only.)
A major advantage to using workflows or code for provisioning your cloud computing components is that you can turn them off and delete them when you are done, and restart when needed.
Our first uses of cloud will use forms to create resources, but we encourage you to automate where possible.
"},{"location":"topics/intro_aspects_of_cloud_computing/#about-cloud-security","title":"About Cloud Security","text":"Security and risk management are important issues even for researchers whose data are open - If your computer is a server, your responsibility just increased 100X: these are prime targets. Consider each component of a server to be a point of vulnerability. - Finding a readable list of security recommendations for cloud computing is a challenge for all the reasons outlined above. Our textbook has a nice chapter outlining cloud security - We will cover methods to reduce security risks but it's important to consider the risk of hacking from the beginning
Attackers may use the services you create to launch attacks on other services, leaving you liable.
The \"Shared responsibility\" model for cloud computing takes a model of computing components and shows how much of each component's security the user is responsible for. Microsoft Model of Shared Responsibility for Cloud Computing
We will come back to this model as we gain deeper understanding of research computing on the cloud.
"},{"location":"topics/intro_aspects_of_cloud_computing/#hpcc-vs-cloud","title":"HPCC vs Cloud","text":" Dr. Parvizi's white paper outlines the challenges of adapting HPC workflows to cloud computing. The HPC is amazingly effective at running all kinds of systems at very little cost, if any, to MSU researchers, but not all are the best fit.
Many systems not designed for HPC can be adjusted to run in that environment. However, just as many workflows are difficult to port from HPC to cloud, some cloud workflows are difficult to run on HPC (but never say never), especially Windows-based software.
"},{"location":"topics/intro_aspects_of_cloud_computing/#acknowledging-bias-in-access-to-cloud-computing-across-research-cultures","title":"Acknowledging bias in access to cloud computing across research cultures","text":"It's widely recognized that AI is frequently biased. For example, Azure Voice recognition did not work for a female researcher who developed voice-controlled surgery, so
However, I believe there is also inherent bias in the user interfaces, design and definitions in the engineering of technology across many axes of diversity (gender, culture, background, training, creativity, etc). System engineering is its own discipline and cloud computing is arcane, so our goal is to reduce conceptual barriers to using this technology while you work with us.
"},{"location":"topics/intro_aspects_of_cloud_computing/#about-cloud-costs","title":"About Cloud Costs","text":" Cost management is a major hurdle for adopting CC, so we will talk about costs extensively (Almost) everything you do in Azure has a cost Costs often accrue over time, whether the resource is in use or not Deleting resources when you are not using them is a great way to reduce cost We want to encourage you to experiment! Using a very powerful machine for an hour may cost only $0.50 Just be aware that creating something and leaving it on will deplete your budget Solution: \"Budget Alerts\" Case Study: Computation of a machine learning model based on gene networks for inferring gene association ( https://www.geneplexus.net): a single (virtual) machine to run the ML such that users would not have to wait too long would be $650/month. However, if the computational power is provisioned only when needed, it's 5 cents/job.
"},{"location":"topics/intro_aspects_of_cloud_computing/#value-proposition-of-cloud-computing","title":"Value Proposition of Cloud Computing","text":" Costs are more than just dollars for services. Consider [Total Cost] = ( $ + Time + Risk )
[Total Time] = ( development time + wait time + compute time )
Security Risks are rarely non-significant, so factor that into cost In the Service level spectrum, the higher level \"platform\" services may have higher monetary costs but often reduce time and risk "},{"location":"topics/intro_to_cloud_interfaces/","title":"Interfacing with Cloud Services","text":"Cloud Services are by design DIY or on-demand and hence need a programming interface to create cloud resources. This is only possible because inside the data center, computer configuration can be done completely with code, also known as \"Infrastructure as Code\" (IaC). Amazon's insight was that they could slap a website on top of that, add a system for tracking (metering) usage, and sell it.
At their base, all of the cloud companies use a web interface, a so-called REST API. Knowing the details of REST is not important, but it's often the basis for all of the other styles of interfaces.
Here is an example web api URL for weather forecast, with parameters for coordinates, units and format of output
https://www.7timer.info/bin/astro.php?lon=113.2&lat=23.1&ac=0&unit=metric&output=json&tzshift=0
Very few researchers would ever use the REST API directly; instead they would use the web interface or, even better, the command line or a programming language interface, which achieves the same goal with less work.
In Azure, everything you could possibly create is called a \"resource:\" a machine, a data service, a single network address. The system to work with Azure resources is the \"Azure Resource Manager\" or ARM and the primary interface for the Resource Manager is their web (REST) api. You may see references to resources in documentation and that means any web doo-dad.
"},{"location":"topics/intro_to_cloud_interfaces/#summary-of-cloud-interfaces","title":"Summary of Cloud Interfaces","text":"This summary is focused on Microsoft Azure, but the other cloud companies have similar concepts. In addition to this guide, Chapter 1 of our text \"Cloud Computing for Science and Engineering\" has an excellent description and examples of these interfaces with examples from AWS. See the section of that chapter titled \"Accessing a cloud service\" in https://s3.us-east-2.amazonaws.com/a-book/Orienting.html
"},{"location":"topics/intro_to_cloud_interfaces/#graphical-web-interface","title":"Graphical Web Interface","text":"Most people want a graphical user interface, and for azure that's the \"Portal\" or https://portal.azure.com. For Google cloud it's the \"console\" and for AWS it's also called the console. See below for an introduction to using the portal. Note that the Azure portal and Google console both have web-based terminals that allow you to use the CLI directly in the web interface.
"},{"location":"topics/intro_to_cloud_interfaces/#desktop-applications","title":"Desktop Applications","text":"Azure provides some desktop applications for working with a few of the widely used cloud services :
Azure Storage Explorer: https://azure.microsoft.com/en-us/features/storage-explorer/ Can create cloud storage and upload/download data. We will use that for our session on Storage. Azure Data Studio: https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is-azure-data-studio?view=sql-server-ver15 Can connect to and work with data systems (such as databases) that are on your computer, on a system on campus, or hosted in Azure. "},{"location":"topics/intro_to_cloud_interfaces/#command-line","title":"Command Line","text":"For those not familiar with the command line at all, see https://www.digitalocean.com/community/tutorials/an-introduction-to-the-linux-terminal for Linux and for Windows Powershell see https://programminghistorian.org/en/lessons/intro-to-powershell
The command line interface is a great way to interact with cloud services because it's imperative and all options are specified in a single command. With the web interface, you may have to hunt through the user interface to find the checkbox for an option, but for command line
Azure has two command line interfaces: the \"CLI,\" which is based on Linux and will work in any Linux or Mac terminal (or shell script), and the \"Powershell\" interface, which is for Windows Powershell users. Powershell has been ported to Linux and Mac, and the Linux shell and Azure CLI can also be used on Windows, so both are operating system independent, but in practice Windows users use Powershell and everyone else uses the CLI. Your choice depends on the kinds of other systems you'll be working with. For example, the MSU HPC uses the Linux command shell, but Windows servers and other Windows services like SQL Server work well with Powershell.
"},{"location":"topics/intro_to_cloud_interfaces/#sdk-software-developer-kit","title":"SDK : Software Developer Kit","text":"A \"software developer kit\" is simply a collection of utilities, libraries/packages and documentation for a specific language to work with a specific service. All the cloud vendors have SDKs, and they all have SDKs for Python. SDK simply means you can create, delete, interact with cloud services from your program.
Why leave Python or R if you don't have to?
"},{"location":"topics/intro_to_cloud_interfaces/#python-sdk","title":"Python SDK","text":"All cloud vendors have SDKs to work with Python. After installing the SDK, you import the libraries and issue commands to create resources, then use those cloud resources to do work via client libraries (either Azure libraries or others). Azure has extensive documentation for using Python: https://docs.microsoft.com/en-us/azure/developer/python/?view=azure-python
Example Azure code to create cloud storage, compared with how you would see the resources in the azure portal, and similar commands using the CLI : https://docs.microsoft.com/en-us/azure/developer/python/azure-sdk-example-storage?tabs=cmd
Note that Azure also has a service, \"Azure Functions,\" that runs Python code; this is not the same thing as the SDK. These are 'serverless' resources (similar to AWS Lambda), which we will learn about later in the course.
Both AWS and Google Cloud have Python SDKs, and probably other vendors.
"},{"location":"topics/intro_to_cloud_interfaces/#rest","title":"REST","text":"Knowing the details of REST is not important but it's the basis for all of the other style of interfaces.
Here is an example web api URL for weather forecast, with parameters for coordinates, units and format of output
https://www.7timer.info/bin/astro.php?lon=113.2&lat=23.1&ac=0&unit=metric&output=json&tzshift=0
The parameters to the weather data fetch program are lon, lat, ac, unit, output, and tzshift, and they are embedded in the URL itself.
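To see how those parameters sit in the URL, here is a small sketch that uses Python's standard `urllib.parse` module to split the example URL apart. It only parses the text of the URL and makes no network request.

```python
from urllib.parse import urlsplit, parse_qs

# The example weather API URL from above.
url = ('https://www.7timer.info/bin/astro.php'
       '?lon=113.2&lat=23.1&ac=0&unit=metric&output=json&tzshift=0')

parts = urlsplit(url)

# parse_qs returns a list of values per key; each key appears once here,
# so keep only the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print('host:', parts.netloc)
print('program:', parts.path)
for name, value in params.items():
    print(name, '=', value)
```

Everything after the `?` is the query string, a list of name=value pairs separated by `&`.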
This is called a \"request,\" and using a web API often requires sending parameters not just in the URL, but as an attachment or in the 'body' of the request. Browsers don't have an automatic way of doing that, so we use scripts (e.g. the Python Requests library) or special programs for testing web APIs that can send parameters and data in the request body.
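A minimal sketch of a request that carries data in its body, using only Python's standard library rather than the Requests library mentioned above. The endpoint URL and the payload are hypothetical placeholders, and the request is only constructed, never sent.

```python
import json
from urllib.request import Request

# Hypothetical payload and endpoint, for illustration only.
payload = {'name': 'example', 'location': 'eastus'}
body = json.dumps(payload).encode('utf-8')   # request bodies are sent as bytes

req = Request(
    'https://example.com/api/resources',     # placeholder URL, not a real API
    data=body,
    headers={'Content-Type': 'application/json'},
    method='PUT',
)

# Inspect the request without sending it over the network.
print(req.get_method(), req.full_url)
print(req.data)
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out here because the endpoint is made up.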
This is a good explanation of REST and part 2 describes the details.
https://medium.com/extend/what-is-rest-a-simple-explanation-for-beginners-part-1-introduction-b4a072f8740f
The Azure REST API is an interface to the Azure Resource Manager via the web. Requests can get information about your resources or create new resources, just like the portal, the command line, and the SDKs. Those other interfaces typically translate to the REST API. Knowing about it may help diagnose why your method for interfacing with Azure is not working, but it is not necessary to learn. For examples and more detail, see https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/deploy-rest
Few of us would ever use the Azure REST API directly; instead we would use the web interface or, even better, the command line or a programming language interface, which achieves the same goal with less work.
"},{"location":"topics/intro_to_cloud_interfaces/#r","title":"R","text":"Unlike the other vendors, Microsoft maintains an SDK for R users, which allows you to create cloud services directly from RStudio. See their GitHub pages at https://github.com/Azure/AzureR and the excellent documentation throughout the packages.
"},{"location":"topics/intro_to_cloud_interfaces/#cloud-company-templating-frameworks","title":"Cloud company templating frameworks","text":"In addition to the \"SDKs\" for existing languages, cloud companies often have their own frameworks for using code to build (provision) infrastructure. For Azure, these are \"ARM Templates\" and for AWS it's CloudFormation.
"},{"location":"topics/intro_to_cloud_interfaces/#azure-arm-templates","title":"Azure: ARM templates","text":"Azure has a system for submitting a template, essentially a configuration file, to the Azure Resource Manager (ARM) that dictates which cloud resources are to be created. These are JSON-formatted files that are \"declarative\" (rather than procedural or imperative like Python). The best way to understand them is to explore the many that Microsoft posts on GitHub, and to try them. If you do, be mindful to delete any resources you create so as not to be charged for them.
- Overview of ARM templates: https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/overview\n- Quick start ARM templates (github): https://github.com/Azure/AzureStack-QuickStart-Templates\n
You may see references to \"Bicep\" templates. Bicep is a simplified ARM templating language that may be easier to write, debug, and maintain than the JSON format of ARM templates.
"},{"location":"topics/intro_to_cloud_interfaces/#aws-cloud-formation","title":"AWS: Cloud Formation","text":"AWS also has a templating language, similar to Azure Resource Manager templates, called CloudFormation. If you are using AWS for your project and want to automate the creation and deployment of resources, this may be a good option.
AWS Documentation:
What is AWS CloudFormation? How does AWS CloudFormation work? "},{"location":"topics/intro_to_cloud_interfaces/#third-party-programming-with-terraform","title":"Third-party programming with Terraform","text":"There are other ways to 'program the cloud' from companies outside of the big three. One widely used framework is \"Terraform\" from HashiCorp, which is not affiliated with any cloud company. The advantage of Terraform is that it's declarative: you specify what you want, unlike, say, the Python or command line interface, where you have to create items with commands one at a time.
Terraform is used by cloud professionals because it's designed to keep the resources you've created running and to allow you to modify them in place. If you find you are using scripting to build resources (which is great!) but your scripts are becoming cumbersome to maintain and your cloud architecture is complex, consider using Terraform.
- Terraform: https://www.terraform.io\n- Can work with any vendor including Azure\n- Often more readable than ARM templates, Syntax remarkably simple \n- Focus on maintaining consistent systems ( declarative) \n- Does not cover all services, but can fall back to ARM templates when necessary\n
"},{"location":"topics/intro_to_cloud_interfaces/#building-cloud-from-cloud","title":"Building Cloud from Cloud","text":"This may not be an 'interface' but is operationally similar. It's possible to use some of the above interfaces from within existing cloud services, e.g. creating new cloud resources automatically from existing cloud resources. Your cloud architecture may need different types of resources, or parameterized resources only as needed (e.g. depending on data inputs, or a web gateway for cloud on demand).
For example, Azure Logic Apps can create resources when they are run (e.g. provision and start a computer), and a logic app can be triggered by events such as the creation of a new file, or by a web API call (e.g. a REST POST command that sends data and parameters). This adds significant complexity and is mainly valuable for event-based systems, but it opens up using the cloud as one big programmable computer.
"},{"location":"topics/intro_to_cloud_interfaces/#references","title":"References","text":"See our references page for curated Azure links. For AWS, see
https://aws.amazon.com/tools/ about the AWS CLI: https://aws.amazon.com/cli/ Demo Using Python Notebook with AWS: https://s3.us-east-2.amazonaws.com/a-book/s3.html "},{"location":"topics/learning_how_to_learn_about_cloud/","title":"Learning how to learn about cloud","text":""},{"location":"topics/learning_how_to_learn_about_cloud/#guidelines-for-researchers","title":"Guidelines for Researchers","text":"You may have looked at the various websites and poked around the web, and found it's just not clear at all how cloud computing may be helpful to you, even though it all sounds great. The challenge for researchers learning about cloud is that most cloud documentation isn't written for you.
Challenges for researchers learning:
Cloud training and documentation are mostly written for IT professionals like system admins and architects, software developers, business people, and agency managers. Researchers tend to be a little of all of those things.
Learning cloud requires an understanding of the concepts and glossary of IT infrastructure. Cloud services are based on a model of IT, so training materials often have embedded conceptual models of computing.
Goals of researchers are often different from those of IT professionals responsible for building systems used by hundreds of people or for business purposes. That can make it difficult to decipher which kind of cloud service will work best for your use case. As Dr. Parvizi writes (link to pdf), cloud is very different from using traditional research-oriented technology like workstations or HPC. There are hundreds of services to choose from, but we find many researchers will reach for the conceptually straightforward path of creating cloud computers and installing what they need. Our goal for this fellowship is to provide context and background, and help you explore some of the so-called \"cloud native\" technologies like \"serverless\" systems that let you run your scripts without dealing with operating system installs. "},{"location":"topics/learning_how_to_learn_about_cloud/#what-documentation-is-available-for-researchers","title":"What documentation is available for researchers?","text":"There are general, conceptual introductions and discussions for academics.
https://cloud4scieng.org/ Book and website from Ian Foster (U. Chicago) and Dennis Gannon (IU) , the text used for this fellowship. https://cloudmaven.github.io/documentation/ from the eScience institute, University of Washington. Unmaintained. source code https://cloudbank-project.github.io/cb-resources/ successor to the cloudmaven? Cloudbank training videos "},{"location":"topics/learning_how_to_learn_about_cloud/#learning-how-to-learn-about-cloud-caveats-and-help","title":"Learning how to learn about cloud: Caveats and help","text":"As part of this fellowship, our goal is to help you translate documentation written for the systems and developer perspectives into a research perspective.
"},{"location":"topics/the_computing_in_cloud_computing/","title":"Helping to Understand the \"computing\" in cloud computing","text":"You come to us with a unique set of experiences with computing, with more or less experience depending on your previous needs. A challenge we have seen, for the many years we've been helping people, is understanding the context of computing in their research to understand the tools they have available.
In fact, most documentation for cloud computing assumes you know the world of computing. An introduction to cloud computing from Microsoft lists this prerequisite: \"Basic familiarity with IT terms and concepts.\" It turns out 'basic' can mean a lot of things.
A core goal of the MSU Cloud Computing Fellowship is to help you connect cloud computing to your research in a meaningful way.
Our original question: - How can cloud computing help your research?
Let's re-frame the question for this discussion: - Which kind of computing could help my research? - Can I use that kind of computing in the cloud? That is, could cloud computing enable me to use computing I otherwise couldn't?
You may already have an idea of what this is and have experience with computing, but many who come to us know computing is valuable and are ready to learn why.
"},{"location":"topics/the_computing_in_cloud_computing/#what-is-computing-minimal-vocabulary","title":"What is computing? Minimal Vocabulary","text":"Cloud computing was invented for, and is marketed to, IT systems administrators, software developers, and IT/technology managers. See the history of AWS. It was not designed with researchers in mind, and most training and documentation reflects that. Note, however, that cloud computing is general enough to be useful for research, and is often marketed to researchers or 'for research.'
The primary function of cloud computing is to provide \"infrastructure,\" aka the \"back-end\" or back room of a company's IT department, so we are going to learn about that. In fact, cloud computing is frequently defined, named, and sold based on abstractions of physical components of computers and IT infrastructure. Hence learning more about IT infrastructure, or \"computing,\" may be helpful for understanding the context in which cloud computing is engineered. This can help you determine what you may need from cloud computing to get your research done.
Could you purchase your own infrastructure (computers, networks, disks, etc) and run it \"on-premise\" and get the same benefit as cloud computing? Or have your institution do that? Sometimes yes! The MSU HPCC is a great example when on-premise is more beneficial and cost-effective than cloud computing.
"},{"location":"topics/the_computing_in_cloud_computing/#about-computing-major-components-of-computer","title":"About Computing: Major components of computer","text":"Of course you know what is in a computer. The goal is to come to common understanding, and to frame for extension to cloud, and to find the cloud services that mimic these features.
User software (scripts, user code, etc); Base Software (programs to run scripts such as Python, RStudio, Stata, Fluent, etc. and/or libraries to compile code such as the gcc compiler, etc); Operating System (needed to make the computer functional); Input/Output (I/O, infrastructure to get data in and out of a computer, primarily network connections but also USB); Storage - external (attached or via network or other I/O); Storage - local disk; Central Processor (CPU) & Memory (RAM); Computer Architecture (model type); Network. Where is the data in this abstraction of computer infrastructure? Answer: everywhere.
If you hadn't thought or known about the components of a computer, that's fine. Most people don't know the details of how their car operates, how to change their oil, or the difference between a carburetor and a turbocharger.
"},{"location":"topics/the_computing_in_cloud_computing/#stack-model-of-computing","title":"\"Stack Model\" of computing.","text":"Just as in Science and the humanities, we need a model and terminology to talk about a subject. A standard IT model of a computer is a 'stack' model, where each upper layer depends upon the layers below. Most models of cloud computers build upon this simple model.
User Interface/Connection Software Operating System Computer Hardware: CPU & RAM Data Storage Network"},{"location":"topics/the_computing_in_cloud_computing/#about-computing-what-is-a-server","title":"About Computing: What is a server?","text":"Cloud computing started with, and frequently talks about \"servers,\" so we should define that.
A server is any computer running software that listens for, and responds to, messages. For a server to be useful it should be connected to a network, but it doesn't need to be. Some terms: - The 'server' is actually the software, not the hardware. You can run a server on your laptop. - The computer that runs the software is the 'host'. - A 'client' is software that sends the message, and receives and interprets the response. - The protocol is the method by which you exchange messages. Now it is almost exclusively web (http), but there are many others. - The form the input message can take, and the form of the message that is returned, is known as the API of the server. It's the interface that you have to work with. - Port: a computer may run many servers for internal and external use. Network protocols such as TCP use a system of numbered 'ports' (0 to 65535), and when running a server you must tell it which port to listen on for messages. Users of most software never have to know or think about ports.
The 'client/server' model invented in the 60s is so successful that we use servers in our daily lives and don't think about it (except when the server is down). This model of computing is important because it is the basis of cloud computing.
We often think of a server as a box, but in the model above the server is in the software layer, and each of the layers below provides services for that software to exchange messages with another computer. If you can abstract, virtualize, or automate the layers below, it becomes much easier to provision servers than to purchase, install, and configure physical hardware.
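To make the terms concrete, here is a minimal sketch, using only Python's standard library, of a tiny web server and a client running on the same host. The handler name HelloHandler, the path /hello?name=fellow, and the reply text are made up for this illustration; port 0 asks the operating system to pick any free port.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    # The server side: decides how to respond to each GET message.

    def do_GET(self):
        body = f'You asked for {self.path}'.encode('utf-8')
        self.send_response(200)  # status header the browser rarely shows
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet instead of logging each request


# The host is this machine (127.0.0.1); port 0 means any free port.
server = HTTPServer(('127.0.0.1', 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client: sends a message over the http protocol and reads the response.
connection = HTTPConnection('127.0.0.1', port, timeout=5)
connection.request('GET', '/hello?name=fellow')
response = connection.getresponse()
status = response.status
text = response.read().decode('utf-8')
connection.close()

# Stop listening and release the port.
server.shutdown()
server.server_close()

print(port, status, text)
```

Nothing here needs a network beyond the laptop itself: the server software, the host, and the client all live on one machine, which is why nobody else could reach this server.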
"},{"location":"topics/the_computing_in_cloud_computing/#example-server","title":"Example Server","text":"A Web server is a well known, easy to use, and very useful server to run. The terms above translate as follows:
The server is any machine (including your laptop) running a program that listens for http messages on port 80. The client is a web browser. The message is a URL, which includes an address, URL paths, and additional parameters. The response is several headers, including the status of the request (that we rarely see), and ultimately the contents of the web page. The client interprets the code and renders the page. An alternate client could be a script, or the curl utility.
https://www.amazon.com/dp/B09VXBNTJ1/ref=sr_1_93?brr=1
What is the host in this URL? What is the message? We could spend a week talking about web servers and protocols, and a year on programming web servers. The important thing is that there is a host computer, the 'web server' software on the host listening for requests, and the client(s) connecting to it to retrieve files.
"},{"location":"topics/the_computing_in_cloud_computing/#other-types-of-servers","title":"Other Types of Servers","text":" Database Client: special database client (not web browser) sends data commands as messages, response is tabular outputs File Servers Share files. We use Cloud file sync services, but Collaboration Email, calendaring etc Enterprise Data Systems for loading, cataloging, transforming business data Security Firewalls, Proxy, network traffic management Monitoring system health data collection, accessible via another web server Web-based services For example D2L. Many of these do not use web-based protocols or connections. They define their own protocols, either as a public standard (e.g. email) or proprietary standard (database)
"},{"location":"topics/the_computing_in_cloud_computing/#servers-and-networks","title":"Servers and Networks","text":"Networking Requirements to access a server:
The server must be on the same network as you to receive your message. I can run a web server right on my laptop, but you couldn't reach it: the network is me talking to myself. The more accessible the network, the more vulnerable it is, so partitioning is used. Servers that accept messages from the Internet are a major security risk. Network failure stops all work for everyone. Designing efficient, robust, and secure networks is a major resource drain. Why do I think this is important? Not only can you make a server (web, data, cluster, etc.) with cloud, but everything you interact with in cloud is a server. You will see many services dedicated to networking in the cloud.
On our campus, the network is managed by the institution, and it is configured to block all incoming traffic to prevent anyone from running a server, which is a security risk.
"},{"location":"topics/the_computing_in_cloud_computing/#too-much-hardware-virtualization-to-the-rescue","title":"Too much hardware? Virtualization to the rescue","text":"If you run a big IT department that serves 1000s of people, you need a lot of servers. Each server can only handle a certain amount of 'traffic.' Hence there are many methods for connecting multiple servers to act as one big server. Each physical machine requires 1) installation and 2) maintenance.
IT departments 'serve' large user communities with large amounts of infrastructure. Techniques were invented to separate the 'server' or 'network' from the hardware. Virtualization: a single box with a layer of software that shares the hardware among different software. Many servers could be created and managed with software on a single piece of hardware. Virtualization was a necessary conceptual and technological innovation to pave the way for cloud computing and is widely used both on-premise and in the cloud. Networks and other services followed suit: create single big computers that use 'virtualization' software to emulate the functioning of a service, such that the clients don't know they are working with an abstraction. Running different wires to connect different things is labor intensive. "}]}
\ No newline at end of file
diff --git a/sessions/01_introduction/index.html b/sessions/01_introduction/index.html
index 6e68114..117a117 100644
--- a/sessions/01_introduction/index.html
+++ b/sessions/01_introduction/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/sessions/02_how_to_cloud/index.html b/sessions/02_how_to_cloud/index.html
index 9885be6..bc71e0f 100644
--- a/sessions/02_how_to_cloud/index.html
+++ b/sessions/02_how_to_cloud/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/sessions/03_cloud_storage/index.html b/sessions/03_cloud_storage/index.html
index 16543e2..5d8601b 100644
--- a/sessions/03_cloud_storage/index.html
+++ b/sessions/03_cloud_storage/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
@@ -691,8 +711,8 @@
-
- Optional Activities:
+
+ Activities:
@@ -971,8 +991,8 @@
-
- Optional Activities:
+
+ Activities:
@@ -1040,13 +1060,10 @@ Readings
Post-session discussion points
There are several options when creating a storage account. For example, what is the difference between LRS and GRS? Is the documentation describing these clear or confusing? Under what conditions might you choose LRS vs GRS? Is it worth the cost?
How would you share data with colleagues outside of MSU using cloud storage? Where did you find the information for how to do that (Microsoft, Azure, Blog post, other)? Let's say you need to share 5 GB of data. After doing the pricing exercise above just for storage, what are the costs for each upload and download of 5 GB? Does it make a difference if it's Blob or File storage?
-Optional Activities:
+Activities:
The following two activities walk through attaching Azure files to a VM so you can use it just like any other disk. This is only one method for moving data to/from cloud storage to your VM, but it does not require changing your program code.
For Windows Users: Using File Storage with Windows VM
-Microsoft Tutorial: Create an SMB Azure file share and connect it to a Windows VM using the Azure portal
-Notes:
-- The tutorial has you create a storage account, but you can re-use the one you've already created (and change the names), or follow the tutorial and create another one.
-- Not all versions of Windows can use this. For much more detail, see the Azure documentation page "Mount SMB Azure file share on Windows"
+Create an SMB Azure file share and connect it to a Windows VM using the Azure portal
For Linux Users: Mounting File Storage with Linux VMs using NFS
Microsoft Tutorial: Create an NFS Azure file share and mount it on a Linux VM using the Azure portal
How to mount Azure Files on Linux using SMB
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index 855af71566544ee55dceb69437b03e0fd1028155..b610fe6c377ab2d001ff0b85a481bdfc6be9f4d9 100644
GIT binary patch
delta 13
Ucmb=gXP58h;5d2CXd-(B03QzonE(I)
delta 13
Ucmb=gXP58h;Al9dHj%vo038(sKmY&$
diff --git a/topics/azure_cloud_cost_basics/index.html b/topics/azure_cloud_cost_basics/index.html
index 8b8f905..7cda9f9 100644
--- a/topics/azure_cloud_cost_basics/index.html
+++ b/topics/azure_cloud_cost_basics/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/topics/azure_organization/index.html b/topics/azure_organization/index.html
index 41e64fa..8e5ae48 100644
--- a/topics/azure_organization/index.html
+++ b/topics/azure_organization/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/topics/azure_tags/index.html b/topics/azure_tags/index.html
index f37824a..174cf87 100644
--- a/topics/azure_tags/index.html
+++ b/topics/azure_tags/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/topics/index.html b/topics/index.html
index f58d67d..ea1620e 100644
--- a/topics/index.html
+++ b/topics/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/topics/intro_aspects_of_cloud_computing/index.html b/topics/intro_aspects_of_cloud_computing/index.html
index 82d1ffa..27f1b52 100644
--- a/topics/intro_aspects_of_cloud_computing/index.html
+++ b/topics/intro_aspects_of_cloud_computing/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/topics/intro_to_cloud_interfaces/index.html b/topics/intro_to_cloud_interfaces/index.html
index 75edcd5..96cbafe 100644
--- a/topics/intro_to_cloud_interfaces/index.html
+++ b/topics/intro_to_cloud_interfaces/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/topics/learning_how_to_learn_about_cloud/index.html b/topics/learning_how_to_learn_about_cloud/index.html
index 0ad76e5..3a67d54 100644
--- a/topics/learning_how_to_learn_about_cloud/index.html
+++ b/topics/learning_how_to_learn_about_cloud/index.html
@@ -16,7 +16,7 @@
-
+
@@ -24,7 +24,7 @@
-
+
@@ -448,6 +448,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/topics/the_computing_in_cloud_computing/index.html b/topics/the_computing_in_cloud_computing/index.html
index 8658104..309c2c0 100644
--- a/topics/the_computing_in_cloud_computing/index.html
+++ b/topics/the_computing_in_cloud_computing/index.html
@@ -14,7 +14,7 @@
-
+
@@ -22,7 +22,7 @@
-
+
@@ -446,6 +446,26 @@
+
+
+
+
+
+ Exercise: Using File Storage with Windows VM
+
+
+
+
+
+
+
+
+
+
+
+
+
+