diff --git a/03-Binary_data_to_computations.Rmd b/03-Binary_data_to_computations.Rmd index 94292bba..92c1802a 100644 --- a/03-Binary_data_to_computations.Rmd +++ b/03-Binary_data_to_computations.Rmd @@ -16,19 +16,19 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE ### **CPU** - Central Processing Unit -The CPU is often called **the brain** of the computer. It has some confusing additional names, because it is such an important and prominent part of the computer, as it performs and orchestrates computational tasks [@braunl_central_2008, @CPU_redhat, @Wikipedia_CPU_2021]. +The CPU, the “Central Processing Unit”, is often called **the brain** of the computer. As its name suggests, it is one of the most important and prominent parts of the computer, performing and orchestrating computational tasks; as such, it has some other confusing names [@braunl_central_2008, @CPU_redhat, @Wikipedia_CPU_2021]. -It is sometimes called a **processor** or **microprocessor** (but technically these terms include both the CPU and other elements). The CPU is often what people are referring to when they describe a **"computer chip"** (which again technically includes other elements) [@braunl_central_2008, @CPU_redhat, @Wikipedia_CPU_2021]. +The CPU is sometimes called a **processor** or **microprocessor** (however, technically, these terms include both the CPU and other elements). The CPU is often what people are referring to when they describe a **"computer chip"** (which, again, technically includes other elements) [@braunl_central_2008, @CPU_redhat, @Wikipedia_CPU_2021]. -The CPU is made up of several components, a few that are particularly important (two of which we have discussed): +The CPU is made up of several components, a few of which hold particular importance. We have already discussed two of these components: * Arithmetic Logic Unit (ALU) * Registers * Control Unit (CU) -A group of these components together is called a **core**. 
Multiple cores together are also referred to as CPU**s**. As you can see describing this can get kinda tricky. +A group of these components together is called a **core**. Multiple cores together are also referred to as CPU**s**. As you can see, describing these structures can get a little tricky because of all the confusing terminology! -The component that we haven't yet discussed, the Control Unit, coordinates the ALU and the data stored in the registers, so that the ALU can perform the operations on the right data stored in the registers at the right time [@braunl_central_2008]. +The component that we haven't yet discussed, the Control Unit, coordinates the ALU and the data stored in the registers so that the ALU can perform the operations on the right data stored in the registers at the right time [@braunl_central_2008]. ```{r, fig.align='center', echo = FALSE, fig.alt= "Figure of how the processor/chip/ or CPU which includes the ALU, registers and the Control Unit are grouped together.", out.width= "100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.g1076ceee833_0_1") @@ -36,16 +36,16 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE Modern computers now have multiple cores. What does this mean? -This means that there are multiple groups of the above components that can each process data within the same computer. A dual core CPU is a chip with two cores. A quad-core CPU is a chip with 4 cores and so on. This allows modern computers to perform multiple tasks at the same time instead of sequentially, such as 4 tasks simultaneously on a current typical laptop (with 4 cores). This makes our computers much faster than they used to be [@Wikipedia_CPU_2021]. +This means that there are multiple groups of the above components that can each process data within the same computer. A dual core CPU is a chip with two cores. 
A quad-core CPU is a chip with 4 cores and so on. This allows modern computers to perform multiple tasks at the same time, instead of performing tasks sequentially. For example, a typical laptop with 4 cores nowadays can perform 4 tasks simultaneously. This ability to multitask makes our computers much faster than they used to be [@Wikipedia_CPU_2021]. -In addition to the main CPU or CPUs or cores (chose your favorite name), computers may be equipped with specialized processors called [GPUs](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#) which stands for graphics processing units that are especially efficient at tasks involving images [@GPU]. Thus often tasks that involve images are done using the GPU(s) and not the CPU(s). This frees up the CPU(s) to continue on the tasks not involving images more efficiently. Note however, that GPU processors are also "generally programmable" (meaning they can work with different types of data) and can also be used to perform tasks that don't involve images [@GPU]. It's also really good at doing something called parallel processing, which means dividing up a single task into multiple pieces that can be run simultaneously and thus allowing for running a task more efficiently overall. People also use GPU graphics cards which can add additional GPUs for more computational power [@GPU]. +In addition to the main CPU (or CPUs, or cores, depending on your favorite name), computers may be equipped with specialized processors called [GPUs](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html#), which stands for graphics processing units, that are especially efficient at tasks involving images [@GPU]. Therefore, tasks that involve images are often performed using the GPU(s) and not the CPU(s). This enables more efficient processing of data by freeing up the CPU(s) to focus on tasks not involving images. 
Note, however, that GPU processors are also "generally programmable" (meaning they can work with different types of data) and can also be used to perform tasks that don't involve images [@GPU]. They are also very good at doing something called parallel processing, which means dividing up a single task into multiple pieces that can be run simultaneously, so that the task as a whole finishes more efficiently. People also use GPU graphics cards which can add additional GPUs to their computers for more computational power [@GPU]. ```{r, fig.align='center', echo = FALSE, fig.alt= "A computer chip is also sometimes called the CPU. Inside this CPU or chip are often multiple cores.", out.width= "100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf6e632d05f_0_381") ``` -Hyper-threading is also an option for improving processing. This technology started in 2002 by Intel [@Wikipedia_hyper-threading]. The idea is that while part of the same core is idle or waiting for a given task, another part of the same core can work to perform another task. This isn't as efficient as a having another core or CPU, but it does improve efficiency [@hyperthreading; @Wikipedia_hyper-threading]. So many modern computer chips actually use all three efficiency boosters (having multiple cores, having GPUs, and using hyper-threading). Thus a chip with 4 cores that also has hyper-threading can work on 8 tasks simultaneously. Since it is now much easier to produce chips with multiple cores and because there are some security concerns with hyper-threading, the field seems to be moving away from hyper-threading [[@hyperthreading; @Wikipedia_hyper-threading]. +Hyper-threading is also an option for improving processing. This technology was introduced by Intel in 2002 [@Wikipedia_hyper-threading]. 
The idea is that while part of a core is idle or waiting for a given task, another part of the same core can work to perform another task. This isn't as efficient as having an additional core or CPU, but it does improve efficiency [@hyperthreading; @Wikipedia_hyper-threading]. Many modern computer chips actually use all three efficiency boosters (having multiple cores, having GPUs, and using hyper-threading). Thus, a chip with 4 cores that also has hyper-threading can work on 8 tasks simultaneously. That being said, as it is now much easier to produce chips with multiple cores, and because there are some security concerns with hyper-threading, the computing field now seems to be moving away from hyper-threading [@hyperthreading; @Wikipedia_hyper-threading]. ```{r, fig.align='center', echo = FALSE, fig.alt= "A computer chip that has hyper-threading can perform more tasks by single cores more efficiently. Thus a 4 core chip with hyper-threading can work on 8 tasks simultaneously.", out.width= "100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gfb2e21ecdc_0_75") @@ -53,7 +53,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE ### **Memory or RAM** - short-term memory -OK, so we have already talked about how data can be stored in the registers within the CPU. This data or memory is used directly by the CPU during operations or tasks. However, our CPUs need additional quick access to instructional data to tell the CPU what to do to perform the operations and what data to use. This is also the data in a file that we are working with at a particular moment in time [@RAM_ComputerHope]. This bring us to [RAM](https://www.computerhope.com/jargon/r/ram.htm), which stands for **Random Access Memory**. It is often simply referred to as **memory**. 
Ram is similarly made out of transistors and capacitors like the registers within the CPU, but it is located nearby but outside of the CPU [@RAM_ComputerHope; @RAM_HowStuff_Works]. This type of memory is that it is temporary. Data is stored in RAM for only a short time, while your computer is running a task on it, but then it disappears. Due to the fact that what is stored disappears, this type of memory is also called volatile. This is why when you are working on a file, but forget to save it, you might lose your work [@RAM_ComputerHope; @RAM_HowStuff_Works]. +We have already talked about how data can be stored in the registers within the CPU. This data or memory is used directly by the CPU during operations or tasks. However, our CPUs need additional quick access to instructional data to tell the CPU what to do to perform the operations and what data to use. This is also the data in a file that we are working with at a particular moment in time [@RAM_ComputerHope]. This brings us to [RAM](https://www.computerhope.com/jargon/r/ram.htm), which stands for **Random Access Memory**. It is often simply referred to as **memory**. Like the registers within the CPU, RAM is made out of transistors and capacitors, but it is located outside of, though very near, the CPU [@RAM_ComputerHope; @RAM_HowStuff_Works]. One characteristic of this type of memory is that it is temporary. Data is stored in RAM for only a short time while your computer is running a task on it, and then it disappears. Because the stored data disappears, this type of memory is also called volatile. This is why when you forget to save a file you are working on, you might lose your work [@RAM_ComputerHope; @RAM_HowStuff_Works]. For more information about how RAM works, check out this [website](https://computer.howstuffworks.com/ram.htm) [@RAM_HowStuff_Works]. 
@@ -62,13 +62,13 @@ For more information about how RAM works, check out this [website](https://compu ### **Storage** - long-term memory -We can also store data that we aren't directly using when our computer is performing operations. So for example, our excel files and word files that aren't currently in use. This type of memory is called storage and is sometimes referred to as long-term or non-volatile memory because electricity is not required to preserve this data. This type of memory is stored using [hard disk drives (HDDs) also called hard drives](https://www.computerhope.com/jargon/h/harddriv.htm) or more recently [solid-state drives (SSDs)](https://www.computerhope.com/jargon/s/ssd.htm). The reason accessing this memory is slower than accessing data stored in RAM is that it is located further away from the CPU and data needs to be transferred from the storage to the CPU along a wire when a user wants to perform operations on such data. In addition the right data needs to be found out of all of your files, which also takes some time. Furthermore, the way in which data is retrieved from HDDs and SSDs is slower than that of RAM. This type of storage allows for much larger data capacity than RAM and it is also cheaper [@hard_drive; @hard_drive_works]. +We can also store data that we aren't directly using when our computer is performing operations; for example, our Excel files and Word files that aren't currently in use. This type of memory is called storage and is sometimes referred to as long-term or non-volatile memory, because the data can be preserved without using electricity. This type of memory is stored using [hard disk drives (HDDs), also called hard drives](https://www.computerhope.com/jargon/h/harddriv.htm), or more recently, [solid-state drives (SSDs)](https://www.computerhope.com/jargon/s/ssd.htm). 
The reason why accessing this memory is slower than accessing data stored in RAM is that it is located further away from the CPU, and data needs to be transferred from the storage to the CPU along a wire when a user wants to perform operations on such data. In addition, the right data needs to be found from all of your files, which also takes some time. Furthermore, the way in which data is retrieved from HDDs and SSDs is slower than that of RAM. However, this type of storage allows for much larger data capacity than RAM, and it is also cheaper [@hard_drive; @hard_drive_works]. -Hard disk drives store memory using [magnetic methods](https://www.extremetech.com/computing/88078-how-a-hard-drive-works) [@hard_drive_works], while solid-state drives store memory using chips that have guess what?? +Hard disk drives store memory using [magnetic methods](https://www.extremetech.com/computing/88078-how-a-hard-drive-works) [@hard_drive_works], while solid-state drives store memory using chips that have, guess what? -They are made of yet again the important basic building block of computers - tiny bees! Oops, I mean transistors yet again, just like the CPU chip! See, those transistors are really important. +They are made of yet again the important basic building block of computers, the tiny bees - oops, I mean transistors! - just like the CPU chip! You see how these transistors are really important? -SSDs allow for much faster reading and writing of files, as well as increased reliability. However, they are more expensive and they also wear out eventually [@SSD]. +SSDs allow for much faster reading and writing of files, as well as increased reliability. However, they are more expensive and can eventually wear out [@SSD]. Here's a great explanation for how HDDs work and the difference with SSDs. It will also introduce the concept of [caching](https://en.wikipedia.org/wiki/CPU_cache), which allows for faster use of data from storage for the CPU. 
It is a special kind of memory that's even faster and closer to the CPU than RAM [@Wikipedia_cache_2021]: @@ -86,13 +86,13 @@ See this [link](https://computer.howstuffworks.com/solid-state-drive.htm) for mo So far we have talked about the [hardware](https://simple.wikipedia.org/wiki/Computer_hardware) of a computer, which is the physical components of a computer, while [software](https://simple.wikipedia.org/wiki/Software) is the code that tells the hardware how to function [@Wikipedia_hardware_2021; @Wikipedia_software_2021]. -Software is also important to know about. Most importantly it is useful to know about operating systems. +Software is also important for understanding how computers work. Specifically, it is useful to learn about operating systems. ### Operating systems -The [operating system](https://en.wikipedia.org/wiki/Operating_system) (sometimes simply called the OS) is a set of code or software that translates user interactions with the computer to tell the hardware (including memory and the CPU) of the computer what tasks the user wants the computer to perform and when [@Wikipedia_OS_2021]. +The [operating system](https://en.wikipedia.org/wiki/Operating_system) (sometimes simply called the OS) is a set of code or software that translates the interactions between the user and the computer to tell the hardware (including memory and the CPU) of the computer what tasks the user wants the computer to perform and when [@Wikipedia_OS_2021]. -You can think of this as the basic code to keep the computer running and functional and to allow the user to use other forms of software, such as applications [@Wikipedia_OS_2021]. Applications are specialized software programs like Microsoft Word, or an internet browser like Chrome that allow a user to do specific tasks on the computer. So your OS is what allows you to name, rename, move and save files. 
It helps you to keep track of memory and decides what memory should be used when and to run all of your application software. It also allows you to talk to other devices like printers or other computers. +You can think of this as the basic code to keep the computer running and functional, and to allow the user to use other forms of software, such as applications [@Wikipedia_OS_2021]. Applications are specialized software programs like Microsoft Word, or an internet browser like Chrome that allow a user to do specific tasks on the computer. Your OS is what allows you to name, rename, move and save files. It helps you to keep track of memory and decides what memory should be used when and to run all of your application software. It also allows you to talk to other devices like printers or other computers. Examples of commonly used operating systems on computers and phones are: @@ -102,12 +102,12 @@ Examples of commonly used operating systems on computers and phones are: * Linux * Android -Recall that we previously talked about how computers today are often called 64-bit? Operating systems are also designed in this way. A 64-bit operating system expects the hardware of the computer to allow for processing at least 64 bits of data at a time (the word size) [@Wikipedia_word_length_2021]. If we have registers of at least this length in the CPU, than we can in fact perform operations on data that may be up to 64 bits in length. This also means that we can perform operations on values that take up less than 64 bits. This can be important because if you try to use an operating system that expects a longer word size than the hardware can accommodate, for example a 64-bit operating system on a 32-bit computer, this will not work. Application programs are also designed according to different word sizes and again you need to choose options that are equal to or less than the word size that your CPU can accommodate [@ComputerHope_64-bit]. 
+Recall that we previously talked about how computers today are often called 64-bit? Operating systems are also designed in this way. A 64-bit operating system expects the hardware of the computer to allow for processing at least 64 bits of data at a time (the word size) [@Wikipedia_word_length_2021]. If we have registers of at least this length in the CPU, then we can in fact perform operations on data that may be up to 64 bits in length. The data do not __have__ to be the full 64 bits; it just means that we can perform operations on values that take up less than 64 bits. This can be important because if you try to use an operating system that expects a longer data size than the hardware can accommodate, for example a 64-bit operating system on a 32-bit computer, this will not work. Application programs are also designed according to different data sizes and again you need to choose options that are equal to or less than the data size that your CPU can accommodate [@ComputerHope_64-bit]. However, you can run a 32-bit operating system on a 64-bit computer, and a 32-bit application on a 64-bit operating system. ### Historical context -Previously, back when a university might have one single computer, as they were so large and expensive (they didn't use those nifty small transistors of today), computers didn't have sophisticated operating systems and only one task could be performed at a time by one person at a time. Back then, tasks were just manually started, prioritized, and scheduled by humans. Tasks or programs (including sometimes data) could be printed or punched on cards (called punchcards, punch cards or punched cards) that would be loaded into the machine. Data and code would be manually indicated by punching or creating a hole in the card in certain locations. For example, columns might indicate different numeric or alphabetical values. 
It could really be a pain for users if they accidentally dropped the cards for the program they wanted to run, as you can imagine [@punched_card_2021]! +Previously, back when computers were so large and expensive that one whole university might have had just one computer (they didn't have those nifty small transistors of today), computers didn't have sophisticated operating systems. During that era, only one task could be performed at a time, by one person at a time. Back then, tasks were just manually started, prioritized, and scheduled by humans. Tasks or programs, and sometimes data, could be printed or punched on cards (called punchcards, punch cards or punched cards) that would be loaded into the machine. Data and code would be manually indicated by punching or creating a hole in the card in certain locations. For example, columns might indicate different numeric or alphabetical values. It could really be a pain for users if they accidentally dropped the cards for the program they wanted to run, as you can imagine [@punched_card_2021]! ```{r, fig.align='center', echo = FALSE, fig.alt= "Image of a punchcard", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf96b1d997a_0_1") @@ -117,13 +117,13 @@ There were many [different kinds](https://www.jkmscott.net/data/Punched%20Cards. -The first operating system just allowed different programs to be run sequentially without someone manually starting each one. Now our personal computers can perform multiple tasks at the same time and schedule future tasks that our automatically run. +The first operating system allowed different programs to be run sequentially without someone manually starting each one. Now our personal computers can perform multiple tasks at the same time and schedule future tasks that are automatically run. 
Check out this [video](https://www.youtube.com/watch?v=KG2M4ttzBnY) if you want to learn more about how these punch cards worked. See @OS_2017 for more information about operating systems and @punched_card_2021 for really interesting information about the history of punched cards. -Also check out @hardware_history_2021 for really interesting and more extensive history about how computer hardware was developed. +Also check out @hardware_history_2021 for more interesting and extensive history about how computer hardware was developed. -Also, here is some fascinating additional reading on the role of women as computer operators starting in the 1940s. Initially computer science was actually thought of as a field for women, however this changed over time (and now women and gender minorities are hopefully becoming more represented) : +Also, here is some fascinating additional reading on the role of women as computer operators starting in the 1940s. Initially, computer science was actually thought of as a field for women; however, this changed over time to be skewed in the opposite direction. Nowadays, women and gender minorities are hopefully becoming more represented in this field. * [Article titled: Woman pioneered computer programming. Then men took their industry over](https://timeline.com/women-pioneered-computer-programming-then-men-took-their-industry-over-c2959b822523) [@visions_women_2017] * [Article titled: Untold History of AI: Invisible Women Programmed America's First Electronic Computer The “human computers” who operated ENIAC have received little credit](https://spectrum.ieee.org/untold-history-of-ai-invisible-woman-programmed-americas-first-electronic-computer) [@untold_2019] @@ -131,14 +131,14 @@ Also, here is some fascinating additional reading on the role of women as comput ## Conclusion -We hope that this chapter has given you some more knowledge about how computers actually function. 
+We hope that this chapter has given you some more knowledge about what goes on inside the computers as they function. In conclusion, here are some of the major take-home messages: 1) The central processing unit or CPU contains the Arithmetic Logic Unit or ALU which performs operations on data using transistor logic gates 2) A CPU chip can contain multiple cores (also called CPUs) allowing a computer to perform multiple operational tasks at a time 3) RAM is the memory for a computer for the tasks that its currently working on and is very fast to access because it is close to the CPU -4) Storage on a hard drive or solid state drive is the memory for a computer that is long-term, such as files that you aren't currently working on. It takes longer to access data from this memory as it has to travel to the CPU +4) Storage on a hard drive or solid-state drive is the memory for a computer that is long-term, such as files that you aren't currently working on. It takes longer to access data from this memory as it has to travel to the CPU 5) The operating system is what tells the computer what the user wants the computer to do and when Now that we know how a computer works in general, we will next discuss computing capacity, especially for informatics research, and how servers and cloud computing can help. diff --git a/04-Computing_Systems.Rmd b/04-Computing_Systems.Rmd index 98ecc682..755a8052 100644 --- a/04-Computing_Systems.Rmd +++ b/04-Computing_Systems.Rmd @@ -6,7 +6,7 @@ ottrpal::set_knitr_image_path() # Computing Resources -In this chapter we will describe the basics about data size and computing capacity. We will discuss the computing and storage requirements for many types of cancer related data, as well as options to perform informatics work that might require more intensive computing capacity than your personal computer. +In this chapter we will describe the basics about data size and computing capacity. 
We will discuss the computing and storage requirements for many types of cancer-related data, as well as options to perform informatics work that might require more intensive computing capacity than your personal computer. ```{r, fig.align='center', echo = FALSE, fig.alt= "Learning Objectives: 1. Name units of size for binary data, 2. State the computing and storage capacity of typical computers today, 3. Determine the range of storage and computing capacity required for various bioinformatics studies, 4. Recognize different methods for performing intensive computations or storing large data, 5. Explain how server and cloud computing works", out.width="100%"} @@ -16,10 +16,10 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE ## Data Sizes -Recall that the smallest unit of data is a bit which is either a zero or a one. A group of 8 bits is called a byte, and most computers and phones, and software programs are constructed or designed in a way to accommodate groups of bytes at a time. For example a 32-bit machine can work with 4 bytes at a time and a 64-bit can work with 8 bytes at a time. But how big is a file that is 2 GB? When we sequence a genome, how large is that in terms of binary data? Can our local computer work with the size of data that we would like to work with? +Recall that the smallest unit of data is a bit, which is either a zero (0) or a one (1). A group of 8 bits is called a byte, and most computers, phones, and software programs are constructed or designed in a way to accommodate groups of bytes at a time. For example a 32-bit machine can work with 4 bytes at a time and a 64-bit can work with 8 bytes at a time. But how big is a file that is 2 GB? When we sequence a genome, how large is that in terms of binary data? Can our local computer work with the size of data that we would like to work with? 
-First let's take a look at how the size of binary data is typically described and what this actually means in terms of bits and bytes: +First, let's take a look at how the size of binary data is typically described and what this means in terms of bits and bytes: ```{r, fig.align='center', echo = FALSE, fig.alt= "Table of different binary data units showing the name, abbreviation, and size in bits or bytes, for example a Byte is abbreviated as B and this represents 8 bits, while Gigabyte is abbreviated GB and represents roughly 1 billion bytes", out.width="100%"} @@ -35,15 +35,15 @@ Now that we know how to describe binary data sizes, let's next think about how m ## Computing Capacity -We have discussed a bit about CPUs and how they can help us perform more than one task at a time, but how many tasks can the CPU of an average computer do simultaneously these days? How much memory and storage do they typically have? What size of files can a typical computer handle? This information is sometimes called the **specs** of a computer. +We have discussed a bit about CPUs and how they can help us perform more than one task at a time, but how many simultaneous tasks can the CPU of an average computer perform these days? How much memory and storage do they typically have? What size of files can a typical computer handle? This information about a computer's capacity and efficiency is sometimes called the **specs** of a computer. -These values will probably change very soon, and different computers vary widely, but currently: +"Typical" or "average" specs of a computer will probably change very soon, and different computers vary widely, but currently: - **Laptops** can often perform 4-8 CPU tasks at once, and typically range from 4-16 GB in memory and 250 GB-1 TB of storage. 
-This means that typical laptops can multitask quite well, have in some cases 16 gigabytes for random access memory to allow the CPU to work on relatively large tasks (as we can see from the previous table that GB are actually pretty large when you think about it), and possibly 1TB for the hard drive (and or SSD), meaning that you can store thousands of photos and files like PDFs, word documents etc. It turns out that you can store around 30,000 average size photos with 250GB, so a 1TB laptop can store quite a bit of data. Therefore overall, typical laptops today are pretty powerful compared to previous computers. Note that some programs require 16 or even 32 GB of memory to run. +This means that typical laptops can multitask quite well, have in some cases 16 gigabytes for random access memory to allow the CPU to work on relatively large tasks (as we can see from the previous table that GB are actually pretty large when you think about it), and possibly 1TB for the hard drive (and/or SSD), meaning that you can store thousands of photos and files like PDFs, word documents, etc. It turns out that 250GB allows you to store around 30,000 average-size photos, so a 1TB laptop can store quite a large amount of data. Therefore, overall, typical laptops today are pretty powerful devices, especially compared to computers of previous generations. That being said, note that some programs require 16 or even 32 GB of memory to run. -- **Desktops** can perform and store similarly and sometimes to a degree better than a laptop for a similar price. Since less work needs to be done to make the desktop small and portable, sometimes you can get better storage and performance for the same price as a laptop. However, desktops often have better graphics processing capacity and displays and that might make up for the price difference [@antonio_villas-boas_laptops_2019]. This might be important to consider if you are going to need to visually inspect many images. 
Another benefit is that you can also sometimes find desktops with larger memory and storage options right off the shelf than typical laptops. It is also generally easier to add more memory to a desktop than it is to add to a laptop [@antonio_villas-boas_laptops_2019]. However of course, desktops certainly aren't super portable! +- **Desktops** can perform and store similarly to, and sometimes to a degree better than, a laptop for a similar price. Since less work needs to be done to make the desktop small and portable, sometimes you can get better storage and performance for the same price as a laptop. Further, desktops often have better graphics processing capacity and displays [@antonio_villas-boas_laptops_2019]. This might be important to consider if you are going to need to visually inspect many images. Another benefit is that you can also sometimes find desktops with larger memory and storage options right off the shelf than typical laptops. It is also generally easier to add more memory to a desktop than it is to add to a laptop [@antonio_villas-boas_laptops_2019]. However of course, desktops certainly aren't super portable! * Some **phones** can compete with laptops by performing 6 CPU tasks at once and storing 6 GB in memory and 250 GB of storage. @@ -55,15 +55,15 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE -Check out this [link](https://www.apple.com/mac/compare/?modelList=iMac,MacBook-Pro-14,MacBook-Pro-16-2021) to compare the prices of different macs and this [link](https://www.hp.com/us-en/shop/slp/weekly-deals) to compare specs for PC computers from HP. +Check out this [link](https://www.apple.com/mac/compare/?modelList=iMac,MacBook-Pro-14,MacBook-Pro-16-2021) to compare the prices of different Macs and this [link](https://www.hp.com/us-en/shop/slp/weekly-deals) to compare specs for PC computers from HP. 
-If you want to get really in-depth comparisons for PC or windows machines, check out this [link](https://www.userbenchmark.com/PCBuilder/Custom/S0-M1487712vsS0-M?tab=RAM) [@userbenchmark]. +If you want to get really in-depth comparisons between PC computers and Windows machines, check out this [link](https://www.userbenchmark.com/PCBuilder/Custom/S0-M1487712vsS0-M?tab=RAM) [@userbenchmark]. ### Checking your computer capacity - Mac -So what about your computer? How do you know how many cores it has or how much memory and storage it has? +Now, what about __your__ computer? How do you know how many cores it has or how much memory and storage it has? If you have a Mac, you can click on the apple symbol on the far left of your screen. Then click on the "About This Mac" button. @@ -73,12 +73,12 @@ You might see something like this: ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_69") ``` -First we see the operating system is called Mojave. -Next we see that the processor (which we now know is the CPU) is a 2.6 GigaHertz (GHz) Intel Core i7 chip. This means that the processor or CPU can process 2,600,000,000 operations in a second (this is called a [clock cycle](http://www.techopedia.com/definition/5498/clock-cycle)) [@clock_cycle]. That's a lot compared to older computers which had clock cycle rate or [clock rate](https://en.wikipedia.org/wiki/Clock_rate) in the MegaHertz range in the 1980s [@clock_rate]! -If we look up more about this chip we would learn that it has 4 cores and has hyper-threading, which allows it to effectively perform 8 tasks at once [@hyperthreading]. -Next we see that there is 16 Gigabytes of memory - this is how much RAM it has and also 2133 MegaHertz (aka 2.133 GHz) of low power double data rate random access memory (LPDDR3), this means that the RAM can process 2,133,000,000 commands every second [@RAM_speed; @mukherjee_ram_2019]. 
If you are interested you can checkout more about what this means at this blog post @scott_thornton_RAM. However, generally the amount of RAM is more important for assessing performance [@RAM_speed; @mukherjee_ram_2019]. +First, we see the operating system is called macOS Mojave. +Next, we see that the processor (which we now know is the CPU) is a 2.6 GigaHertz (GHz) Intel Core i7 chip. This means that the processor or CPU runs through 2,600,000,000 [clock cycles](http://www.techopedia.com/definition/5498/clock-cycle) every second, where a clock cycle is the basic time step in which the CPU carries out its operations [@clock_cycle]. That's a lot compared to older computers in the 1980s, which had clock cycle rates or [clock rates](https://en.wikipedia.org/wiki/Clock_rate) in the MegaHertz range [@clock_rate]! +If we look deeper into this chip, we would learn that it has 4 cores and has hyper-threading, which allows it to effectively perform 8 tasks at once [@hyperthreading]. +Below, we see that there are 16 Gigabytes of memory - this is how much RAM it has - and that it is low power double data rate random access memory (LPDDR3) running at 2133 MegaHertz (aka 2.133 GHz); this means that the RAM can process 2,133,000,000 commands every second [@RAM_speed; @mukherjee_ram_2019]. If you are interested, you can check out more about what this means at this blog post @scott_thornton_RAM. However, generally the amount of RAM is more important than its speed for assessing performance [@RAM_speed; @mukherjee_ram_2019]. -If we click on the storage button at the top, we can learn about how much storage is available on the computer. If you hover over a section, it tells you what file are accounting for that section of storage that is already being used. +If we click on the storage button at the top, we can learn about how much storage is available on the computer. If you hover over a section, it tells you what type of files account for that particular section of storage that is being used.
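
The core count and storage figures described above can also be read programmatically. The following is a minimal sketch, assuming Python 3 is available on the computer; the function names `bytes_to_gb` and `describe_computer` are hypothetical and chosen only for illustration. Note that `os.cpu_count()` reports *logical* CPUs, so a 4-core chip with hyper-threading would typically report 8, and that the standard library offers no portable way to read the amount of RAM, so memory is left out here.

```python
import os
import shutil


def bytes_to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (1 GB = 1,000,000,000 bytes)."""
    return n_bytes / 1_000_000_000


def describe_computer(path="."):
    """Return the logical CPU count and the storage (in GB) of the drive holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (all in bytes)
    return {
        "logical_cpus": os.cpu_count(),  # may be None if it cannot be determined
        "storage_total_gb": round(bytes_to_gb(usage.total), 1),
        "storage_free_gb": round(bytes_to_gb(usage.free), 1),
    }


if __name__ == "__main__":
    for name, value in describe_computer().items():
        print(f"{name}: {value}")
```

The numbers printed may differ slightly from what the operating system shows, because some tools count storage in multiples of 1024 rather than 1000.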
```{r, fig.align='center', echo = FALSE, fig.alt= "Mac storage information showing 1 TB capactity", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_80") @@ -93,8 +93,8 @@ If you have a PC or Windows computer, the steps may vary depending on your opera ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_96") ``` -1. click the "Start" button - which looks like 4 squares together -2. click "Settings" button (gear-shaped) +1. click the "Start" button (It's the button on the bottom left that looks like a grid with 4 squares together) +2. click "Settings" button (It's gear-shaped) 3. click "System" 4. click "About" @@ -113,14 +113,14 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE Here we can see that this computer has an Intel(R) Core(TM) i7-4790K CPU @ 4.00 GHz 4.00 GHz chip and 16 Gigabytes of RAM. If we look up this chip we can see that it has 4 cores and 8 threads (due to hyper-threading) allowing for 8 tasks at a time. -To find out more information about your storage click the "Storage" button within the "System" tab. +To find out more information about your storage, click the "Storage" button within the "System" tab. ```{r, fig.align='center', echo = FALSE, fig.alt= "Windows/PC storage information showing 1 TB capactity", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_114 ") ``` -Here we can see that this computer has 466 GB + 465 GB = 932 GB across the two drives. The C drive is typically for the operating system, and the D drive is typically where you would install application programs and save files. There are 1000 GB in a TB, thus, this computer has about the same storage as the Mac that we just looked at. 
+Here we can see that this computer has 466 GB + 465 GB = 932 GB across the two drives. The **C: drive** is typically for the operating system, and the **D: drive** is typically where you would install application programs and save files. There are 1000 GB in a TB; therefore, we can see that this computer has about the same storage as the Mac that we just looked at. ## File Sizes @@ -136,7 +136,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE Genomic data files can be quite large and can require quite a bit of storage and processing power. -Here is an image of sizes of some common file types: +Below is a table of the approximate sizes of some common file types: ```{r, fig.align='center', echo = FALSE, fig.alt= "Table of file types for genomics data, whole genome sequencing can become larger than the capacity of your computer with less than 20 samples! Even whole exome sequenceing can already require more than 44% of a 1TB hard drive for just 20 samples. Note that these are approximate values.", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gfb2e21ecdc_0_0") @@ -146,9 +146,9 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE ### Imaging Data File Sizes -Imaging data, although often smaller than genomic data, can start to add up quickly with more images and samples. +Imaging data, although often smaller than genomic data, can start to add up quickly with multiple images and samples. -Here is an table of average file sizes for various medical imaging modalities from @liu_imaging_2017: +Here is a table of average file sizes for various medical imaging modalities from @liu_imaging_2017: ```{r, fig.align='center', echo = FALSE, fig.alt= "Table of file types for imaging data, most modalities have files in the range of MB to GB. 
Note that these are approximate values.", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gfb2e21ecdc_0_35") @@ -156,7 +156,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE [[source](https://www.mdpi.com/2078-2489/8/4/131)] -Note that depending on the study requirements, several images may be needed for each sample. Thus data storage needs can add up quickly. +Note that depending on the study requirements, several images may be needed for each sample. As such, data storage needs can add up quickly. ```{r, fig.align='center', echo = FALSE, fig.alt= "Example table of overall file storage needs for samples in imaging studies.", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gfb2e21ecdc_0_25") @@ -167,9 +167,9 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE Really large clinical datasets can also produce sizable file sizes. For example the [Healthcare Cost and Utilization Project (HCUP) National (Nationwide) Inpatient Sample (NIS)](https://www.hcup-us.ahrq.gov/db/nation/nis/nisdbdocumentation.jsp) contains data on more than seven million hospital stays in the United States with regional information. -According to the NIS website it "enables analyses of rare conditions, uncommon treatments, and special populations" [@NIS]. +According to the NIS website, it "enables analyses of rare conditions, uncommon treatments, and special populations" [@NIS]. -Looking at the [file sizes](https://www.hcup-us.ahrq.gov/db/state/sedddist/sedddist_filesize.jsp) for the NIS data for different states across years, you can see that there are files for some states, such as California as large as 24,000 MB or 2.4 GB [@NIS]. You can see how this could add up across years and states quite quickly. 
+Looking at the [file sizes](https://www.hcup-us.ahrq.gov/db/state/sedddist/sedddist_filesize.jsp) for the NIS data for different states across years, you can see that the files for some states, such as California, can be as large as 24,000 MB, or 24 GB [@NIS]. You can see how this could add up across years and states quite quickly. ```{r, fig.align='center', echo = FALSE, fig.alt= "Table of file sizes for the Healthcare Cost and Utilization Project (HCUP) National (Nationwide) Inpatient Sample (NIS) of data from different years and states.", out.width="100%"} @@ -178,7 +178,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE ### Checking file sizes on Mac -If you own a Mac and want to check the size of a particular file, look at your file within a finder window. You can open a new finder window by clicking on the button that looks like a square with two colors and a face, typically in the bottom left corner if your dock or the strip of icons on your screen to help you navigate to different application programs. +If you own a Mac and want to check the size of a particular file, you can find it by locating your file within a Finder window. You can open a new Finder window by clicking on the button that looks like a square with two colors and a face (see image below), typically in the bottom left corner of your dock (the strip of icons on your Mac screen that helps you navigate to different application programs). ```{r, fig.align='center', echo = FALSE, fig.alt= "Mac finder button", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_120") @@ -192,7 +192,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE -You can right click on a file and click the "Get Info" button. This will give your more specific information. +You can right click on a file and click the "Get Info" button.
This will give you more specific information. ```{r, fig.align='center', echo = FALSE, fig.alt= "Right clicking on a file in the finder window can give you more info about a file.", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_128") @@ -201,14 +201,14 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE ### Checking file sizes on PC/Windows -In a similar manner to checking file sizes on a Mac, with a Windows or PC computer, you can navigate to files by first opening the File Explorer application by typing this in the search bar next to the "start" button. +Similar to the process of checking file sizes on a Mac, if you're using a Windows or PC computer, you can navigate to your files by first opening the File Explorer application by typing this in the search bar next to the "start" button. ```{r, fig.align='center', echo = FALSE, fig.alt= "Finding the File Explorer on a Windows/PC computer", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_139") ``` -Then navigate to a file of interest which will show information about the size in one of the columns to the right, if you hover over the file name, you will get more specific information. +Then navigate to your file of interest, which will show information about the size in one of the columns to the right. If you hover over the file name, you will get more specific information. 
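
File sizes can also be checked with a few lines of code, which is handy when there are many files to inspect. This is a minimal sketch, assuming Python 3; the helper names `file_size` and `human_readable` are hypothetical. It uses decimal units (1 KB = 1000 bytes), while some operating systems count in multiples of 1024, so the value shown in a file browser may differ slightly.

```python
from pathlib import Path

UNITS = ["B", "KB", "MB", "GB", "TB"]


def file_size(path):
    """Return the size of the file at `path` in bytes."""
    return Path(path).stat().st_size


def human_readable(n_bytes):
    """Format a byte count using decimal units (1 KB = 1000 B)."""
    size = float(n_bytes)
    for unit in UNITS:
        # Stop at the first unit that keeps the number below 1000,
        # or at the largest unit we know about.
        if size < 1000 or unit == UNITS[-1]:
            return f"{size:.1f} {unit}"
        size /= 1000
```

For example, `human_readable(file_size("my_scan.tiff"))` would report the size of a (hypothetical) image file named `my_scan.tiff` in the most convenient unit.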
```{r, fig.align='center', echo = FALSE, fig.alt= "File sizes are listed for each file within File Explorer windows on Windows/PC computers", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_145") @@ -219,32 +219,31 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE ### **Personal computers** -These are computers that your lab might own, such as a laptop, a desktop, used by one individual or maybe just a few individuals in your lab. +These are computers that your lab might own, such as laptops or desktops, used by one individual or maybe a few individuals in your lab. ```{r, fig.align='center', echo = FALSE, fig.alt= "A computer chip is also called the CPU. Inside this CPU or chip are often multiple cores, which are actually individual CPUs.", out.width= "100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_3") ``` -If you are not performing intensive computational tasks, it is possible that you will only need personal computers for your lab. However, you may find that this changes, and you might require connecting your personal computers to shared computers for more computational power and or storage. +If you are not performing intensive computational tasks, it is possible that you will only need personal computers for your lab. However, as your project expands and you start working with more and complex data, you might require connecting your personal computers to shared computers for more computational power and/or storage. ### **Shared Computing Resources** -What if you decide that you do need more computational power than your personal computer? You may encounter times where certain informatics tasks take way too long or are not even possible. 
Evaluating the potential file sizes of the data that you might be working with is a good place to start. However, keep in mind that sometimes certain computations may require more memory than you expect. This is particularly true when working with genomic or image files which are often compressed. So what can you do when you face this issue? +What if you decide that you need more computational power than your personal computer? You may encounter times where certain informatics tasks take way too long or are not even possible. Evaluating the potential file sizes of the data that you might be working with is a good place to start. However, keep in mind that sometimes certain computations may require more memory than you expect. This is particularly true when working with genomic or image files which are often compressed. What can you do when you face this issue? -One great option, which can be quite affordable is using a server. +One great and affordable option is to use a server. -In terms of hardware, the term [server](https://techterms.com/definition/server) means a computer (often a computer that has much more storage and computing capacity than a typical computer) or groups of computers that can be accessed by other computers using a local network or the internet to perform computations or store data [@server_def]. They are often shared by people, and allow users to perform more intensive computational tasks or store large amounts of data. Read [here](https://en.wikipedia.org/wiki/Server_(computing)) to learn more [@server_2021]. +Hardware-wise, the term [server](https://techterms.com/definition/server) means a computer (often a computer that has much more storage and computing capacity than a typical computer) or groups of computers that can be accessed by other computers using a local network or the internet to perform computations or store data [@server_def]. 
They are often shared by people, and allow users to perform more intensive computational tasks or store large amounts of data. Read [here](https://en.wikipedia.org/wiki/Server_(computing)) to learn more [@server_2021]. -Using a group of computers is often a much more cost effective option than having one expensive supercomputer (a computer that individually has the computational power of many personal computers) to act as a server [@supercomputer_2022]. It turns out that buying several less powerful computers is cheaper. In some cases however, an institute or company might even have a sever with multiple supercomputers! +Using a group of computers is often a much more cost-effective option than having one expensive supercomputer (a computer that individually has the computational power of many personal computers) to act as a server [@supercomputer_2022]. It turns out that buying several less powerful computers is cheaper. In some cases, however, an institute or company might even have a server with multiple supercomputers! -As an example use of a server, your lab members could connect to a server from their own computers to allow each of them more computational power. Typically computers that act as servers are set up a bit differently than our personal computers, as they do not need the same functionality and are designed to optimize data storage and computational power. For instance they often don't have capabilities to support a [graphical user interface](https://www.omnisci.com/technical-glossary/graphical-user-interface) (meaning the visual display output that you see on your personal computer) [@GUI]. +For example, your lab members could connect to a server from their own computers to allow each of them more computational power. Typically, computers that act as servers are set up a bit differently than our personal computers, as they do not need the same functionality. These computers are designed to optimize data storage and computational power.
For instance, they often don't have capabilities to support a [graphical user interface](https://www.omnisci.com/technical-glossary/graphical-user-interface), meaning the visual display output that you see on your personal computer [@GUI]. - -Instead they are typically only accessed by using a [command-line interface](https://en.wikipedia.org/wiki/Command-line_interface), meaning that users write code instead of using buttons like they might for a program like Microsoft Word that uses a graphical user interface [@command-line_2022]. In order to support this they have memory, processors or CPUs, and storage like your laptop. +Instead, they are typically only accessed by using a [command-line interface](https://en.wikipedia.org/wiki/Command-line_interface), meaning that users write code instead of using buttons like they might for a program like Microsoft Word that uses a graphical user interface [@command-line_2022]. In order to support this they have memory, processors or CPUs, and storage, like your laptop. Here is what a server might look like: @@ -255,7 +254,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE In this case we have a group of computers making up this server. Here we see the nodes (the individual computers that make up the server) stacked in columns. 
-Among shared computing resources/servers there are three major options: +Among shared computing resources/servers, there are three major options: * **Clusters** - institutional or national resources * **Grids** - institutional or national resources @@ -264,11 +263,11 @@ Among shared computing resources/servers there are three major options: ### Computer Cluster -In a [computing cluster](https://en.wikipedia.org/wiki/Computer_cluster) several of the **same** type of computer (often in close proximity and connected by a local area network with actual cables or an [intranet](https://www.igloosoftware.com/blog/internet-vs-intranet-vs-extranet-whats-the-difference/) rather than the internet) work together to perform pieces of the same single task simultaneously [@computer_cluster_2022]. The idea of performing multiple computations simultaneously is called [parallel computing](https://en.wikipedia.org/wiki/Parallel_computing) [@parallel_2021]. +In a [computing cluster](https://en.wikipedia.org/wiki/Computer_cluster), several of the **same** type of computer (often in close proximity and connected by a local area network with actual cables or an [intranet](https://www.igloosoftware.com/blog/internet-vs-intranet-vs-extranet-whats-the-difference/) rather than the internet) work together to perform pieces of the same single task simultaneously [@computer_cluster_2022]. The idea of performing multiple computations simultaneously is called [parallel computing](https://en.wikipedia.org/wiki/Parallel_computing) [@parallel_2021]. -There are different designs or architectures for clusters. One common one is the [Beowulf cluster](https://en.wikipedia.org/wiki/Beowulf_cluster) in which a master computer (called front node or server node) breaks a task up into small pieces that the other computers (called **client nodes** or simply **nodes**) perform [@beowulf_2022]. +There are different designs or architectures for clusters. 
One common design is the [Beowulf cluster](https://en.wikipedia.org/wiki/Beowulf_cluster) in which a master computer (called front node or server node) breaks a task up into small pieces that the other computers (called **client nodes** or simply **nodes**) perform [@beowulf_2022]. -For example, if a large file needs to be converted to a different format, **pieces** of the file will be converted simultaneously by the different nodes. Thus each node is performing the **same task** just with different pieces of the file. The user has to write code in a special way to specify that they want parallel processing to be used and how. See [here](https://www.freecodecamp.org/news/how-to-supercharge-your-bash-workflows-with-gnu-parallel-53aab0aea141/) for an introduction about how this is done @Zach_Caceres_GNU_Parallel_2019. +For example, if a large file needs to be converted to a different format, **pieces** of the file will be converted simultaneously by the different nodes. Thus, each node is performing the **same task** just with different __pieces__ of the file. The user has to write code in a special way to specify that they want parallel processing to be used and how they want this parallel processing to be performed. See [here](https://www.freecodecamp.org/news/how-to-supercharge-your-bash-workflows-with-gnu-parallel-53aab0aea141/) for an introduction about how this is done [@Zach_Caceres_GNU_Parallel_2019]. It is important to realize that the CPUs in each of the node computers connected within a cluster are all performing a similar task simultaneously. @@ -278,9 +277,9 @@ See [here](https://cs.wmich.edu/~elise/courses/cs626/s09/hussein/Parallel_and_Cl In a [computing grid](https://hazelcast.com/glossary/grid-computing/), often **different** types of computers in **different** locations work towards an overall common goal by performing **different** tasks [@grid].
-Again, just like computer clusters, there are many types of architectures that can be rather simple to very complex. For example you can think of different universities collaborating to perform different computations for the same project. One university might perform computations using gene expression data about a particular population, while another performs computations using data from another population. Importantly each of these universities might use clusters to perform their specific task. +Again, just like computer clusters, there are many types of architectures that can be rather simple to very complex. For example you can think of different universities collaborating to perform different computations for the same project. One university might perform computations using gene expression data about a particular population, while another performs computations using data from another population. Within one location, each of these universities might use clusters to perform their specific task. -Both grids and clusters use a special type of software called middleware to coordinate the various computers involved. +Both grids and clusters use a special type of software called **middleware** to coordinate the various computers involved. Users need to write their scripts in a way that can be performed by multiple computers simultaneously. Users also need to be conscious of how to schedule their tasks and to follow the rules and etiquette of the specific cluster or grid that they are sharing (more on that soon!). See [here](https://pediaa.com/difference-between-cluster-and-grid-computing/) and [here](https://www.geeksforgeeks.org/difference-between-grid-computing-and-cluster-computing/)for more information about the difference between clusters and grids [@lithmee_difference_2018; @grid_cluster_difference_2019]. 
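
To make the idea of "the same task on different pieces" more concrete, here is a minimal sketch in Python. It is only an analogy: it splits work across the CPU cores of a single computer rather than across the nodes of a real cluster, where a scheduler and middleware would handle the distribution. The function names are hypothetical, and uppercasing text stands in for a real file format conversion.

```python
from concurrent.futures import ProcessPoolExecutor


def split_into_pieces(data, n_pieces):
    """Split `data` into at most `n_pieces` contiguous chunks of nearly equal size."""
    chunk = max(1, -(-len(data) // n_pieces))  # ceiling division, never zero
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def convert_piece(piece):
    """Stand-in for a real format conversion: here we simply uppercase the text."""
    return piece.upper()


def convert_in_parallel(data, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Apply the same conversion to every piece at once, then stitch the results together."""
    pieces = split_into_pieces(data, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        # map() hands each piece to a worker and returns results in the original order
        converted = list(pool.map(convert_piece, pieces))
    return "".join(converted)
```

When using the default process-based executor in a script, the call (for example `convert_in_parallel("acgt" * 1000, n_workers=4)`) should sit inside an `if __name__ == "__main__":` guard, since some platforms re-import the script in each worker process.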
@@ -288,14 +287,14 @@ See [here](https://pediaa.com/difference-between-cluster-and-grid-computing/) a ### "Cloud" computing -More recently, the ["Cloud"](https://en.wikipedia.org/wiki/Cloud_computing) has become a common computing option. The term "cloud" has become a widely used buzzword [@cha_cloud_2015] that actually has a few slightly different definitions that have changed overtime, making it a bit tricky to keep track of. However, the "cloud" is typically meant to describe large computing resources that involve the connection of **multiple servers** in multiple locations to one another [@cloud_2022] using the internet. See [here](https://www.redhat.com/en/topics/cloud-computing/cloud-vs-virtualization) for a deeper description of what the term cloud means today and how it compares to other more traditional shared computing options [@cloud_deeper]. +More recently, the ["Cloud"](https://en.wikipedia.org/wiki/Cloud_computing) has become a common computing option. The term "cloud" has become a widely used buzzword [@cha_cloud_2015] that actually has a few slightly different definitions that have changed over time, making it a bit tricky to keep track of. However, the "cloud" typically describes large computing resources that involve **multiple servers** in multiple locations connected to one another over the internet [@cloud_2022]. See [here](https://www.redhat.com/en/topics/cloud-computing/cloud-vs-virtualization) for a deeper description of what the term cloud means today and how cloud computing compares to other more traditional shared computing options [@cloud_deeper]. -Many of us use cloud storage regularly for Google Docs and backing up photos using iPhoto and Google. Cloud computing for research works in a similar way to these systems, in that you can perform computations or store data using an available server that is part of a larger network of servers. This allows for even more computational dependability beyond a more simple cluster or grid.
Even if one or multiple servers is down, you can often still use the other servers for the computations that you might need. +Many of us use cloud storage regularly for Google Docs and backing up photos using iPhoto and Google. Cloud computing for research works in a similar way to these systems, in that you can perform computations or store data using an available server that is part of a larger network of servers. This allows for even more computational dependability beyond a simpler cluster or grid. Even if one or multiple servers are down, you can often still use the other servers for the computations that you might need. Furthermore, this also allows for more opportunity to scale your work to a larger extent, as there is generally more computing capacity possible with most cloud resources [@cloudvstrad]. -Companies like Amazon, Google, Microsoft Azure, and others provide cloud computing resources. **Somewhere these companies have clusters of computers that paying customers use through the internet.** In addition to these commercial options, there are newer national government funded resource options like [Jetstream](https://portal.xsede.org/jetstream) (described in the next section). We will compare computing options in another chapter coming up. +Companies like Amazon, Google, Microsoft Azure, and others provide cloud computing resources. **Somewhere these companies have clusters of computers that customers pay to use through the internet.** In addition to these commercial options, there are newer national government funded resource options like [Jetstream](https://portal.xsede.org/jetstream) (described in the next section). We will compare computing options in another chapter coming up. 
@@ -306,7 +305,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE ### Accessing Shared Computer Resources -It's important to remember that all of the shared computing options that we previously described involve a [data center](https://en.wikipedia.org/wiki/Data_center) where are large number of computers are physically housed. +It's important to remember that all of the shared computing options that we previously described involve a [data center](https://en.wikipedia.org/wiki/Data_center) where a large number of computers are physically housed. ```{r, fig.align='center', echo = FALSE, fig.alt= "Examples of servers or shared computers include clusters that may exist at your institution or national computing resources like Xsede.", out.width= "100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_23") @@ -319,11 +318,11 @@ You may have access to a [HPC (which stands for High Performance Computing) clus If your university or institution has a HPC [cluster](https://en.wikipedia.org/wiki/Computer_cluster), this means that they have a group of computers acting like a server that people can use to store data or assist with intensive computations. Often institutions can support the cost of many computers within an HPC cluster. This means that multiple computers will simultaneously perform different parts of the computing required for a given task, thus significantly speeding up the process compared to you trying to perform the task on just your computer! -If your institute doesn't have a shared computing resource like the HPCs we just described, you could also consider a national resource option like [Xsede](https://www.xsede.org/). +If your institute doesn't have a shared computing resource like the HPCs we just described, you could also consider a national resource like [Xsede](https://www.xsede.org/). 
[Xsede](https://www.xsede.org/) is led by the University of Illinois National Center for Supercomputing Applications (NCSA) and includes 18 other partnering institutions (which are mostly other universities). Through this partnership, they currently support 16 supercomputers. Universities and non-profit researchers in the United States can request access to their computational and data storage resources. See [here](https://portal.xsede.org/allocations/resource-info) for descriptions of the available resources. -Here you can see a photo of Stampede2, one of the supercomputers that members of Xsede can utilize. +Here you can see a photo of Stampede2, one supercomputer that members of Xsede can utilize. ```{r, fig.align='center', echo = FALSE, fig.alt= "An image of Stampede2 one of the supercomputers that members of Xsede can use.", out.width= "100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.gf9c252d058_0_63") @@ -334,11 +333,11 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHE > Stampede2, generously funded by the National Science Foundation (NSF) through award ACI-1134872, is one of the Texas Advanced Computing Center (TACC), University of Texas at Austin's flagship supercomputers. -See [here](https://portal.xsede.org/tacc-stampede2) for more information about how you could possibly connect to and utilize Stampede2. +See [here](https://portal.xsede.org/tacc-stampede2) for more information about how you could possibly utilize Stampede2. -Importantly when you use shared computers like national resources like Stampede2 available through Xsede, as well as institutional HPCs, you will share these resources with many other people and so you need to learn the proper etiquette for using and sharing these resources. We will discuss this more in a coming chapter. 
+When you use shared computers from national resources like Stampede2 through Xsede, or when you use institutional HPCs, you will share these resources with many other people. Therefore, you need to learn the proper etiquette for using and sharing these resources. We will discuss this more in an upcoming chapter. -However, there is also now an option to access the different XSEDE computing resources through a cloud environment option called [Jetstream2](https://jetstream-cloud.org/). +Additionally, you can now access the different XSEDE computing resources through a cloud environment called [Jetstream2](https://jetstream-cloud.org/). Here is a video about Jetstream2: @@ -349,24 +348,24 @@ knitr::include_url("https://www.youtube.com/embed/NQ3flxJANTw") -We will also discuss how the use of these various computing options differ in the next chapters. Importantly there are also some computing platforms that have been especially designed for scientists and specific types of researchers, so it is also useful to know about these options. +We will also discuss how the use of these various computing options differs in the next chapters. Importantly, there are also some computing platforms that have been specially designed for scientists and specific types of researchers, so it is also useful to know about these options. ## Conclusion -We hope that this chapter has given you some more perspective on how large medical research data files can be, as well as given you more familiarity with how well your computer might be able to accommodate the files that you might work with. We also hope that this chapter has provided you with some more awareness about computing options that might be available to you, should you need more capacity than your current computer.
+We hope that this chapter has given you some more perspective on how large medical research data files can be, and has made you more familiar with how well your computer can accommodate the files that you might work with. We also hope that this chapter has provided you with more awareness about computing options that might be available to you, should you need more capacity than your current computer. In conclusion, here are some of the major take-home messages: 1) A bit is the smallest binary digital data unit. It is a single 0 or 1. -2) A byte is a group of 8 bits, file sizes are typically described using units based on bytes. -3) A typical fancy laptop today might allow for up to 1 TB of storage, however this can quickly get used up if you are working with large data files. -4) Even if you have enough storage for a large file, you might not have enough RAM to actually work with a large data file. Your computer might be too slow to handle that type of work. In which case, you might want to consider using shared computing resources. -5) A server (when describing hardware) is a single computer (typically a supercomputer if just one computer) or group of computers that others can share to help them perform more intensive computational tasks or store large amounts of data. People often connect to these over the internet, but servers can also be connected to by directly using wires in a local network (like in a department to different offices). -6) The computers in a server are optimized for assisting users with computations or storing data. +2) A byte is composed of 8 bits. File sizes are typically described using units based on bytes. +3) A typical fancy laptop today might allow for up to 1 TB of storage; however, this can quickly get used up if you are working with large data files. +4) Even if you have enough storage for a large file, you might not have enough RAM to work with a large data file. Your computer might be too slow to handle that type of work. 
In this case, you might want to consider using shared computing resources. +5) A server, when describing hardware, is a single computer (typically a supercomputer) or group of computers that others can share to help them perform more intensive computational tasks or store large amounts of data. People often connect to servers over the internet, but one can also directly connect to servers using wires in a local network (for example, in different offices in a department or a company). +6) Computers in a server are optimized for assisting users with computations or storing data. 7) A supercomputer is a computer that has much more storage, memory, and computing capacity than a typical personal computer. Supercomputers are generally much more expensive than using a group of more typical computers that together would have the same collective computing and storage capacity. 8) There are two general types of servers: clusters and grids. Cluster approaches work by having several computers working on pieces of the same task simultaneously in a method called parallel computing. Grid approaches work by having different types of computers working on different tasks. -9) Cloud computing is essentially the use of many servers accessed through the internet. This is often more reliable because there are many servers to use, even if one other users are performing large tasks or if a server goes down. We will talk more about the pros and cons of this option in the coming chapters. +9) Cloud computing is essentially the use of many servers accessed through the internet. This is often more reliable because there are many servers to use, even if other users are performing large tasks or if a server goes down. We will talk more about the pros and cons of this option in the coming chapters. 
10) If your institute doesn't provide you access to a shared computing resource and you don't want to use a commercial cloud option, you could consider options like [Xsede](https://www.xsede.org/) and/or [Jetstream2](https://jetstream-cloud.org/), which are national resources that you can request access to.
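
To make the first few take-home messages more concrete, here is a minimal sketch in Python of the arithmetic behind bits, bytes, storage, and RAM. The 16 GB of RAM and the 50 GB data file are assumed values chosen only for illustration, and the helper names `to_bytes` and `fits` are hypothetical. Comparing file size to RAM is only a rough guide, since the memory actually needed depends on the software used to open the file.

```python
# Illustrative sketch only: the sizes below are assumed examples, not measurements.

BITS_PER_BYTE = 8

# Decimal (SI) units, as commonly used when describing storage capacity.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def to_bytes(value, unit):
    """Convert a size such as (1, "TB") into a number of bytes."""
    return value * UNITS[unit]


def fits(file_bytes, capacity_bytes):
    """Return True if a file of file_bytes is no larger than capacity_bytes."""
    return file_bytes <= capacity_bytes


# Hypothetical laptop: 1 TB of storage and 16 GB of RAM (assumed values).
storage = to_bytes(1, "TB")
ram = to_bytes(16, "GB")

# Hypothetical data file of 50 GB (assumed value).
data_file = to_bytes(50, "GB")

print("Bits in the file:", data_file * BITS_PER_BYTE)
print("Fits in storage:", fits(data_file, storage))
print("Fits in RAM:", fits(data_file, ram))
print("Files of this size that would fill the disk:", storage // data_file)
```

Under these assumptions, the file fits on the disk but is larger than the available RAM, which is the situation in which shared computing resources become worth considering.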