32: add supercomputing terminology section
blackwer committed Apr 15, 2024
1 parent 7e36290 commit b2e5db5
Showing 1 changed file with 63 additions and 6 deletions.
69 changes: 63 additions & 6 deletions 32_IntroToHPC/main.md
@@ -41,12 +41,75 @@ Activities where participants all actively work to foster an environment which e


## Today's Agenda
- Supercomputing components and terminology
- Flatiron resources overview
- Environment management [interactive]
- Running your jobs [interactive]



## Supercomputing terminology


### Compute nodes
- What most people would call a computer, but...
- Typically headless -- no display
- Accessed/controlled via network -- often programmatically
- Often multiple network "interfaces" -- more later
- Designed for high _throughput_ computation


### Compute node architecture
- Typically large amounts of RAM (random access memory)
  - temporary storage used during computation for data and program instructions
- One or more "multi-core" CPUs (central processing units) -- FI nodes typically have two
  - CPU core -- a single independent processing unit within a multi-core CPU
  - Cores have their own _cache_ but also share _cache_ directly with other cores
  - Cores typically slower than laptop/workstation cores, but more of them and more cache/RAM
- One or more network cards (more later!)


### Compute node architecture -- `lstopo`
- Cores are also sometimes grouped into "NUMA" (non-uniform memory access) domains
  - beyond scope today, but good to know
  - Specifies what hardware has direct access to what memory
- `lstopo --no-io` on FI 'skylake' node
<center>
<img src="./assets/skylake-topo.png" height="300px">
</center>
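
For a quick look at core counts without `lstopo`, a minimal Python sketch follows. It assumes only the standard library, and the helper names `logical_cpus` and `usable_cpus` are made up for illustration. Unlike `lstopo`, it reports nothing about caches or NUMA domains.

```python
import os


def logical_cpus() -> int:
    """Logical CPUs the OS reports (may count hardware threads, not just cores)."""
    # os.cpu_count() can return None when the count cannot be determined.
    return os.cpu_count() or 1


def usable_cpus() -> int:
    """CPUs this process is allowed to run on, where the OS exposes that."""
    # os.sched_getaffinity exists on Linux but not on macOS or Windows.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return logical_cpus()


print(f"logical CPUs: {logical_cpus()}, usable by this process: {usable_cpus()}")
```

The two numbers can differ when a process is restricted to a subset of the node's cores.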


### GPU node architecture
- GPU nodes have all the stuff a CPU node has, plus...
- One or more GPUs (graphics processing units)
- Misnomer/legacy name -- now used to "offload" general computation
- One kind of accelerator (others include TPUs, etc.)
- Great for large dense linear algebra problems
- Or... tons of small problems in parallel


### Network/fabric
- Network/fabric - the means of communication between computers
- Communication lines usually fiber/copper/wireless
  - fiber most common for _high performance_ networks
- Some rough "typical" numbers
  - WiFi -- wireless -- \~0.1-1 Gbit/s
  - Ethernet -- copper -- \~1-40 Gbit/s
  - InfiniBand -- fiber -- \~100-800 Gbit/s
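
To make these rough numbers concrete, the sketch below estimates the ideal time to move a dataset over each kind of link. The 1 TB dataset size and the single rate picked from each range are illustrative assumptions, and `transfer_seconds` is a hypothetical helper. Real transfers rarely reach the nominal line rate, so treat the results as lower bounds.

```python
# Idealized transfer time: size in bits divided by link rate in bits/s.
# Ignores protocol overhead, latency, and contention.


def transfer_seconds(size_bytes: float, rate_gbit_per_s: float) -> float:
    """Return the ideal time in seconds to move size_bytes over a link."""
    size_bits = size_bytes * 8
    return size_bits / (rate_gbit_per_s * 1e9)


DATASET_BYTES = 1e12  # assumed 1 TB dataset (illustrative)

# One representative rate picked from each range above (assumption).
LINKS_GBIT = {"WiFi": 0.5, "Ethernet": 10.0, "InfiniBand": 200.0}

for name, rate in LINKS_GBIT.items():
    seconds = transfer_seconds(DATASET_BYTES, rate)
    print(f"{name:>10}: {seconds:8.0f} s (~{seconds / 3600:.2f} h)")
```

Under these assumptions the same dataset takes hours over WiFi, minutes over Ethernet, and well under a minute over InfiniBand.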


### Filesystems
- System that manages file organization and access
- Can be local (stored on a "hard drive", like on a laptop)
  - _typically_ high bandwidth/low latency
- Or distributed/networked (data shared between drives/computers and accessed remotely)
  - _typically_ high bandwidth/high latency
- Tradeoffs exist and are _extremely_ important
- Ceph and GPFS are the distributed filesystems used at FI
- Lustre also common at supercomputing centers
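
One way to see why the bandwidth/latency tradeoff matters is a toy cost model where every file access pays a fixed latency and the data then streams at the filesystem's bandwidth. The sketch below is an assumption-laden illustration: `read_seconds` is a hypothetical helper, and every number is a made-up placeholder, not a measurement of Ceph, GPFS, Lustre, or any local disk.

```python
# Toy cost model: each file pays a fixed latency, plus bytes / bandwidth.
# All parameters are hypothetical placeholders, not measurements.


def read_seconds(n_files: int, total_bytes: float,
                 latency_s: float, bandwidth_bytes_per_s: float) -> float:
    """Estimated time to read total_bytes spread evenly over n_files."""
    return n_files * latency_s + total_bytes / bandwidth_bytes_per_s


TOTAL_BYTES = 1e10  # assumed 10 GB of data in both layouts

# Hypothetical (per-file latency in s, bandwidth in bytes/s) pairs.
LOCAL = (1e-4, 2e9)
NETWORKED = (5e-3, 2e9)  # same bandwidth, 50x the latency

for label, n_files in [("one large file", 1),
                       ("a million small files", 1_000_000)]:
    local = read_seconds(n_files, TOTAL_BYTES, *LOCAL)
    networked = read_seconds(n_files, TOTAL_BYTES, *NETWORKED)
    print(f"{label}: local ~{local:.0f} s, networked ~{networked:.0f} s")
```

In this model a single large file costs about the same on both filesystems, while a million small files are dominated by per-file latency and become far slower on the high-latency one.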



## Flatiron resources overview


@@ -60,12 +123,6 @@ Activities where participants all actively work to foster an environment which e
- Details at https://wiki.flatironinstitute.org/SCC/Overview


### TODO Cluster components and lingo
- File systems
- Networks
- Nodes vs CPUs vs cores


### Rusty -- compute power

- FI's "primary" cluster