Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md

README.md

Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Serve

Reference: "Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server", Scott Beamer, Krste Asanović, and David Patterson, International Symposium on Workload Characterization (IISWC), Atlanta, October 2015

Learn more about the followings:

Questions:

Notes:

Graph algorithms struggle to fully utilize the platform’s memory bandwidth.
Increasing memory bandwidth utilization could be just as effective as decreasing communication.
Because message-passing is far less efficient than accessing memory in contemporary systems, distributed clusters are a poor match to graph processing.
The biggest memory bandwidth bottleck is the instruction window.
Input graph has the largest impact on performance characteristics.
TLB misses are only detrimental when at least a moderate amount of memory bandwidth is utilized.
Branch misspredictions and unresolved data dependencies can also hinder memory bandwidth utilization.
Memory bandwidth is also moderately hindered by NUMA effects.

Some insights:

Graph apps exhibit substantial locality. They experience moderately high LLC hit rate.
Reorder buffer size limits achievable memory throughput. Relatively high LLC hit rate implies many instructions are executed for each LLC miss.These instructions fill the reorder buffer in the core, preventing future loads that will miss in the LLC from issuing early, resulting in unused memory bandwidth.
Multithreading has limited potential for graph processing

Memory Bandwidth Potential

A load must satisfy the following four requirements:

Processor fetches load instruction.
Space in instruction window - Reorder Buffer (ROB) is not full and has room for the load.
Register operands are available.
Memory bandwidth is available - At the core level there is a MSHR available and there is not excessive contention within the on-chip interconnect or at the memory controller.

Memory bandwidth cannot be a bottleneck unless the first three requirements are satisfied.

The biggest bandwidth bottleneck is the instruction window because it cannot hold a sufﬁcient number of instructions to incorporate the needed number of cache-missing instructions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Locality-Exists-In-Graph-Processing

Locality-Exists-In-Graph-Processing

README.md

Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Serve

Learn more about the followings:

Questions:

Notes:

Memory Bandwidth Potential

Files

Locality-Exists-In-Graph-Processing

Directory actions

More options

Directory actions

More options

Latest commit

History

Locality-Exists-In-Graph-Processing

Folders and files

parent directory

README.md

Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Serve

Learn more about the followings:

Questions:

Notes:

Memory Bandwidth Potential