-
Notifications
You must be signed in to change notification settings - Fork 233
/
Copy pathCHANGELOG
347 lines (320 loc) · 15.5 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
# Changelog 5.4.1
- Fixes linking errors due to missing bstrlib.h
- Fix for likwid-bench kernel stream_mem
- Fix builds with CUDA for versions 11.2 to 12.6
- Fix sysfeatures with ACCESSMODE=perf_event
- Add AMD Zen1 to Zen3 to sysfeatures
- Add support for apple-cpufreq and cppc driver to sysfeatures
- Fix sysfeatures on ARM architectures
# Changelog 5.4.0
- Support for Intel Granite Rapids (core, energy, uncore)
- Support for Intel Sierra Forrest (core, energy, uncore)
- Support for AMD Bergamo (core, energy, uncore)
- Support for Nvidia Grace (core, uncore)
- Fix: AMD Zen4 DataFabric units
- Fix: Multi-socket RAPL measurements on Sapphire Rapids
- Fix: Energy unit for RAPL DRAM domain for SPR, GNR and SRF
- Fix: Discovery mechanism workaround for UPI and M3UPI units of SPR
- Fix: Intel Westmere Uncore with perf_event backend
- Fix: Fujitsu A64FX (more counters, fixed topology, ...)
- Container bridge to use LIKWID inside container
- Sysfeatures interface reworked with support for various architectures and libraries
- Fix build for NVMON for CUDA 12.6+
- likwid-mpirun: SLURM pinning with cpu_mask feature
- Update of internal hwloc (2.11.2) and Lua (5.4.7) version
# Changelog 5.3.0
- Support for Intel SapphireRapids (Core, Uncore, RAPL)
- Support for AMD Zen4 (Core, Uncore, RAPL)
- Support for Apple M1
- Support for AMD GPUs (MarkerAPI, F90 interface)
- Support for AWS Graviton3 (ARM Neoverse V1)
- Support for HiSilicon TSV110
- Fix of F90 interface installation
- Support for extended umasks in ICX and SPR
- Units for metrics in performance groups
- Library calls to get meta information (version, supported features, etc.)
- Some fixes for direct access mode
- Some fixes for X86 RDPMC detection
- Update of internal hwloc (2.9.3) and Lua (5.4.6) version
- New experimental sysfeatures module
# Changelog 5.2.2
- Add mutex to pinning library
- Fix pin string parsing in pinning library
- Make SBIN path configurable in build system
- Add PKGBUILD for ArchLinux package builds
- Remove accessDaemon double-fork in systemd environements
- Group updates for L2/L3 (mainly AMD Zen)
- Fix multi-initialization in MarkerAPI
- Add energy event scaling for Fujitsu A64FX
- Nvmon: Use Cupti error string to get better warning/error messages
- Nvmon: Store events internally to re-use event strings in stopCounters
- AccessLayer: Catch SIGCHLD to stop sending requests to accessDaemon if it was killed
- likwid-genTopoCfg: Update writing and reading of topology file
- Add INST_RETIRED_NOP event for Intel Icelake (desktop & server)
- Removed some memory leaks
- Improved checks for RDPMC availability
- Add TOPDOWN_SLOTS for perf_event
- Fix for systems with CPU sockets without hwthreads (A64FX FX1000)
- Fix if HOME environment variable is not set (systemd)
- Reader function for perf_event_paranoid in Lua to get state early
- likwid-mpirun: Sanitize np and ppn values to avoid crashes
# Changelog 5.2.1
- Add support for Intel Rocketlake and AMD Zen3 variant (Family 19, Model 0x50)
- Fix for perf_event multiplexing (important!)
- Fix for potential deadlock in MarkerAPI (thx @jenny-cheung)
- Build and runtime fixes for Nvidia GPU backend, updates for CUDA test codes
- peakflops kernel for ARMv8
- Updates for AMD Zen1/2/3 event lists and groups
- Support spaces in MarkerAPI region tags (thx @jrmadsen)
- Use 'online' cpulist instead of 'present'
- Switch CI from Travis-CI to NHR@FAU Cx services
- Document -reset and -ureset for likwid-setFrequencies
- Reset cpuset in unpinned runs
- Remove destructor in frequency module
- Check PID if given through --perfpid
- Intel Icelake: OFFCORE_RESPONSE events
- AccessDaemon: Check PCI init state before using it
- likwid-mpirun: Set mpi type for SLURM automatically
- likwid-mpirun: Fix for skip mask for OpenMPI
# Changelog 5.2.0
- Support for AMD Zen3 (Core + Uncore)
- Support for Intel IcelakeSP (Core + Uncore)
- New affinity code
- Fix for Ivybridge uncore code
- Bypass accessdaemon by using rdpmc instruction on x86_64
- Introduce notion of CPU die in topology module
- Use CPU dies for socket-lock for Intel CascadelakeAP
- Add environment variable LIKWID_IGNORE_CPUSET to break out of current CPUset
- Fixes for affinity module CPUlist sorting
- Build against system-installed hwloc
- Update for Intel SkylakeX/CascadelakeX L3 group
- Rename DataFabric events for all generations of AMD Zen
- Add static cache configuration for Fujitsu A64FX
- Add multiplexing checks for perf_event backend
- Fix for table width of likwid-topology afteradding CPU die column
- Adding RasPi 4 with 32 bit OS as ARMv7
- Add default groups for Intel Icelake desktop
- Fix for likwid-setFrequencies to not apply minFreq when setting governor
- likwid-powermeter: Fix hwthread selection when run with -p
- likwid-setFrequencies: Get measured base frequency if register is not readable
- CLOCK group for all AMD Zen
- Fixes in Nvidia GPU support in NvMarkerAPI and topology module
# Changelog 5.1.1
- Support for Intel Cometlake desktop (Core + Uncore)
- Fix for topology module of Fujitsu A64FX
- Fix for Intel Skylake SP in SNC mode
- Fix for likwid-perfscope
- Fix for CLI argument parsing
- Updated group and data file checkers
- Vector sum benchmark in SVE
- FP_PIPE group for Fujitsu A64FX
- Maximal number of CLI arguments configurable in config.mk (currently 16384)
# Changelog 5.1.0
- Support for Intel Icelake desktop (Core + Uncore)
- Support for Intel Icelake server (Core only)
- Support for Intel Tigerlake desktop (Core only)
- Support for Nvidia GPUs with compute capability >= 7.0 (CUpti Profiling API)
- Support for Fujitsu A64FX (Core) including SVE assembly benchmarks
- Support for ARM Neoverse N1 (AWS Graviton 2)
- Support for AMD Zen3 (Core + Uncore but without any events)
- Check for Intel HWP
- Fix for TID filter of Skylake SP LLC filter0 register
- Fix for Lua 5.1
- Fix for likwid-mpirun skip masks
- Fortran90 interface for NvMarkerAPI
- CPU_is_online check to filter non-usable CPU cores
# Changelog 5.0.2
- Fix memory leak in calc_metric()
- New peakflops benchmarks in likwid-bench
- Fix for NUMA domain handling properly
- Improvements for perf_event backend
- Fix for perfctr and powermeter with perf_event backend
- Fix for likwid-mpirun for SLURM with cpusets
- Fix for likwid-setFrequencies in cpusets
- Update for POWER9 event list
- Updates for AMD Zen, Zen+ and Zen2 (events, groups)
- Fix for Intel Uncore events with same name for different devices
- Fix for file descriptor handling
- Fix for compilation with GCC10
- Remove sleep timer warning
- Update examples C-markerAPI and C-internalMarkerAPI
# Changelog 5.0.1
- Some fixes for likwid-mpirun
- Fix for hybrid pinning with multiple hosts
- Fix for perf.groups without core-local events (switch to likwid-pin)
- Fix for command line parser
- For for mpiopts parameter
- Add UPMC as Uncore counter to splitUncoreEvents()
- Expand user-given input to abspath if possible
- Check for at least one executable in user-given command
- Add skip mask for SLURM + Intel OpenMP
- Check if user-given MPI type is available
- Fix for perf_event backend when used as root
- Inlude likwid-marker.h in likwid.h to not break old MarkerAPI code
- Enable build with ARM HPC compiler (ARMCLANG compiler setting)
- Fix creation of likwid-bench benchmarks on POWER platforms
- Fix for build system in NVIDIA_INTERFACE=BUILD_APPDAEMON=true
- Update for executable tester
- Update for MPI+X test (X: OpenMP or Pthreads)
# Changelog 5.0.0
- Support for ARM architectures. Special support for Marvell Thunder X2
- Support for IBM POWER architectures. Support for POWER8 and POWER9.
- Support for AMD Zen2 microarchitecture.
- Support for data fabric counters of AMD Zen microarchitecture
- Support for Nvidia GPU monitoring (with NvMarkerAPI)
- New clock frequency backend (with less overhead)
- Generation of benchmarks for likwid-bench on-the-fly from ptt files
- Switch back to C-based metric calculator (less overhead)
- Interface function to performance groups, create your own.
- Integration of GOTCHA for hooking into client application at runtime
- Thread-local initialization of streams for likwid-bench
- Enhanced support for SLURM with likwid-mpirun
- New MPI and Hybrid pinning features for likwid-mpirun
- Interface to enable the membind kernel memory policy
- JSON output filter file (use -o output.json)
- Update of internal HWLOC to 2.1.0
# Changelog 4.3.4
- Fix for detecting PCI devices if system can split up LLC and memory channels (Intel CoD or SNC)
- Don't pin accessDaemon to threads to avoid long access latencies due to busy hardware thread
- Fix for calculations in likwid-bench if streams are used for input and output
- Fix for LIKWID_MARKER_REGISTER with perf_event backend
- Support for Intel Atom (Tremont) (nothing new, same as Intel Atom (Goldmont Plus))
- Workaround for topology detection if LLC and memory channels are split up. Kernel does not detect it properly sometimes. (Intel CoD or SNC)
- Minor updates for build system
- Minor updates for documentation
# Changelog 4.3.3
- Fixes for likwid-mpirun
- Fixes for events of Intel Skylake SP and Intel Broadwell
- Support for Intel CascadeLake X (only new eventlist, uses code from Intel Skylake SP)
- Fix for bitmask creation in Lua
- Event options for perf_event backend
- New assembly benchmarks in likwid-bench
- MarkerAPI: Function to reset regions
- Some new performance groups (DIVIDE and TMA)
- Fixes for AMD Zen performance groups
- Fix when using topology input file
- Minor bugfixes
# Changelog 4.3.2
- Fix in internal metric calculator
- Support for Intel Knights Mill (core, rapl, uncore)
- Intel Skylake X: Some fixes for events and perf. groups
- Set KMP_INIT_AT_FORK to bypass bug in Intel OpenMP memory allocator
- AMD Zen: Use RETIRED_INSTRUCTION instead of fixed-purpose counter for metric calculation
- All FLOPS_* groups now have vectorization ratio
- Fix for MarkerAPI with perf_event backend
- Fix for maximal/minimal uncore frequency
- Skip counters that are already in use, don't exit
- likwid-mpirun: minor fix when overloading a host
- Improved detection of PCI devices
# Changelog 4.3.1
- Fix for setting/getting turbo mode in frequency module
- Exchanged two events in perf. groups of Intel Skylake X
# Changelog 4.3.0
- Support for Intel Skylake SP architecture (core, uncore, energy)
- Support for AMD Zen architecture (core, l2, energy)
- Pinning strategy 'balanced'
# Changelog 4.2.1
- Fix for logical selection strings
- likwid-agent: general update
- likwid-mpirun: Improved SLURM support
- likwid-mpirun: Print metrics sorted as they are listen in perf. group
- likwid-perfctr: Print metrics/events as header in timeline mode
- likwid-setFrequency: Commandline options to set min, max and current frequency
- Pinning-Library: Automatically detect and skip shepard threads
- Intel Broadwell: Added support for E3 (like Desktop), Fix for L3 group
- Intel IvyBridge: Fix for PCU fixed-purpose counters
- Intel Skylake: Fix for events CYCLE_ACTIVITY, new event L2_LINES_OUT
- Intel Xeon Phi (KNL): Fix for overflow register, Update for ENERGY group
- Intel SandyBridge: Fix for L3CACHE group
- Event/Counter list contains only usable counters and events
# Changelog 4.2.0
- Support for Intel Xeon Phi (Knights Landing): Core, Uncore, RAPL
- Support for Uncore counters of some desktop chips (SandyBridge, IvyBridge,
Haswell, Broadwell and Skylake)
- Basic support for Linux perf_event interface instead of native access.
Currently only core-local counters working, Uncore is experimental
- Support to build against a existing Lua installation (5.1 - 5.3 tested)
- Support for CPU frequency manipulation, Lua interface updated
- Access module checks for LLNL's msr_safe kernel module
- Support for counter registers that are only available when
HyperThreading is off
- Fix for non-HyperThreading counters (PMC4-7) on Intel Broadwell
- Socket measurements can be used for all cores on the socket in
metric formulas.
- likwid-perfctr: Timeline mode without executable runs until user presses Ctrl+c
- likwid-perfctr: New CYCLE_ACTIVITY groups
- likwid-perfctr: New PORT_USAGE groups (only with deactivated HyperThreading)
- likwid-perfctr: Regions are sorted in output as they are executed by the code
- likwid-powermeter: Read Uncore frequency settings and performance energy bias
- likwid-powermeter: Update of energy unit for DRAM domain for Intel
Broadwell D/EP and Intel Xeon Phi (Knights Landing)
- likwid-bench: Fix for 'cycles per update' metric
- likwid-bench: Vector lengths are sanitized for thread count and loop stride
- likwid-topology: Increase robustness
- likwid-mpirun: Some fixes
# Changelog 4.1.2
- Fix for likwid-powermeter: Use proper energy unit
- Fix for performance groups for Intel Broadwell (D/EP): DATA and FALSE_SHARE
- Reduce number of started access daemons
- Clean Uncore unit local control registers (needed for simultaneous use of LIKWID 3 and 4)
- Clean config, filter and counter registers at *_finalize function
- Fix for likwid-features and likwid-perfctr
# Changelog 4.1.1
- Fix for Uncore handling for EP/EN/EX systems
- Minor fix for Uncore handling for Intel desktop systems
- Fix in generic readCounters function
- Support for Intel Goldmont (untested)
- Fixes for likwid-mpirun
# Changelog 4.1.0
- Support for Intel Skylake (Core + Uncore)
- Support for Intel Broadwell (Core + Uncore)
- Support for Intel Broadwell D (Core + Uncore)
- Support for Intel Broadwell EP/EN/EX (Core + Uncore)
- Support for Intel Airmont (Core)
- Uncore support for Intel SandyBridge, IvyBridge and Haswell
- Performance group and event set handling in library
- Internal calculator for derived metrics
- Improvement of Marker API
- Get results/metrics of last measurement cycle
- Fixed most memory leaks
- Respect 'Intel PMU sharing guide'
- Update of internal Lua to 5.3
- More examples (C++11 threads,Cilk+, TBB)
- Test suite for executables and library
- Accuracy checker supports multiple CPUs
- Security checked access daemon
- Likwid-bench supports Integer benchmarks
- Likwid-bench selects interation count automatically
- Likwid-bench has new FMA related benchmarks
- Likwid-mpirun supports SLURM job scheduler
- New tool likwid-features
# Changelog 4.0.1
- likwid-bench: Iteration determination is done serially
- likwid-bench: Manual selection of iterations possible
- likwid-perfctr: Set cpuset to all CPUs not only the first
- likwid-pin: Set cpuset to all CPUs not only the first
- likwid-accuracy.py: Enhanced plotting functions, use only instrumented likwid-bench
- likwid-accessD: Check for allowed register for PCI accesses
- Add models HASWELL_M1 (0x45) and HASWELL_M2 (0x46) to likwid-powermeter and likwid-accessD
- New test application using Cilk and Marker API
- New test application using C++11 threads and Marker API
- likwid-agent: gmetric version check for --group option and s/\s*/_/ in metric names
- likwid-powermeter: Print RAPL domain name
- Marker API: Initialize access already at likwid_markerInit()
- Marker API: likwid_markerThreadInit() only pins if not already pinned
# Changelog 4.0.0
- Support for Intel Broadwell
- Uncore support for all Uncore-aware architectures
- Nehalem (EX)
- Westmere (EX)
- SandyBridge EP
- IvyBridge EP
- Haswell EP
- Measure multiple event sets in a round-robin fashion (no multiplexing!)
- Event options to filter the counter increments
- Whole LIKWID functionality is exposed as API for C/C++ and Lua
- New functions in the Marker API to switch event sets and get intermediate results
- Topology code relies on hwloc. CPUID is still included but only as fallback
- Most LIKWID applications are written in Lua (only exception likwid-bench)
- Monitoring daemon likwid-agent with multiple output backends
- More performance groups