LINPACK results
This page stores various results obtained from running the LINPACK benchmark included in this package. Bold text indicates the highest result for a given precision.
Jump to: Embedded Systems, Laptops and Portables, Desktops/PCs, Workstations (+non-unix)
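For readers unfamiliar with how a LINPACK figure is derived, the sketch below converts a measured solve time into MFLOPS. It assumes the conventional LINPACK operation count of (2/3)n^3 + 2n^2 for factoring and solving an n x n system; this page does not state the matrix size or timing method used by this package, so `n`, `seconds` and `reps` are illustrative parameters rather than values taken from it.

```python
def linpack_flops(n: int) -> float:
    """Nominal floating-point operation count for one n x n factor-and-solve.

    Assumption: the classic LINPACK convention, (2/3)n^3 + 2n^2.
    """
    return (2.0 / 3.0) * n**3 + 2.0 * n**2


def mflops(n: int, seconds: float, reps: int = 1) -> float:
    """Convert a timed run (reps solves in `seconds`) into MFLOPS."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return reps * linpack_flops(n) / seconds / 1e6


if __name__ == "__main__":
    # Hypothetical timing, not a measurement from this page.
    print(f"{mflops(100, 0.01):.3f} MFLOPS")
```

Because the operation count is fixed by convention, the reported rate depends only on elapsed time, which is why compiler options alone can change the result so much.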
The t5325 is a minuscule low-power thin client unveiled by HP in late 2009, designed around a Marvell Kirkwood 88F6281 system-on-a-chip implementing a Marvell-designed ARMv5TE-compliant "Sheeva" processor core clocked at 1.2 GHz with independent 16 KiB instruction and data caches and a 256 KiB unified secondary cache. Derived from the ARM926EJ-S, the Sheeva core does not feature an on-chip floating point unit.
The t5325 features 512 MiB of onboard DDR2 memory with 800 MT/s data rate, connected to the Kirkwood SoC's on-die memory controller through a 16-bit interface. All tests are performed under the HP "ThinPro" operating system, a lightly customized variant of Debian Lenny, on a system not specifically configured for benchmarking.
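As a rough illustration of what the memory figures above imply, the following sketch computes theoretical peak bandwidth from the data rate and interface width. The function name is made up for this example, the result is in decimal MB/s, and sustained bandwidth on real hardware will be lower.

```python
def peak_bandwidth_mb_s(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in MB/s (decimal megabytes).

    Each transfer moves bus_width_bits / 8 bytes, so the peak is simply
    transfers per second times bytes per transfer.
    """
    return transfers_mt_s * bus_width_bits / 8


if __name__ == "__main__":
    # t5325: 800 MT/s DDR2 over a 16-bit interface.
    print(peak_bandwidth_mb_s(800, 16), "MB/s")
```

The same helper can be applied to the other systems on this page, provided the interface width is known; it is not stated for most of them.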
With no floating point unit of any kind, the t5325's Kirkwood processor returns abysmal results on LINPACK and similar FP-heavy applications despite its reasonable 1.2 GHz clock frequency and on-die secondary cache, even with maximal optimization.
Options | Single-precision | Double-precision |
---|---|---|
none | 14.101 MFLOPS | 9.319 MFLOPS |
-O1 | 18.844 MFLOPS | 11.537 MFLOPS |
-O2 | 19.003 MFLOPS | 11.778 MFLOPS |
-O3 | 18.650 MFLOPS | 11.902 MFLOPS |
-O3 -ffast-math | 18.384 MFLOPS | 11.839 MFLOPS |

Options | Single-precision | Double-precision |
---|---|---|
none | 13.787 MFLOPS | 9.422 MFLOPS |
-O1 | 18.009 MFLOPS | 11.185 MFLOPS |
-O2 | 18.284 MFLOPS | 11.298 MFLOPS |
-O3 | 18.304 MFLOPS | 11.275 MFLOPS |
-O3 -ffast-math | 18.314 MFLOPS | 11.245 MFLOPS |
The ToughBook U1 is a unique and highly ruggedized UMPC released by Panasonic in 2008, and built around Intel's hyper-threaded "Silverthorne" Atom microprocessor, featuring a 32 KiB instruction cache and a 24 KiB data cache, along with a unified 512 KiB secondary cache. The Z520 model featured in the U1 has a 1.33 GHz clock frequency, and is connected to an Intel US15W System Controller Hub using a 533 MHz front-side bus. The US15W SCH features an integrated memory controller connected to 1 GiB of on-board DDR2 memory, likely with 533 MT/s data rate, though 400 MT/s is possible. All tests are performed under Windows XP with the Cygwin environment, on a system not specifically configured for benchmarking.
Options | Single-precision | Double-precision |
---|---|---|
none | 48.319 MFLOPS | 48.656 MFLOPS |
-O1 | 163.421 MFLOPS | 160.663 MFLOPS |
-O2 | 152.214 MFLOPS | 152.728 MFLOPS |
-O3 | 169.706 MFLOPS | 165.363 MFLOPS |
-Ofast | 160.667 MFLOPS | 160.949 MFLOPS |

Options | Single-precision | Double-precision |
---|---|---|
none | 50.978 MFLOPS | 49.399 MFLOPS |
-O1 | 162.586 MFLOPS | 145.276 MFLOPS |
-O2 | 160.025 MFLOPS | 146.818 MFLOPS |
-O3 | 163.737 MFLOPS | 144.105 MFLOPS |
-Ofast | 157.878 MFLOPS | 140.956 MFLOPS |
The E6420 is a midrange 14-inch business notebook introduced in early 2012. This particular configuration features Intel's dual-core, hyper-threaded Core i5 "Sandy Bridge" microprocessor with 32+32 KiB per-core instruction and data caches, a 256 KiB per-core second level cache, a 3 MiB shared tertiary cache, a standard clock frequency of 2.6 GHz and a maximum frequency of 3.3 GHz. All models of the E6420 are built around Intel's QM67 express chipset, connected to the system processor through a 5 GT/s Direct Media Interface. This system is configured with 4 GiB of DDR3 SDRAM clocked at 667 MHz (for 1333 MT/s data rate) and directly interfaced to the processor's on-die memory controller. All tests are performed under CentOS 7.5.1804 on a system not specifically configured for benchmarking.
All results reflect single-threaded execution. This version of LINPACK does not take any advantage of multi-core and multi-threaded processors.
Options | Single-precision | Double-precision |
---|---|---|
none | 583.570 MFLOPS | 580.643 MFLOPS |
-O1 | 1.809 GFLOPS | 1.886 GFLOPS |
-O2 | 2.407 GFLOPS | 2.160 GFLOPS |
-O3 | 3.112 GFLOPS | 2.731 GFLOPS |
-Ofast | 2.731 GFLOPS | 2.412 GFLOPS |

Options | Single-precision | Double-precision |
---|---|---|
none | 696.797 MFLOPS | 668.994 MFLOPS |
-O1 | 2.395 GFLOPS | 2.108 GFLOPS |
-O2 | 3.312 GFLOPS | 2.579 GFLOPS |
-O3 | 3.364 GFLOPS | 2.473 GFLOPS |
-Ofast | 3.353 GFLOPS | 2.353 GFLOPS |
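The tables on this page mix MFLOPS and GFLOPS, which makes rows awkward to compare at a glance. The sketch below is one possible way to normalize a table cell to MFLOPS and compute an optimization speedup; the helper names are invented for this example, and it assumes cells always have the form "value UNIT" as they do here.

```python
# Conversion factors to MFLOPS for the two units used on this page.
_UNIT_TO_MFLOPS = {"MFLOPS": 1.0, "GFLOPS": 1000.0}


def to_mflops(cell: str) -> float:
    """Parse a table cell such as '3.112 GFLOPS' into a value in MFLOPS."""
    value, unit = cell.split()
    return float(value) * _UNIT_TO_MFLOPS[unit]


def speedup(baseline: str, optimized: str) -> float:
    """Ratio of the optimized result to the baseline result."""
    return to_mflops(optimized) / to_mflops(baseline)


if __name__ == "__main__":
    # Single-precision, first E6420 table above: "none" versus -O3.
    print(f"{speedup('583.570 MFLOPS', '3.112 GFLOPS'):.2f}x")
```

For the first of the two E6420 tables, this puts single-precision -O3 at roughly 5.3 times the unoptimized build.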
The 3000 J115 was released in late 2006 as one of Lenovo's first entries into the United States market under their own name. It is a fairly average entry-level PC built around AMD's dual-core Athlon 64 X2 microprocessor with 64+64 KiB per-core instruction and data caches, a 512 KiB per-core second level cache and a 2 GHz clock frequency (model 3800+). The 3000 J115 employs NVIDIA's nForce 410 chipset, which connects to the system processor through a 1 GHz HyperTransport bus. This system is configured with 1 GiB of DDR2 SDRAM clocked at 266 MHz (for 533 MT/s data rate) and directly interfaced to the Athlon 64 X2's on-die memory controller. All tests are performed under CentOS 7.5.1804 on a system not specifically configured for benchmarking.
All results reflect single-threaded execution. This version of LINPACK does not take any advantage of multi-core processors.
Options | Single-precision | Double-precision |
---|---|---|
none | 155.954 MFLOPS | 164.460 MFLOPS |
-O1 | 804.030 MFLOPS | 952.140 MFLOPS |
-O2 | 1.049 GFLOPS | 1.005 GFLOPS |
-O3 | 1.234 GFLOPS | 1.005 GFLOPS |
-Ofast | 1.304 GFLOPS | 1.005 GFLOPS |

Options | Single-precision | Double-precision |
---|---|---|
none | 169.574 MFLOPS | 159.683 MFLOPS |
-O1 | 588.304 MFLOPS | 368.498 MFLOPS |
-O2 | 644.872 MFLOPS | 374.674 MFLOPS |
-O3 | 715.378 MFLOPS | 352.982 MFLOPS |
-Ofast | 717.291 MFLOPS | 349.305 MFLOPS |
The C3000 is a mid-range Unix workstation released in 1999, based on HP's in-house PA-8500 microprocessor with 1 MiB of on-die data cache, 512 KiB of on-die instruction cache and a clock frequency of 400 MHz. The C3000's microprocessor is interfaced to the "Astro" chipset through a 120 MHz Runway+ bus. The particular system tested had 2,560 MiB of SDRAM, also running at 120 MHz, and was not specifically configured for benchmarking. All tests are performed under HP-UX 11.11 (11i v1).
Options | Single-precision | Double-precision |
---|---|---|
none | 30.977 MFLOPS | 31.991 MFLOPS |
-O1 | 183.197 MFLOPS | 190.428 MFLOPS |
-O2 | 211.587 MFLOPS | 215.045 MFLOPS |
-O3 | 223.342 MFLOPS | 228.996 MFLOPS |

Options | Single-precision | Double-precision |
---|---|---|
none | 39.779 MFLOPS | 34.786 MFLOPS |
-O1 | 247.936 MFLOPS | 115.235 MFLOPS |
-O2 | 294.152 MFLOPS | 127.990 MFLOPS |
-O3 | 294.152 MFLOPS | 127.261 MFLOPS |
Note: -Ofast is only available in GCC >=4.7
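A build script that has to cope with older compilers could pick its most aggressive flag set from the GCC version. The sketch below is only an illustration: the function names are hypothetical, the version string is assumed to look like the output of `gcc -dumpversion`, and falling back to `-O3 -ffast-math` (the combination used in the t5325 tables on this page) is an assumption about what a reasonable substitute would be, not something this package is documented to do.

```python
import re
from typing import List, Tuple


def parse_gcc_version(text: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.3.2' or '8'."""
    match = re.search(r"(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) is not None else 0
    return major, minor


def fast_flags(version_text: str) -> List[str]:
    """Return -Ofast where GCC supports it (>= 4.7), else an assumed fallback."""
    if parse_gcc_version(version_text) >= (4, 7):
        return ["-Ofast"]
    # Assumed substitute for older compilers; not identical to -Ofast.
    return ["-O3", "-ffast-math"]


if __name__ == "__main__":
    for version in ("4.3.2", "4.7.0", "8"):
        print(version, "->", " ".join(fast_flags(version)))
```

Comparing the version as a tuple avoids the usual string-comparison trap where "4.10" would sort before "4.7".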
Options | Single-precision | Double-precision |
---|---|---|
none | 26.526 MFLOPS | 31.407 MFLOPS |
+O1 | 31.850 MFLOPS | 38.989 MFLOPS |
+O2 | 143.009 MFLOPS | 143.009 MFLOPS |
+O3 | 159.389 MFLOPS | 155.954 MFLOPS |
+O4 | 144.725 MFLOPS | 148.284 MFLOPS |
-fast | 291.785 MFLOPS | 235.709 MFLOPS |

Options | Single-precision | Double-precision |
---|---|---|
none | 39.731 MFLOPS | 40.256 MFLOPS |
+O1 | 40.844 MFLOPS | 47.363 MFLOPS |
+O2 | 349.761 MFLOPS | 147.037 MFLOPS |
+O3 | 364.493 MFLOPS | 148.542 MFLOPS |
+O4 | 357.689 MFLOPS | 148.707 MFLOPS |
-fast | 396.844 MFLOPS | 150.542 MFLOPS |
Note: HP C +O2 is roughly equivalent to GCC -O1
The following results are from workstations not running a Unix-like operating system, compatibility environment or otherwise lacking the proper accommodations to build the LINPACK sources provided in this package as-is. Source/build tweaks are noted on a per-system basis.
The Vectra XU 6/200 was first released in early 1996, succeeding the XU 6/150 as the pinnacle of the Vectra line and of HP's PC-compatible systems overall. It is built around one or two Intel Pentium Pro microprocessors with 8+8 KiB instruction and data caches, 256 or 512 KiB of off-die, on-package full-speed secondary cache and a clock frequency of 200 MHz. The system processors are interfaced to Intel's 440FX "Natoma" chipset through a 66 MHz front-side bus.
The particular example tested was equipped with dual 200 MHz Pentium Pro microprocessors with 512 KiB secondary caches and 128 MiB of memory running at 66 MHz. This system was freshly configured, but not specifically customized for optimal benchmark performance. All tests are performed under Windows NT 4.0, service pack 6.
Options | Single-precision | Double-precision |
---|---|---|
No optimizations | 20.012 MFLOPS | 18.091 MFLOPS |
Full optimization, 80486 code | 59.081 MFLOPS | 48.319 MFLOPS |
Full optimization, Pentium code | 62.270 MFLOPS | 45.227 MFLOPS |
Full optimization, Pentium Pro code | 65.737 MFLOPS | 48.319 MFLOPS |
Full optimization, blended code | 67.138 MFLOPS | 48.319 MFLOPS |

Options | Single-precision | Double-precision |
---|---|---|
No optimizations | 19.404 MFLOPS | 11.989 MFLOPS |
Full optimization, 80486 code | 31.472 MFLOPS | 13.396 MFLOPS |
Full optimization, Pentium code | 31.558 MFLOPS | 13.413 MFLOPS |
Full optimization, Pentium Pro code | 31.561 MFLOPS | 13.413 MFLOPS |
Full optimization, blended code | 31.469 MFLOPS | 13.296 MFLOPS |
Build notes: No modifications to the LINPACK sources are required for Visual C++ 5.0, but build configurations were used instead of the standard makefile. These configurations were derived from the default Win32 Debug configuration with optimizations added as needed.