LINPACK results
This page stores various results obtained from running the LINPACK benchmark included in this package. Bold text indicates the highest result for a given precision.
Please note that this version of LINPACK is single-threaded, and all results are reflective of one processor only unless otherwise noted.
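For reference, LINPACK figures of this kind are conventionally derived from a fixed operation count for solving an n x n system, 2n^3/3 + 2n^2, divided by the measured solve time. The sketch below assumes that conventional count. The function names are illustrative and are not taken from the benchmark sources in this package.

```python
def linpack_ops(n: int) -> float:
    """Conventional LINPACK operation count for factoring and solving an n x n system."""
    return (2.0 * n**3) / 3.0 + 2.0 * n**2


def linpack_mflops(n: int, seconds: float) -> float:
    """Convert a measured solve time into MFLOPS (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return linpack_ops(n) / (seconds * 1.0e6)


if __name__ == "__main__":
    # A 100 x 100 solve is roughly 0.69 million floating-point operations,
    # a 1000 x 1000 solve roughly 669 million.
    for n in (100, 1000):
        print(n, linpack_ops(n))
```

Because the operation count is fixed for a given n, a higher MFLOPS figure simply means a shorter measured solve time.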
Jump to: Embedded Systems, Mobile Phones, Laptops and Portables, Desktops/PCs (+non-unix), Workstations (+non-unix), Servers
The t5325 is a minuscule low-power thin client unveiled by HP in late 2009, designed around a Marvell Kirkwood 88F6281 system-on-a-chip implementing a Marvell-designed ARMv5TE-compliant "Sheeva" processor core clocked at 1.2 GHz with independent 16 KiB instruction and data caches and a 256 KiB unified secondary cache. The Sheeva core does not feature an on-chip floating point unit.
The t5325 features 512 MiB of onboard DDR2 memory with 800 MT/s data rate, connected to the Kirkwood SoC's on-die memory controller through a 16-bit interface. All tests are performed under the HP "ThinPro" operating system, a lightly customized variant of Debian Lenny, on a system not specifically configured for benchmarking.
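As a rough guide to the memory figures quoted for each system, theoretical peak bandwidth is the transfer rate multiplied by the bus width in bytes. The sketch below applies that to the t5325's 800 MT/s, 16-bit interface. The helper name is illustrative, and sustained bandwidth in practice will be lower than this ceiling.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in MB/s (10^6 bytes per second)."""
    return mega_transfers_per_s * bus_width_bits / 8


if __name__ == "__main__":
    # HP t5325: DDR2 at 800 MT/s over a 16-bit interface.
    print(peak_bandwidth_mb_s(800, 16))  # 1600.0
```

The same helper can be applied to the transfer rates and bus widths listed for the other systems on this page.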
With no floating point unit of any kind, the t5325's Kirkwood processor returns abysmal results on LINPACK and similar FP-heavy applications despite its reasonable 1.2 GHz clock frequency and on-die secondary cache, even with maximal optimization.
GCC 4.2.4: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 14.101 MFLOPS | 9.319 MFLOPS |
-O1 | 18.844 MFLOPS | 11.537 MFLOPS |
-O2 | 19.003 MFLOPS | 11.778 MFLOPS |
-O3 | 18.650 MFLOPS | 11.902 MFLOPS |
-O3 -ffast-math | 18.384 MFLOPS | 11.839 MFLOPS |
GCC 4.2.4: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 13.787 MFLOPS | 9.422 MFLOPS |
-O1 | 18.009 MFLOPS | 11.185 MFLOPS |
-O2 | 18.284 MFLOPS | 11.298 MFLOPS |
-O3 | 18.304 MFLOPS | 11.275 MFLOPS |
-O3 -ffast-math | 18.314 MFLOPS | 11.245 MFLOPS |
The mv5150 is HP's second-generation Media Vault network storage appliance, first released in 2008 and based on a Marvell Orion 88F5182 system-on-a-chip implementing a Marvell-designed ARMv5TE-compliant "Feroceon" processor core clocked at 500 MHz with independent 32 KiB instruction and data caches. The Feroceon core does not feature an on-chip floating point unit.
The mv5150 features 128 MiB of onboard DDR2 memory with 800 MT/s data rate, connected to the Orion SoC's on-die memory controller through a 16-bit interface. All tests are performed under the Media Vault's embedded Linux 2.6.12.6 operating system, which has not been specifically configured for benchmarking.
With no floating point unit of any kind, the mv5150's Orion processor returns abysmal results on LINPACK and similar FP-heavy applications, nearly eighty times slower than the PA-8500 powered HP VISUALIZE C3000, which it otherwise roughly equals in integer performance based on its peak CoreMark score.
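The size of this gap can be checked against the peak figures recorded on this page. The sketch below simply divides the C3000's best results by the mv5150's best results. Which runs to pair is a judgment call, so the exact multiple depends on the compiler, options and matrix size chosen, and the ratios are indicative only.

```python
# Peak results copied from the tables on this page (MFLOPS).
MV5150_PEAK = {"single": 5.739, "double": 4.156}      # GCC 3.4.4
C3000_PEAK = {"single": 396.844, "double": 235.709}   # HP C, -fast


def slowdown(slow: float, fast: float) -> float:
    """How many times slower `slow` is than `fast`."""
    return fast / slow


if __name__ == "__main__":
    for precision in ("single", "double"):
        ratio = slowdown(MV5150_PEAK[precision], C3000_PEAK[precision])
        print(f"{precision}: {ratio:.1f}x")
```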
GCC 3.4.4: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 4.874 MFLOPS | 3.539 MFLOPS |
-O1 | 5.653 MFLOPS | 3.885 MFLOPS |
-O2 | 5.570 MFLOPS | 4.156 MFLOPS |
-O3 | 5.739 MFLOPS | 4.013 MFLOPS |
-O3 -ffast-math | 5.653 MFLOPS | 4.038 MFLOPS |
GCC 3.4.4: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 4.680 MFLOPS | 3.304 MFLOPS |
-O1 | 5.454 MFLOPS | 3.729 MFLOPS |
-O2 | 5.476 MFLOPS | 3.775 MFLOPS |
-O3 | 5.476 MFLOPS | 3.800 MFLOPS |
-O3 -ffast-math | 5.484 MFLOPS | 3.799 MFLOPS |
Released in 1999, the Qube 2 is the second generation of Cobalt Networks' innovative series of turnkey network appliances designed to provide small business customers with a simple, graphically-managed solution for self-hosted email, web and file services. The Qube 2's main improvement over its predecessor is its employment of a much faster QED RM5231 microprocessor. A 64-bit MIPS IV implementation derived from the R5000, the RM5231 features larger independent 32 KiB instruction and data caches than its predecessor and a higher clock frequency ceiling, with the Qube 2 using the top-end 250 MHz offering.
The Qube 2 supports up to 256 MiB of EDO memory interfaced to a Galileo GT-64111 Universal System Controller through a 32-bit wide, 66 MT/s bus. The system controller further interfaces to the RM5231 microprocessor through a 32-bit, 66 MHz SysAD bus.
All tests are performed under NetBSD/cobalt 9.2 on a 64 MiB Qube 2 with no specific configuration for benchmarking. CPU performance on this system under NetBSD appears to be extremely erratic when tested, with LINPACK often delivering sub-1 MFLOPS performance on par with that of the significantly older VAXstation 4000 VLC during test runs. The results below were obtained through multiple rebuilds and reruns until a reasonable peak result was achieved, with unoptimized single-precision remaining an outlier.
GCC 7.5.0: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 5.139 MFLOPS | 13.147 MFLOPS |
-O1 | 32.773 MFLOPS | 25.697 MFLOPS |
-O2 | 38.328 MFLOPS | 27.245 MFLOPS |
-O3 | 41.115 MFLOPS | 27.725 MFLOPS |
-Ofast | 42.667 MFLOPS | 28.807 MFLOPS |
Note: -O1 data unavailable due to a compiler bug causing problems with register allocation
GCC 4.2.1: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 10.186 MFLOPS | 7.538 MFLOPS |
-O2 | 49.974 MFLOPS | 21.955 MFLOPS |
-O3 | 52.589 MFLOPS | 21.743 MFLOPS |
GCC 4.2.1: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 10.350 MFLOPS | 7.236 MFLOPS |
-O2 | 37.426 MFLOPS | 17.109 MFLOPS |
-O3 | 37.720 MFLOPS | 17.303 MFLOPS |
Announced at CES 2010 and launched on Verizon Wireless in March 2010, the Pre Plus was an updated version of Palm's innovative Pre smartphone with double the RAM (512 MiB) and storage (16 GiB), as well as a new touch-based gesture area rather than the previous home button. Like the original Pre, the Pre Plus is designed around Texas Instruments' OMAP3430 multimedia processor featuring an ARM Cortex-A8 core clocked at 500 MHz with independent 16 KiB instruction and data caches as well as a unified 256 KiB second-level cache. Although the OMAP3430 does contain a VFP floating-point unit within its NEON SIMD coprocessor, this "VFPLite" implementation is not a fully fledged design like that found on most other Cortex-A8s and is significantly slower.
The Pre Plus features 512 MiB of 400 MT/s LPDDR memory mounted directly on the OMAP3430 package and attached to its on-die memory controller via a 32-bit bus. All tests are performed under WebOS 1.4.5 with WebOS Internals' UberKernel allowing for a greater range of clock frequency tweaking, but otherwise no benchmarking-specific configuration.
In addition to having a diminished floating-point unit, the Optware toolchain running on the Pre Plus for these tests does not have any support for hardware floating-point instructions, therefore all of these results gauge the OMAP3430's performance using software emulated floating-point routines. In spite of this and the resulting abysmal performance, it's interesting to note that, even when hobbled by the standard 500 MHz underclock, the Cortex-A8 core of the OMAP3430 still delivers performance nearly rivaling that of the Marvell Sheeva-based Kirkwood SoC (up to that time a performance leader among ARM implementations) found in the HP t5325 above at more than double the frequency.
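The per-clock comparison with the t5325 can be made concrete by normalising each system's peak result by its clock frequency. The sketch below uses the peak 100 x 100 single-precision figures from this page. The helper name is illustrative, and MFLOPS per MHz is only a crude proxy for per-clock efficiency.

```python
# Peak 100 x 100 single-precision results from the tables on this page.
SYSTEMS = {
    "Pre Plus (OMAP3430, 500 MHz)": (12.425, 500),
    "HP t5325 (Kirkwood, 1200 MHz)": (19.003, 1200),
}


def mflops_per_mhz(mflops: float, clock_mhz: float) -> float:
    """Normalise a LINPACK result by clock frequency."""
    return mflops / clock_mhz


if __name__ == "__main__":
    for name, (mflops, clock_mhz) in SYSTEMS.items():
        print(f"{name}: {mflops_per_mhz(mflops, clock_mhz):.4f} MFLOPS/MHz")
```

On these figures the Cortex-A8 completes more software floating-point work per clock cycle than the Sheeva core, which is consistent with the observation above.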
GCC 4.2.3: 100 x 100 matrices, Palm default profile (500 MHz underclock)
Options | Single-precision | Double-precision |
---|---|---|
none | 8.833 MFLOPS | 5.482 MFLOPS |
-O1 | 11.626 MFLOPS | 6.622 MFLOPS |
-O2 | 12.290 MFLOPS | 6.730 MFLOPS |
-O3 | 12.425 MFLOPS | 6.781 MFLOPS |
-O3 -ffast-math | 12.290 MFLOPS | 6.979 MFLOPS |
GCC 4.2.3: 100 x 100 matrices, OMAP3430 standard clock (600 MHz)
Options | Single-precision | Double-precision |
---|---|---|
none | 13.460 MFLOPS | 7.640 MFLOPS |
-O1 | 13.959 MFLOPS | 7.962 MFLOPS |
-O2 | 15.076 MFLOPS | 8.314 MFLOPS |
-O3 | 16.038 MFLOPS | 8.437 MFLOPS |
-O3 -ffast-math | 14.684 MFLOPS | 8.134 MFLOPS |
GCC 4.2.3: 100 x 100 matrices, 1 GHz overclock
Options | Single-precision | Double-precision |
---|---|---|
none | 17.667 MFLOPS | 11.085 MFLOPS |
-O1 | 24.580 MFLOPS | 13.622 MFLOPS |
-O2 | 24.580 MFLOPS | 13.787 MFLOPS |
-O3 | 27.746 MFLOPS | 13.622 MFLOPS |
-O3 -ffast-math | 24.580 MFLOPS | 13.561 MFLOPS |
GCC 4.2.3: 1000 x 1000 matrices, Palm default profile (500 MHz underclock)
Options | Single-precision | Double-precision |
---|---|---|
none | 8.425 MFLOPS | 5.248 MFLOPS |
-O1 | 10.740 MFLOPS | 6.126 MFLOPS |
-O2 | 11.170 MFLOPS | 6.242 MFLOPS |
-O3 | 11.253 MFLOPS | 6.203 MFLOPS |
-O3 -ffast-math | 11.563 MFLOPS | 6.291 MFLOPS |
GCC 4.2.3: 1000 x 1000 matrices, OMAP3430 standard clock (600 MHz)
Options | Single-precision | Double-precision |
---|---|---|
none | 10.337 MFLOPS | 6.254 MFLOPS |
-O1 | 13.202 MFLOPS | 7.293 MFLOPS |
-O2 | 13.709 MFLOPS | 7.422 MFLOPS |
-O3 | 13.949 MFLOPS | 7.441 MFLOPS |
-O3 -ffast-math | 13.777 MFLOPS | 7.461 MFLOPS |
GCC 4.2.3: 1000 x 1000 matrices, 1 GHz overclock
Options | Single-precision | Double-precision |
---|---|---|
none | 16.568 MFLOPS | 10.076 MFLOPS |
-O1 | 21.413 MFLOPS | 11.344 MFLOPS |
-O2 | 22.323 MFLOPS | 12.011 MFLOPS |
-O3 | 22.542 MFLOPS | 11.942 MFLOPS |
-O3 -ffast-math | 22.296 MFLOPS | 11.968 MFLOPS |
The ToughBook U1 is a unique and highly ruggedized UMPC released by Panasonic in 2008, and built around Intel's hyper-threaded "Silverthorne" Atom microprocessor, featuring a 32 KiB instruction cache and a 24 KiB data cache, along with a unified 512 KiB secondary cache. The Z520 model featured in the U1 has a 1.33 GHz clock frequency, and is connected to an Intel US15W System Controller Hub using a 533 MHz front-side bus. The US15W SCH features an integrated memory controller connected to 1 GiB of on-board DDR2 memory, likely with 533 MT/s data rate, though 400 MT/s is possible. All tests are performed under Windows XP with the Cygwin environment, on a system not specifically configured for benchmarking.
GCC 5.4.0: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 48.319 MFLOPS | 48.656 MFLOPS |
-O1 | 163.421 MFLOPS | 160.663 MFLOPS |
-O2 | 152.214 MFLOPS | 152.728 MFLOPS |
-O3 | 169.706 MFLOPS | 165.363 MFLOPS |
-Ofast | 160.667 MFLOPS | 160.949 MFLOPS |
GCC 5.4.0: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 50.978 MFLOPS | 49.399 MFLOPS |
-O1 | 162.586 MFLOPS | 145.276 MFLOPS |
-O2 | 160.025 MFLOPS | 146.818 MFLOPS |
-O3 | 163.737 MFLOPS | 144.105 MFLOPS |
-Ofast | 157.878 MFLOPS | 140.956 MFLOPS |
The E6420 is a midrange 14-inch business notebook introduced in early 2012. This particular configuration features Intel's dual-core, hyper-threaded Core i5 "Sandy Bridge" microprocessor with 32+32 KiB per-core instruction and data caches, 256 KiB per-core second level cache, a 3 MiB shared tertiary cache, a standard 2.6 GHz clock and a maximum frequency of 3.3 GHz. All models of the E6420 are built around Intel's QM67 express chipset, connected to the system processor through a 5 GT/s Direct Media Interface. This system is configured with 4 GiB of DDR3 SDRAM clocked at 667 MHz (for 1333 MT/s data rate) and directly interfaced to the processor's on-die memory controller. All tests are performed under CentOS 7.5.1804 on a system not specifically configured for benchmarking.
All results reflect single-threaded execution. This version of LINPACK does not take any advantage of multi-core and multi-threaded processors.
GCC 4.8.5: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 583.570 MFLOPS | 580.643 MFLOPS |
-O1 | 1.809 GFLOPS | 1.886 GFLOPS |
-O2 | 2.407 GFLOPS | 2.160 GFLOPS |
-O3 | 3.112 GFLOPS | 2.731 GFLOPS |
-Ofast | 2.731 GFLOPS | 2.412 GFLOPS |
GCC 4.8.5: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 696.797 MFLOPS | 668.994 MFLOPS |
-O1 | 2.395 GFLOPS | 2.108 GFLOPS |
-O2 | 3.312 GFLOPS | 2.579 GFLOPS |
-O3 | 3.364 GFLOPS | 2.473 GFLOPS |
-Ofast | 3.353 GFLOPS | 2.353 GFLOPS |
The last and fastest of the 12-inch consumer-oriented iBook G4 line, the mid-2005 model is built around a 1.33 GHz, 32-bit PowerPC 7447a microprocessor fabricated by Freescale Semiconductor, which had been spun off from Motorola the previous year. The 7447a is the final desktop iteration of the PowerPC 7400 'G4' microprocessor used by Apple in their systems, featuring two 32 KiB primary caches for instructions and data, a single 512 KiB on-die unified secondary cache, and some additional mobile-oriented features, such as dynamic frequency scaling and an on-chip thermal diode. The 7447a is interfaced to 512 MiB of on-board 333 MT/s DDR memory through the Intrepid ASIC, to which it is attached via a 133 MHz, 64-bit wide data bus. Intrepid also provides I/O device control and most other functionality to the complete system. All tests are performed under Mac OS X 10.4 on a system not specifically configured for benchmarking.
Apple GCC 4.0.1: 100 x 100 Matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 66.266 MFLOPS | 66.510 MFLOPS |
-O1 | 402.015 MFLOPS | 488.936 MFLOPS |
-O2 | 441.236 MFLOPS | 433.309 MFLOPS |
-O3 | 502.518 MFLOPS | 509.596 MFLOPS |
Apple GCC 4.0.1: 1000 x 1000 Matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 59.509 MFLOPS | 45.686 MFLOPS |
-O1 | 209.747 MFLOPS | 85.544 MFLOPS |
-O2 | 205.568 MFLOPS | 84.948 MFLOPS |
-O3 | 204.940 MFLOPS | 85.544 MFLOPS |
Released in late 2006 as one of Lenovo's first entries into the United States market under their own name, the 3000 J115 is a fairly average entry-level PC built around AMD's dual-core Athlon 64 X2 microprocessor with 64+64 KiB per-core instruction and data caches, 512 KiB per-core second level cache and a 2 GHz clock frequency (model 3800+). The 3000 J115 employs NVIDIA's nForce 410 chipset, which connects to the system processor through a 1 GHz HyperTransport bus. This system is configured with 1 GiB of DDR2 SDRAM clocked at 266 MHz (for 533 MT/s data rate) and directly interfaced to the Athlon 64 X2's on-die memory controller. All tests are performed under CentOS 7.5.1804 on a system not specifically configured for benchmarking.
All results reflect single-threaded execution. This version of LINPACK does not take any advantage of multi-core processors.
GCC 4.8.5: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 155.954 MFLOPS | 164.460 MFLOPS |
-O1 | 804.030 MFLOPS | 952.140 MFLOPS |
-O2 | 1.049 GFLOPS | 1.005 GFLOPS |
-O3 | 1.234 GFLOPS | 1.005 GFLOPS |
-Ofast | 1.304 GFLOPS | 1.005 GFLOPS |
GCC 4.8.5: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 169.574 MFLOPS | 159.683 MFLOPS |
-O1 | 588.304 MFLOPS | 368.498 MFLOPS |
-O2 | 644.872 MFLOPS | 374.674 MFLOPS |
-O3 | 715.378 MFLOPS | 352.982 MFLOPS |
-Ofast | 717.291 MFLOPS | 349.305 MFLOPS |
The following results are from desktops that do not run a Unix-like operating system or compatibility environment, or that otherwise lack the proper accommodations to build the LINPACK sources provided in this package as-is. Source/build tweaks are noted on a per-system basis.
This system was Apple's flagship Power Macintosh through the year 2000, featuring dual Motorola PowerPC 7400 'G4' microprocessors, each with two on-die 32 KiB primary caches for instructions and data and a 1 MiB off-die unified "backside" secondary cache. Both processors share a single module and directly interface to the 'UniNorth' controller ASIC and 512 MiB of 100 MHz SDRAM through a shared 100 MHz, 64-bit data bus.
All tests are performed under Mac OS 9.0.4 unless noted otherwise.
Metrowerks CodeWarrior 6.0
Level 4 optimizations + AltiVec target, G4-specific instructions, peephole optimization, FMADD & FMSUB instructions, instruction scheduling
n | Single-precision | Double-precision |
---|---|---|
100 | 155.606 MFLOPS | 165.716 MFLOPS |
1,000 | 91.471 MFLOPS | 46.845 MFLOPS |
The mid-range offering of Apple's final generation of PowerPC-based professional systems, the 2.3DC was introduced in October 2005 and was designed around IBM's new dual-core 64-bit PowerPC 970MP processor, which featured two PowerPC 970 cores each with 32 KiB data cache, 64 KiB instruction cache, and a unified 1 MiB secondary cache, all running at a clock frequency of 2.3 GHz. The 970MP is interfaced to an off-chip DDR2 memory controller by a 1.15 GHz, 64-bit data bus composed of two separate 32-bit uni-directional buses. This system is outfitted with 8 GiB of error-correcting DDR2 memory clocked at 266 MHz (with an effective 533 MT/s data rate).
All tests are performed under Mac OS X 10.4.11, on a system not specifically configured for benchmarking.
Apple GCC 4.0.1: 100 x 100 Matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 135.005 MFLOPS | 133.020 MFLOPS |
-O1 | 851.326 MFLOPS | 841.426 MFLOPS |
-O2 | 904.534 MFLOPS | 927.726 MFLOPS |
-O3 | 936.734 MFLOPS | 1.019 GFLOPS |
Apple GCC 4.0.1: 1000 x 1000 Matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 159.683 MFLOPS | 144.540 MFLOPS |
-O1 | 842.282 MFLOPS | 540.860 MFLOPS |
-O2 | 931.482 MFLOPS | 543.050 MFLOPS |
-O3 | 951.300 MFLOPS | 556.570 MFLOPS |
The C3000 is a mid-range Unix workstation released in 1999, based on HP's indigenous PA-8500 microprocessor with 1 MiB of on-die data cache, 512 KiB of on-die instruction cache and a clock frequency of 400 MHz. The C3000's microprocessor is interfaced to the "Astro" chipset through a 120 MHz Runway+ bus. The particular system tested had 2,560 MiB of SDRAM, also running at 120 MHz, and was not specifically configured for benchmarking. All tests are performed under HP-UX 11.11 (11i v1).
GCC 4.2.3: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 30.977 MFLOPS | 31.991 MFLOPS |
-O1 | 183.197 MFLOPS | 190.428 MFLOPS |
-O2 | 211.587 MFLOPS | 215.045 MFLOPS |
-O3 | 223.342 MFLOPS | 228.996 MFLOPS |
GCC 4.2.3: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 39.779 MFLOPS | 34.786 MFLOPS |
-O1 | 247.936 MFLOPS | 115.235 MFLOPS |
-O2 | 294.152 MFLOPS | 127.990 MFLOPS |
-O3 | 294.152 MFLOPS | 127.261 MFLOPS |
Note: -Ofast is only available in GCC >=4.7
HP C B.11.11.16: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 26.526 MFLOPS | 31.407 MFLOPS |
+O1 | 31.850 MFLOPS | 38.989 MFLOPS |
+O2 | 143.009 MFLOPS | 143.009 MFLOPS |
+O3 | 159.389 MFLOPS | 155.954 MFLOPS |
+O4 | 144.725 MFLOPS | 148.284 MFLOPS |
-fast | 291.785 MFLOPS | 235.709 MFLOPS |
HP C B.11.11.16: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 39.731 MFLOPS | 40.256 MFLOPS |
+O1 | 40.844 MFLOPS | 47.363 MFLOPS |
+O2 | 349.761 MFLOPS | 147.037 MFLOPS |
+O3 | 364.493 MFLOPS | 148.542 MFLOPS |
+O4 | 357.689 MFLOPS | 148.707 MFLOPS |
-fast | 396.844 MFLOPS | 150.542 MFLOPS |
Note: HP C +O2 is roughly equivalent to GCC -O1
The following results are from workstations that do not run a Unix-like operating system or compatibility environment, or that otherwise lack the proper accommodations to build the LINPACK sources provided in this package as-is. Source/build tweaks are noted on a per-system basis.
The Vectra XU 6/200 was first released in early 1996, succeeding the XU 6/150 as the pinnacle of the Vectra line and HP's PC-compatible systems overall, built around one or two Intel Pentium Pro microprocessors with 8+8 KiB instruction and data caches, 256 or 512 KiB of off-die, on-package full-speed secondary cache and clock frequencies of 200 MHz. The system processors are interfaced to Intel's 440FX "Natoma" chipset through a 66 MHz front-side bus.
The particular example tested was equipped with dual 200 MHz Pentium Pro microprocessors with 512 KiB secondary caches and 128 MiB of memory running at 66 MHz. This system was freshly configured, but not specifically customized for optimal benchmark performance. All tests are performed under Windows NT 4.0, service pack 6.
Microsoft Visual C++ 5.0: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
No optimizations | 20.012 MFLOPS | 18.091 MFLOPS |
Full optimization, 80486 code | 59.081 MFLOPS | 48.319 MFLOPS |
Full optimization, Pentium code | 62.270 MFLOPS | 45.227 MFLOPS |
Full optimization, Pentium Pro code | 65.737 MFLOPS | 48.319 MFLOPS |
Full optimization, blended code | 67.138 MFLOPS | 48.319 MFLOPS |
Microsoft Visual C++ 5.0: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
No optimizations | 19.404 MFLOPS | 11.989 MFLOPS |
Full optimization, 80486 code | 31.472 MFLOPS | 13.396 MFLOPS |
Full optimization, Pentium code | 31.558 MFLOPS | 13.413 MFLOPS |
Full optimization, Pentium Pro code | 31.561 MFLOPS | 13.413 MFLOPS |
Full optimization, blended code | 31.469 MFLOPS | 13.296 MFLOPS |
Build notes: No modifications to the LINPACK sources are required for Visual C++ 5.0, but build configurations were used instead of the standard makefile. These configurations were derived from the default Win32 Debug configuration with optimizations added as needed.
Introduced by DEC in 1991 as the least expensive entry in the new VAXstation 4000 line, the VLC was the smallest full-featured VAX ever built, designed around DEC's highly integrated CVAX "SOC" microprocessor with a 1 KiB shared primary instruction/data cache and an innovative on-die 8 KiB DRAM secondary cache. Through a 32-bit data bus, the SOC attaches to the DC7201 "S-chip" ASIC, which provides the CPU with a 32-bit interface to up to 24 MiB of error-correcting memory, as well as Ethernet and SCSI subsystems via DMA channels.
DEC C/C++ 1.2: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
None | 1.055 MFLOPS | 706.667 KFLOPS |
/OPTIMIZE=ALL | 1.055 MFLOPS | 681.124 KFLOPS |
DEC C/C++ 1.2: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
None | 1.254 MFLOPS | 721.519 KFLOPS |
/OPTIMIZE=ALL | 1.254 MFLOPS | 721.023 KFLOPS |
Build notes: No special accommodations were required to build LINPACK under OpenVMS.
The Sun Fire T1000 is an entry-level 1U rackmounted server released in early 2006 as one of the first systems to use Sun's radically multi-threaded UltraSPARC T1 "Niagara" microprocessor, derived from a SPARC implementation originally developed by Afara Websystems that features four, six or eight relatively simple SPARC V9 cores with individual 16 KiB instruction caches and 8 KiB data caches, a shared 3 MiB secondary cache and a single floating-point unit shared among all cores. Each core also has four threads, all sharing a single pipeline and a massive register file composed of 640 64-bit registers that allows for a thread's state to be quickly saved and resumed in a single cycle in order to maximize processor utilization in heavily multi-threaded workloads.
The 8-core T1 utilized in this T1000 is clocked at 1 GHz, and is directly interfaced to 16 GiB of error-correcting 533 MT/s DDR2 memory through two on-die memory controllers with 128-bit data buses. The T1 possesses a total of four on-die memory controllers; however, the T1000 only utilizes two of them to support two banks of four memory modules each. The T1000 is capable of supporting modules of up to 4 GiB in size.
All tests are performed on a T1000 with an 8-core UltraSPARC T1 running Solaris 10 10/09 with no specific configuration for benchmarking purposes.
Although the ANSIbench-packaged LINPACK sources do not support multi-threading, it probably doesn't make much difference due to the T1 only having a single floating-point unit shared among all eight cores.
GCC 5.5.0: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 13.460 MFLOPS | 12.633 MFLOPS |
-O1 | 21.955 MFLOPS | 21.744 MFLOPS |
-O2 | 22.170 MFLOPS | 24.513 MFLOPS |
-O3 | 21.862 MFLOPS | 20.510 MFLOPS |
-Ofast | 22.170 MFLOPS | 20.558 MFLOPS |
GCC 5.5.0: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 13.937 MFLOPS | 13.233 MFLOPS |
-O1 | 22.004 MFLOPS | 20.422 MFLOPS |
-O2 | 22.047 MFLOPS | 20.397 MFLOPS |
-O3 | 21.903 MFLOPS | 20.262 MFLOPS |
-Ofast | 21.917 MFLOPS | 20.299 MFLOPS |
Sun Studio 12/Sun C 5.9: 100 x 100 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 18.237 MFLOPS | 14.684 MFLOPS |
-xO1 | 12.848 MFLOPS | 12.563 MFLOPS |
-xO2 | 21.744 MFLOPS | 21.955 MFLOPS |
-xO3 | 22.389 MFLOPS | 19.924 MFLOPS |
-xO4 | 21.743 MFLOPS | 20.558 MFLOPS |
-xO5 | 22.170 MFLOPS | 20.465 MFLOPS |
-fast | 21.955 MFLOPS | 21.134 MFLOPS |
Sun Studio 12/Sun C 5.9: 1000 x 1000 matrices
Options | Single-precision | Double-precision |
---|---|---|
none | 19.587 MFLOPS | 14.669 MFLOPS |
-xO1 | 14.245 MFLOPS | 13.500 MFLOPS |
-xO2 | 21.775 MFLOPS | 20.201 MFLOPS |
-xO3 | 21.803 MFLOPS | 20.056 MFLOPS |
-xO4 | 21.860 MFLOPS | 20.176 MFLOPS |
-xO5 | 21.860 MFLOPS | 20.176 MFLOPS |
-fast | 22.208 MFLOPS | 20.460 MFLOPS |