LINPACK results

This page stores various results obtained from running the LINPACK benchmark included in this package. Bold text indicates the highest result for a given precision.

Jump to: Embedded Systems, Laptops and Portables, Desktops/PCs (+non-unix), Workstations (+non-unix), Servers

Embedded Systems

HP t5325

The t5325 is a minuscule low-power thin client unveiled by HP in late 2009, designed around a Marvell Kirkwood 88F6281 system-on-a-chip. The SoC implements a Marvell-designed, ARMv5TE-compliant "Sheeva" processor core clocked at 1.2 GHz, with independent 16 KiB instruction and data caches and a 256 KiB unified secondary cache. Derived from the ARM926EJ-S, the Sheeva core does not feature an on-chip floating point unit.

The t5325 features 512 MiB of onboard DDR2 memory with 800 MT/s data rate, connected to the Kirkwood SoC's on-die memory controller through a 16-bit interface. All tests are performed under the HP "ThinPro" operating system, a lightly customized variant of Debian Lenny, on a system not specifically configured for benchmarking.

With no floating point unit of any kind, the t5325's Kirkwood processor returns abysmal results on LINPACK and similar FP-heavy applications despite its reasonable 1.2 GHz clock frequency and on-die secondary cache, even with maximal optimization.

GCC 4.2.4: 100 x 100 matrices

Options	Single-precision	Double-precision
none	14.101 MFLOPS	9.319 MFLOPS
`-O1`	18.844 MFLOPS	11.537 MFLOPS
`-O2`	19.003 MFLOPS	11.778 MFLOPS
`-O3`	18.650 MFLOPS	11.902 MFLOPS
`-O3 -ffast-math`	18.384 MFLOPS	11.839 MFLOPS

GCC 4.2.4: 1000 x 1000 matrices

Options	Single-precision	Double-precision
none	13.787 MFLOPS	9.422 MFLOPS
`-O1`	18.009 MFLOPS	11.185 MFLOPS
`-O2`	18.284 MFLOPS	11.298 MFLOPS
`-O3`	18.304 MFLOPS	11.275 MFLOPS
`-O3 -ffast-math`	18.314 MFLOPS	11.245 MFLOPS
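
To put these rates in perspective, the sketch below converts a reported MFLOPS figure back into the time a single solve would take. It assumes the conventional LINPACK operation count of 2n³/3 + 2n² for a system of order n, as used by the classic benchmark; whether the sources in this package use exactly that count is an assumption, and the function names are illustrative only.

```python
def linpack_flops(n: int) -> float:
    """Nominal operation count for factoring and solving an n x n system.

    Assumption: the conventional LINPACK count, 2n^3/3 + 2n^2.
    """
    return (2.0 * n**3) / 3.0 + 2.0 * n**2


def seconds_per_solve(n: int, mflops: float) -> float:
    """Time implied by a reported MFLOPS rate for one solve of order n."""
    return linpack_flops(n) / (mflops * 1e6)


# Unoptimized double-precision rates for the t5325, taken from the tables above.
print(f"n=100:  {seconds_per_solve(100, 9.319):.3f} s per solve")
print(f"n=1000: {seconds_per_solve(1000, 9.422):.1f} s per solve")
```

Under that assumption, a single unoptimized double-precision solve at n = 1000 takes roughly 71 seconds on this system.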

Laptops and Portables

Panasonic ToughBook U1

The ToughBook U1 is a unique and highly ruggedized UMPC released by Panasonic in 2008, and built around Intel's hyper-threaded "Silverthorne" Atom microprocessor, featuring a 32 KiB instruction cache and a 24 KiB data cache, along with a unified 512 KiB secondary cache. The Z520 model featured in the U1 has a 1.33 GHz clock frequency, and is connected to an Intel US15W System Controller Hub using a 533 MHz front-side bus. The US15W SCH features an integrated memory controller connected to 1 GiB of on-board DDR2 memory, likely with 533 MT/s data rate, though 400 MT/s is possible. All tests are performed under Windows XP with the Cygwin environment, on a system not specifically configured for benchmarking.

GCC 5.4.0: 100 x 100 matrices

Options	Single-precision	Double-precision
none	48.319 MFLOPS	48.656 MFLOPS
`-O1`	163.421 MFLOPS	160.663 MFLOPS
`-O2`	152.214 MFLOPS	152.728 MFLOPS
`-O3`	169.706 MFLOPS	165.363 MFLOPS
`-Ofast`	160.667 MFLOPS	160.949 MFLOPS

GCC 5.4.0: 1000 x 1000 matrices

Options	Single-precision	Double-precision
none	50.978 MFLOPS	49.399 MFLOPS
`-O1`	162.586 MFLOPS	145.276 MFLOPS
`-O2`	160.025 MFLOPS	146.818 MFLOPS
`-O3`	163.737 MFLOPS	144.105 MFLOPS
`-Ofast`	157.878 MFLOPS	140.956 MFLOPS

Dell Latitude E6420

The E6420 is a midrange 14-inch business notebook introduced in early 2012. This particular configuration features Intel's dual-core, hyper-threaded Core i5 "Sandy Bridge" microprocessor with 32+32 KiB per-core instruction and data caches, 256 KiB per-core second level cache, a 3 MiB shared tertiary cache, a standard clock frequency of 2.6 GHz and a maximum frequency of 3.3 GHz. All models of the E6420 are built around Intel's QM67 express chipset, connected to the system processor through a 5 GT/s Direct Media Interface. This system is configured with 4 GiB of DDR3 SDRAM clocked at 667 MHz (for 1333 MT/s data rate) and directly interfaced to the processor's on-die memory controller. All tests are performed under CentOS 7.5.1804 on a system not specifically configured for benchmarking.

All results reflect single-threaded execution. This version of LINPACK does not take any advantage of multi-core and multi-threaded processors.

GCC 4.8.5: 100 x 100 matrices

Options	Single-precision	Double-precision
none	583.570 MFLOPS	580.643 MFLOPS
`-O1`	1.809 GFLOPS	1.886 GFLOPS
`-O2`	2.407 GFLOPS	2.160 GFLOPS
`-O3`	3.112 GFLOPS	2.731 GFLOPS
`-Ofast`	2.731 GFLOPS	2.412 GFLOPS
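
Because the tables mix MFLOPS and GFLOPS, picking out the highest result per precision requires normalizing units first. The following is a minimal sketch of that step, using the rows of the table above; the helper names (`to_mflops`, `best_per_precision`) are hypothetical and not part of this package.

```python
# Rows copied from the 100 x 100 table above: options, single, double (tab-separated).
ROWS = """\
none\t583.570 MFLOPS\t580.643 MFLOPS
-O1\t1.809 GFLOPS\t1.886 GFLOPS
-O2\t2.407 GFLOPS\t2.160 GFLOPS
-O3\t3.112 GFLOPS\t2.731 GFLOPS
-Ofast\t2.731 GFLOPS\t2.412 GFLOPS
"""

# Scale factors to a common unit (MFLOPS).
UNIT_TO_MFLOPS = {"MFLOPS": 1.0, "GFLOPS": 1000.0}


def to_mflops(cell: str) -> float:
    """Convert a cell such as '1.809 GFLOPS' to a plain MFLOPS number."""
    value, unit = cell.split()
    return float(value) * UNIT_TO_MFLOPS[unit]


def best_per_precision(rows: str) -> tuple[str, str]:
    """Return the option sets with the highest single and double results."""
    parsed = []
    for line in rows.strip().splitlines():
        options, single, double = line.split("\t")
        parsed.append((options, to_mflops(single), to_mflops(double)))
    best_single = max(parsed, key=lambda row: row[1])[0]
    best_double = max(parsed, key=lambda row: row[2])[0]
    return best_single, best_double


print(best_per_precision(ROWS))
```

For this table, both precisions peak at `-O3`.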

GCC 4.8.5: 1000 x 1000 matrices

Options	Single-precision	Double-precision
none	696.797 MFLOPS	668.994 MFLOPS
`-O1`	2.395 GFLOPS	2.108 GFLOPS
`-O2`	3.312 GFLOPS	2.579 GFLOPS
`-O3`	3.364 GFLOPS	2.473 GFLOPS
`-Ofast`	3.353 GFLOPS	2.353 GFLOPS

Apple iBook G4 (Mid-2005/1.33)

The last and fastest of the 12-inch consumer-oriented iBook G4 line, the mid-2005 model is built around a 1.33 GHz, 32-bit PowerPC 7447a microprocessor fabricated by Freescale Semiconductor, which had been spun off from Motorola the previous year. The 7447a is the final desktop iteration of the PowerPC 7400 'G4' microprocessor used by Apple in their systems, featuring two 32 KiB primary caches for instructions and data, a single 512 KiB on-die unified secondary cache, and some additional mobile-oriented features, such as dynamic frequency scaling and an on-chip thermal diode. The 7447a is interfaced to 512 MiB of on-board 333 MT/s DDR memory through the Intrepid ASIC, to which it is attached via a 133 MHz, 64-bit wide data bus. Intrepid also provides I/O device control and most other functionality to the complete system. All tests are performed under Mac OS X 10.4 on a system not specifically configured for benchmarking.

Apple GCC 4.0.1: 100 x 100 Matrices

Options	Single-precision	Double-precision
none	66.266 MFLOPS	66.510 MFLOPS
`-O1`	402.015 MFLOPS	488.936 MFLOPS
`-O2`	441.236 MFLOPS	433.309 MFLOPS
`-O3`	502.518 MFLOPS	509.596 MFLOPS

Apple GCC 4.0.1: 1000 x 1000 Matrices

Options	Single-precision	Double-precision
none	59.509 MFLOPS	45.686 MFLOPS
`-O1`	209.747 MFLOPS	85.544 MFLOPS
`-O2`	205.568 MFLOPS	84.948 MFLOPS
`-O3`	204.940 MFLOPS	85.544 MFLOPS
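
One plausible reading of the gap between the two matrix sizes on this system is cache capacity: the sketch below compares the raw size of an n x n matrix against the 512 KiB secondary cache described above. It counts matrix elements only and ignores any padding or extra vectors the benchmark allocates, so it is a rough estimate rather than a measurement.

```python
KIB = 1024
L2_CACHE_BYTES = 512 * KIB  # secondary cache size quoted for the PowerPC 7447a


def matrix_bytes(n: int, element_size: int) -> int:
    """Raw storage for an n x n matrix with the given element size in bytes."""
    return n * n * element_size


def fits_in_cache(n: int, element_size: int, cache_bytes: int) -> bool:
    """True if the matrix alone would fit in a cache of the given size."""
    return matrix_bytes(n, element_size) <= cache_bytes


for n in (100, 1000):
    for name, size in (("single", 4), ("double", 8)):
        kib = matrix_bytes(n, size) / KIB
        verdict = "fits" if fits_in_cache(n, size, L2_CACHE_BYTES) else "does not fit"
        print(f"n={n:4d} {name}: {kib:8.1f} KiB, {verdict} in 512 KiB")
```

By this estimate both 100 x 100 cases fit in the secondary cache, while both 1000 x 1000 cases exceed it several times over, so the larger runs are more likely to be bound by memory than by the floating-point unit.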

Desktops/Personal Computers

Lenovo 3000 J115 (7387-26U)

Released in late 2006 as one of Lenovo's first entries into the United States market under their own name, the 3000 J115 is a fairly average entry-level PC built around AMD's dual-core Athlon 64 X2 microprocessor with 64+64 KiB per-core instruction and data caches, 512 KiB per-core second level cache and a 2 GHz clock frequency (model 3800+). The 3000 J115 employs NVIDIA's nForce 410 chipset, which connects to the system processor through a 1 GHz HyperTransport bus. This system is configured with 1 GiB of DDR2 SDRAM clocked at 266 MHz (for 533 MT/s data rate) and directly interfaced to the Athlon 64 X2's on-die memory controller. All tests are performed under CentOS 7.5.1804 on a system not specifically configured for benchmarking.

All results reflect single-threaded execution. This version of LINPACK does not take any advantage of multi-core processors.

GCC 4.8.5: 100 x 100 matrices

Options	Single-precision	Double-precision
none	155.954 MFLOPS	164.460 MFLOPS
`-O1`	804.030 MFLOPS	952.140 MFLOPS
`-O2`	1.049 GFLOPS	1.005 GFLOPS
`-O3`	1.234 GFLOPS	1.005 GFLOPS
`-Ofast`	1.304 GFLOPS	1.005 GFLOPS

GCC 4.8.5: 1000 x 1000 matrices

Options	Single-precision	Double-precision
none	169.574 MFLOPS	159.683 MFLOPS
`-O1`	588.304 MFLOPS	368.498 MFLOPS
`-O2`	644.872 MFLOPS	374.674 MFLOPS
`-O3`	715.378 MFLOPS	352.982 MFLOPS
`-Ofast`	717.291 MFLOPS	349.305 MFLOPS

Non-Unix Desktops

The following results are from desktops that do not run a Unix-like operating system or compatibility environment, or that otherwise lack the tools needed to build the LINPACK sources provided in this package as-is. Source/build tweaks are noted on a per-system basis.

Apple Power Mac G4 (500 DP)

Apple's flagship Power Macintosh through the year 2000, the Power Mac G4 (500 DP) features dual Motorola PowerPC 7400 'G4' microprocessors, each with two on-die 32 KiB primary caches for instructions and data and a 1 MiB off-die unified "backside" secondary cache. Both processors share a single module and directly interface to the 'UniNorth' controller ASIC and 512 MiB of 100 MHz SDRAM through a shared 100 MHz, 64-bit data bus.

All tests are performed under Mac OS 9.0.4 unless noted otherwise.

Metrowerks CodeWarrior 6.0

Level 4 optimizations + AltiVec target, G4-specific instructions, peephole optimization, FMADD & FMSUB instructions, instruction scheduling

Matrix order (n)	Single-precision	Double-precision
100	155.606 MFLOPS	165.716 MFLOPS
1,000	91.471 MFLOPS	46.845 MFLOPS

Workstations

HP VISUALIZE C3000 (9000/785/C3000)

The C3000 is a mid-range Unix workstation released in 1999, based on HP's in-house PA-8500 microprocessor with 1 MiB of on-die data cache, 512 KiB of on-die instruction cache and a clock frequency of 400 MHz. The C3000's microprocessor is interfaced to the "Astro" chipset through a 120 MHz Runway+ bus. The particular system tested had 2,560 MiB of SDRAM, also running at 120 MHz, and was not specifically configured for benchmarking. All tests are performed under HP-UX 11.11 (11i v1).

GCC 4.2.3: 100 x 100 matrices

Options	Single-precision	Double-precision
none	30.977 MFLOPS	31.991 MFLOPS
`-O1`	183.197 MFLOPS	190.428 MFLOPS
`-O2`	211.587 MFLOPS	215.045 MFLOPS
`-O3`	223.342 MFLOPS	228.996 MFLOPS

GCC 4.2.3: 1000 x 1000 matrices

Options	Single-precision	Double-precision
none	39.779 MFLOPS	34.786 MFLOPS
`-O1`	247.936 MFLOPS	115.235 MFLOPS
`-O2`	294.152 MFLOPS	127.990 MFLOPS
`-O3`	294.152 MFLOPS	127.261 MFLOPS

Note: `-Ofast` is only available in GCC >= 4.6

HP C B.11.11.16: 100 x 100 matrices

Options	Single-precision	Double-precision
none	26.526 MFLOPS	31.407 MFLOPS
`+O1`	31.850 MFLOPS	38.989 MFLOPS
`+O2`	143.009 MFLOPS	143.009 MFLOPS
`+O3`	159.389 MFLOPS	155.954 MFLOPS
`+O4`	144.725 MFLOPS	148.284 MFLOPS
`-fast`	291.785 MFLOPS	235.709 MFLOPS

HP C B.11.11.16: 1000 x 1000 matrices

Options	Single-precision	Double-precision
none	39.731 MFLOPS	40.256 MFLOPS
`+O1`	40.844 MFLOPS	47.363 MFLOPS
`+O2`	349.761 MFLOPS	147.037 MFLOPS
`+O3`	364.493 MFLOPS	148.542 MFLOPS
`+O4`	357.689 MFLOPS	148.707 MFLOPS
`-fast`	396.844 MFLOPS	150.542 MFLOPS

Note: HP C `+O2` is roughly equivalent to GCC `-O1`

Non-Unix Workstations

The following results are from workstations that do not run a Unix-like operating system or compatibility environment, or that otherwise lack the tools needed to build the LINPACK sources provided in this package as-is. Source/build tweaks are noted on a per-system basis.

HP Vectra XU 6/200

The Vectra XU 6/200 was first released in early 1996, succeeding the XU 6/150 as the pinnacle of the Vectra line and of HP's PC-compatible systems overall. It is built around one or two Intel Pentium Pro microprocessors with 8+8 KiB instruction and data caches, 256 or 512 KiB of off-die, on-package full-speed secondary cache and a clock frequency of 200 MHz. The system processors are interfaced to Intel's 440FX "Natoma" chipset through a 66 MHz front-side bus.

The particular example tested was equipped with dual 200 MHz Pentium Pro microprocessors with 512 KiB secondary caches and 128 MiB of memory running at 66 MHz. This system was freshly configured, but not specifically customized for optimal benchmark performance. All tests are performed under Windows NT 4.0, service pack 6.

Microsoft Visual C++ 5.0: 100 x 100 matrices

Options	Single-precision	Double-precision
No optimizations	20.012 MFLOPS	18.091 MFLOPS
Full optimization, 80486 code	59.081 MFLOPS	48.319 MFLOPS
Full optimization, Pentium code	62.270 MFLOPS	45.227 MFLOPS
Full optimization, Pentium Pro code	65.737 MFLOPS	48.319 MFLOPS
Full optimization, blended code	67.138 MFLOPS	48.319 MFLOPS

Microsoft Visual C++ 5.0: 1000 x 1000 matrices

Options	Single-precision	Double-precision
No optimizations	19.404 MFLOPS	11.989 MFLOPS
Full optimization, 80486 code	31.472 MFLOPS	13.396 MFLOPS
Full optimization, Pentium code	31.558 MFLOPS	13.413 MFLOPS
Full optimization, Pentium Pro code	31.561 MFLOPS	13.413 MFLOPS
Full optimization, blended code	31.469 MFLOPS	13.296 MFLOPS

Build notes: No modifications to the LINPACK sources are required for Visual C++ 5.0, but build configurations were used instead of the standard makefile. These configurations were derived from the default Win32 Debug configuration with optimizations added as needed.

Servers

Sun Fire T1000

The Sun Fire T1000 is an entry-level 1U rackmounted server released in early 2006 as one of the first systems to use Sun's radically multi-threaded UltraSPARC T1 "Niagara" microprocessor. Derived from a SPARC implementation originally developed by Afara Websystems, the T1 features four, six or eight relatively simple SPARC V9 cores with individual 16 KiB instruction caches and 8 KiB data caches, a shared 3 MiB secondary cache and a single floating-point unit shared among all cores. Each core also has four threads, all sharing a single pipeline and a massive register file of 640 64-bit registers, which allows a thread's state to be saved and resumed in a single cycle in order to maximize processor utilization in heavily multi-threaded workloads.

The 8-core T1 utilized in this T1000 is clocked at 1 GHz, and is directly interfaced to 16 GiB of error-correcting 533 MT/s DDR2 memory through two on-die memory controllers with 128-bit data buses. The T1 possesses a total of four on-die memory controllers; however, the T1000 only utilizes two of them to support two banks of four memory modules each. The T1000 is capable of supporting modules of up to 4 GiB in size.
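
The memory figures above imply a theoretical peak bandwidth, which the sketch below works out. It takes the quoted 533 MT/s data rate and 128-bit bus width at face value, assumes each of the two controllers drives its own 128-bit bus, and ignores ECC bits and protocol overhead; the function name is illustrative only.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s: float, bus_bits: int,
                        controllers: int = 1) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_bits // 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * controllers / 1e9


# Two active controllers, as described for the T1000 above.
t1000_peak = peak_bandwidth_gb_s(533, 128, controllers=2)
print(f"T1000 theoretical peak: {t1000_peak:.3f} GB/s")
```

Under those assumptions the two active controllers give a theoretical peak of about 17 GB/s, far more than a single-threaded run at roughly 20 MFLOPS can consume.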

All tests are performed on a T1000 with an 8-core UltraSPARC T1 running Solaris 10 10/09 with no specific configuration for benchmarking purposes.

Although the ANSIbench-packaged LINPACK sources do not support multi-threading, this probably makes little difference on the T1, which has only a single floating-point unit shared among all eight cores.

GCC 5.5.0: 100 x 100 matrices

Options	Single-precision	Double-precision
none	13.460 MFLOPS	12.633 MFLOPS
`-O1`	21.955 MFLOPS	21.744 MFLOPS
`-O2`	22.170 MFLOPS	24.513 MFLOPS
`-O3`	21.862 MFLOPS	20.510 MFLOPS
`-Ofast`	22.170 MFLOPS	20.558 MFLOPS

GCC 5.5.0: 1000 x 1000 matrices

Options	Single-precision	Double-precision
none	13.937 MFLOPS	13.233 MFLOPS
`-O1`	22.004 MFLOPS	20.422 MFLOPS
`-O2`	22.047 MFLOPS	20.397 MFLOPS
`-O3`	21.903 MFLOPS	20.262 MFLOPS
`-Ofast`	21.917 MFLOPS	20.299 MFLOPS

Sun Studio 12/Sun C 5.9: 100 x 100 matrices

Options	Single-precision	Double-precision
none	18.237 MFLOPS	14.684 MFLOPS
`-xO1`	12.848 MFLOPS	12.563 MFLOPS
`-xO2`	21.744 MFLOPS	21.955 MFLOPS
`-xO3`	22.389 MFLOPS	19.924 MFLOPS
`-xO4`	21.743 MFLOPS	20.558 MFLOPS
`-xO5`	22.170 MFLOPS	20.465 MFLOPS
`-fast`	21.955 MFLOPS	21.134 MFLOPS

Sun Studio 12/Sun C 5.9: 1000 x 1000 matrices

Options	Single-precision	Double-precision
none	19.587 MFLOPS	14.669 MFLOPS
`-xO1`	14.245 MFLOPS	13.500 MFLOPS
`-xO2`	21.775 MFLOPS	20.201 MFLOPS
`-xO3`	21.803 MFLOPS	20.056 MFLOPS
`-xO4`	21.860 MFLOPS	20.176 MFLOPS
`-xO5`	21.860 MFLOPS	20.176 MFLOPS
`-fast`	22.208 MFLOPS	20.460 MFLOPS
