By Rainer Wichmann rainer@la-samhna.de (last update: Dec 29, 2011)
Benchmarking procedure
This work compares the speed of code generated by various versions of the GCC (GNU Compiler Collection) C compiler, ranging from 3.3.6 to 4.6.0. Also included are results for several versions of LLVM, ranging from 2.3 to 3.0. For versions below 3.0, the llvm-gcc-4.2 C frontend was used, while LLVM 3.0 was tested with the clang C frontend.
All compilers have been built from the original source, so there are no distribution-specific patches. GCC builds are bootstrap builds, i.e. the compiler has been built with itself.
All benchmark programs were compiled with '-O3 -std=gnu99 -ffast-math'. For each benchmark/compiler combination, 30 executions were averaged.
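The measurement loop described above can be sketched as a small timing harness. This is only a minimal sketch, not the script used for the results reported here: the function name `mean_runtime`, the choice of wall-clock time, and the binary name in the comment are assumptions made for illustration.

```python
import statistics
import subprocess
import time


def mean_runtime(cmd, runs=30):
    """Run `cmd` `runs` times; return (mean, stdev) of wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True aborts on a failing run instead of averaging it in.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


# Hypothetical usage, assuming a binary built beforehand with the flags
# from the text (file names and -lm are assumptions):
#   gcc -O3 -std=gnu99 -ffast-math -o almabench almabench.c -lm
#   mean, spread = mean_runtime(["./almabench"])
```

Reporting the standard deviation next to the mean is a design choice of this sketch; it helps to judge whether a difference between two compilers is larger than the run-to-run noise.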
The benchmarks used are a set of small benchmarking programs included in the Acovea optimization tool by Scott Robert Ladd. Basically, each of these programs benchmarks a single algorithm. Additionally a benchmark for the TIGER192 hash algorithm has been added; since most hash algorithms exercise similar (mostly bit shuffling) functions, this is probably indicative of the compiler's performance for cryptographic hash functions.
- almabench
- This benchmark calculates the daily ephemeris (at noon) for the years 2000-2099; results depend heavily on efficient floating-point math and functions such as sin, cos, and sqrt.
- evobench
- This benchmark uses a genetic algorithm to minimize a 2D function.
- huffbench
- This program implements the Huffman compression algorithm.
- linbench
- Uses LUP decomposition to solve a large system of linear equations.
- mat1bench
- Nothing special here -- just a brute-force matrix multiply.
- tigerbench
- The tiger192 cryptographic hash algorithm -- 32 bit reference implementation (not part of the benchmark suite by S. R. Ladd).
- treebench
- A tree-generating benchmark.
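To give an idea of how small these kernels are, the brute-force matrix multiply mentioned for mat1bench boils down to a triple loop. The sketch below is written in Python for brevity and is not the Acovea source (which is C); the name `matmul_naive` is made up for this illustration.

```python
def matmul_naive(a, b):
    """Multiply two matrices (lists of rows) with the plain triple loop."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            total = 0.0
            # Dot product of row i of a with column j of b.
            for k in range(m):
                total += a[i][k] * b[k][j]
            c[i][j] = total
    return c
```

A loop nest like this is exactly where the compiler's handling of loop ordering, unrolling and vectorization shows up in the timings.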
Results
Results are shown in the following graphs. The vertical axis is the execution time of the benchmark: lower is better, faster, more sexy.
[Graphs, one per benchmark: almabench, evobench, huffbench, linbench, mat1bench, tigerbench, treebench]
Conclusions
It is interesting to see that the last version of the 3.x gcc series (3.4.6) still rivals or even outperforms the latest gcc 4.x compiler (4.6.0) in all but two of the seven benchmarks: almabench shows a clear improvement with the start of the 4.x series, and huffbench improved a lot starting with 4.5.0. The other benchmarks have their ups and downs during the 4.x series, usually with no sign of a consistent trend. The one exception is evobench, which shows a clear performance drop at 4.5.0 that persists in later versions.
Also, the LLVM compiler has clearly improved a lot over the last few years (though 3.0 shows a few remarkable regressions). LLVM 3.0 produces code that is as fast as or faster than gcc 4.6.0 code in 4/7 benchmarks.
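A count such as "4/7 benchmarks" depends on what is accepted as "equally fast". One way to make that explicit is sketched below; the function name and the 2% tolerance are assumptions for illustration, not the criterion used for the statement above, and the function expects measured mean times as input.

```python
def benchmarks_at_least_as_fast(times_a, times_b, tolerance=0.02):
    """Return the benchmarks where compiler A is faster than compiler B,
    or slower by at most `tolerance` (relative).

    Both arguments map benchmark name -> mean run time in seconds
    (lower is better). Only benchmarks present in both are compared.
    """
    result = []
    for name in sorted(times_a.keys() & times_b.keys()):
        if times_a[name] <= times_b[name] * (1.0 + tolerance):
            result.append(name)
    return result
```

Choosing the tolerance from the observed run-to-run spread, rather than a fixed percentage, would be a reasonable refinement.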
One may ask how, or whether, this result applies to larger programs. However, it is often the case that a large program spends almost all of its time in a small, computationally expensive part of the code. This implies that it is of crucial importance that the compiler is able to deliver good code for a vast range of different algorithms.
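The argument about hot spots can be made quantitative with Amdahl's law: if a fraction p of the run time is spent in a kernel that a compiler makes s times faster, the whole program speeds up by 1 / ((1 - p) + p / s). The sketch below is only an illustration of that formula; the function and parameter names are made up here.

```python
def overall_speedup(hot_fraction, kernel_speedup):
    """Amdahl's law: whole-program speedup when a fraction `hot_fraction`
    of the original run time becomes `kernel_speedup` times faster."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - hot_fraction) + hot_fraction / kernel_speedup)
```

For example, if 90% of the run time sits in a kernel that one compiler makes twice as fast, the whole program runs about 1.8 times faster, so most of the kernel-level gain carries over.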
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Germany License.