Benchmarking OpenRISC 1200


2012-03-27 Published by Sven-Åke Andersson

Introduction


In computing, a benchmark is the act of running a computer program, a set of programs, or other operations in order to assess the relative performance of an object, normally by running a number of standard tests and trials against it. The term 'benchmark' is also commonly used for the elaborately designed benchmarking programs themselves.

Benchmarking is usually associated with assessing performance characteristics of computer hardware, for example, the floating point operation performance of a CPU, but there are circumstances when the technique is also applicable to software. Software benchmarks are, for example, run against compilers or database management systems.

CPU core benchmarking


Although it doesn’t reflect how you would use a processor in a real application, sometimes it’s important to isolate the CPU’s core from the other elements of the processor and focus on one key element. For example, you might want to ignore memory and I/O effects and focus primarily on the pipeline operation. This is CoreMark’s domain. CoreMark tests a processor’s basic pipeline structure, as well as basic read/write operations, integer operations, and control operations.


CoreMark


CoreMark is a benchmark that aims to measure the performance of central processing units (CPUs) used in embedded systems. It was developed in 2009 by Shay Gal-On at EEMBC and is intended to become an industry standard, replacing the antiquated Dhrystone benchmark. The code is written in C and contains implementations of the following algorithms: list processing (find and sort), matrix manipulation (common matrix operations), state machine (determine if an input stream contains valid numbers), and CRC.

 

Downloading CoreMark
 

The test suite can be downloaded from www.coremark.org

 

After downloading and unpacking we have the following directory structure.
 



We will add two port directories called or1k and atlys. In these directories we put three files modified for our design. The files in the or1k directory will be used when compiling CoreMark for running in the simulator and the files in the atlys directory will be used when compiling for the Atlys board.
 

Compiler optimization


Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you would expect from the source code.
Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program. The compiler performs optimization based on the knowledge it has of the program. Compiling multiple files at once to a single output file allows the compiler to use information gained from all of the files when compiling each of them. The possible optimization options are described in the GCC documentation. Depending on the compiler options we choose, the benchmark result may vary. When comparing different processors, it is important to use the same compiler options to get reliable results.
 

The port directory


The port directory contains three files that are modified for the processor we are going to benchmark:

core_portme.c
core_portme.h
core_portme.mak


New GNU toolchain
 

The first thing I had to do was ask Julius for an updated GNU toolchain. The precompiled 1.0rc1 version I had didn't let me compile and run the CoreMark benchmark. Julius compiled a new toolchain from revision 789 of the OpenCores SVN repository and promised to put it on the OpenCores FTP site. Here is a link to download the latest version.
 


Compiling for the simulator


The following commands are used to compile the benchmark for running in the OR1K simulator:

cd ..../coremark
make PORT_DIR=or1k ITERATIONS=2000

When changing the number of iterations, use the following command:

make PORT_DIR=or1k ITERATIONS=4000 REBUILD=1



Use this command to start the simulator:

or32-elf-sim -m8M coremark.exe

 


 


Compiling for the board


The following commands are used to compile the benchmark for running on the Atlys board:

cd ..../coremark
make PORT_DIR=atlys ITERATIONS=2000
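As a rough sanity check on the ITERATIONS value, the sketch below estimates how long a run will take. It assumes the score of 63.411 iterations per second measured on the board later in this post, and it relies on the CoreMark run rule that a valid run must last at least 10 seconds. The function names are illustrative and are not part of CoreMark.

```python
import math

# CoreMark run rules: a run must last at least 10 seconds to be valid.
MIN_VALID_SECONDS = 10.0


def run_time_seconds(iterations, iterations_per_second):
    """Estimate the run time for a given ITERATIONS value."""
    if iterations_per_second <= 0:
        raise ValueError("iterations_per_second must be positive")
    return iterations / iterations_per_second


def min_iterations(iterations_per_second, min_seconds=MIN_VALID_SECONDS):
    """Smallest ITERATIONS value that keeps the run at or above min_seconds."""
    return math.ceil(iterations_per_second * min_seconds)


# Assumption: 63.411 iterations/s, the board result reported in this post.
BOARD_SCORE = 63.411
print(f"ITERATIONS=2000 runs for about {run_time_seconds(2000, BOARD_SCORE):.1f} s")
print(f"Minimum ITERATIONS for a valid run: {min_iterations(BOARD_SCORE)}")
```

With these numbers, ITERATIONS=2000 is comfortably above the minimum on the board. The simulator runs at a different speed, so the estimate does not carry over to it.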


Create a bare metal boot image


To create a U-Boot image from a bare metal program in bin format, the U-Boot tool mkimage is used. It is available in U-Boot's tools/ directory. The following commands convert the ELF file to a raw binary and then create an uncompressed bare metal image called 'coremark' with load address 0 and entry point 0x100:

or32-elf-objcopy -O binary coremark.exe coremark.bin
mkimage -A or1k -T standalone -C none -a 0 -e 0x100 -n coremark -d coremark.bin /tftpboot/coremark.ub
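If the image has to be rebuilt often, the two commands can be wrapped in a small script. The following is only a sketch: the helper names build_image_commands and run_all are hypothetical, the default file names and the /tftpboot directory are taken from the commands above, and the addresses are passed as 0x0 and 0x100, which mkimage reads as the same hexadecimal values.

```python
import subprocess


def build_image_commands(elf="coremark.exe", name="coremark",
                         load_addr=0x0, entry=0x100, out_dir="/tftpboot"):
    """Return the objcopy and mkimage argument lists used in this post."""
    bin_file = f"{name}.bin"
    # Strip the ELF container and keep only the raw binary.
    objcopy = ["or32-elf-objcopy", "-O", "binary", elf, bin_file]
    # -A architecture, -T image type, -C compression,
    # -a load address, -e entry point, -n image name, -d input data file.
    mkimage = ["mkimage", "-A", "or1k", "-T", "standalone", "-C", "none",
               "-a", hex(load_addr), "-e", hex(entry),
               "-n", name, "-d", bin_file, f"{out_dir}/{name}.ub"]
    return [objcopy, mkimage]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


for cmd in build_image_commands():
    print(" ".join(cmd))
```

Calling run_all(build_image_commands()) would execute both steps; it requires the or32 toolchain and mkimage to be on the PATH.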



Benchmark conditions


Here are the conditions during the benchmark.
 

Condition            Value
Development board    Digilent Atlys (Xilinx University Program)
FPGA                 Xilinx Spartan-6 XC6SLX45 CSG324C
Processor clock      50 MHz
Instruction cache    32 KB
Data cache           32 KB
MMU                  Yes
Hardware multiply    Yes
Hardware divide      Yes
Floating point       Single precision

Running on the board
 

Here is the result from running CoreMark on the Atlys board. Note that no compiler optimization options other than -O2 have been used.

tftp coremark.ub
bootm 100000



 

 

This gives a CoreMark score of 63.411 / 50 = 1.27 CoreMark/MHz. We will try to improve this value by adding some compiler options, recompiling the program, and rerunning the test.
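The normalization is just the raw score (iterations per second) divided by the clock frequency in MHz. A minimal sketch, with a hypothetical function name:

```python
def coremark_per_mhz(iterations_per_second, clock_mhz):
    """Normalize a raw CoreMark score by the processor clock in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return iterations_per_second / clock_mhz


# Values from this post: 63.411 iterations/s at a 50 MHz processor clock.
score = coremark_per_mhz(63.411, 50.0)
print(f"{score:.2f} CoreMark/MHz")
```

Dividing by the clock makes results comparable between cores that run at different frequencies.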

Optimization experiments


Here are the results from trying to optimize the compilation phase. Without any optimization at all, the result is 0.35 CoreMark/MHz. The values below are in CoreMark/MHz, with each extra option added to -O2 or -O3.

 
Compiler option           -O2     -O3
No extra                  1.27    1.31
-mhard-div -mhard-mul     1.27    1.31
-funroll-loops            1.39    1.36
-fgcse-sm                 1.28    1.32
-msoft-float              1.27    1.31
-funroll-all-loops        1.41    1.38
All                       1.41    1.38
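To make the table easier to read, the sketch below computes the gain of each option relative to the "No extra" row. The numbers are copied from the table above; the helper names are illustrative.

```python
# CoreMark/MHz per extra option, as (-O2, -O3), copied from the table.
RESULTS = {
    "No extra": (1.27, 1.31),
    "-mhard-div -mhard-mul": (1.27, 1.31),
    "-funroll-loops": (1.39, 1.36),
    "-fgcse-sm": (1.28, 1.32),
    "-msoft-float": (1.27, 1.31),
    "-funroll-all-loops": (1.41, 1.38),
    "All": (1.41, 1.38),
}
UNOPTIMIZED = 0.35  # CoreMark/MHz without any optimization


def gain_percent(score, baseline):
    """Relative improvement of score over baseline, in percent."""
    return (score / baseline - 1.0) * 100.0


def best_option(column):
    """First option with the highest score (column 0 is -O2, 1 is -O3)."""
    return max(RESULTS, key=lambda name: RESULTS[name][column])


for column, level in enumerate(("-O2", "-O3")):
    baseline = RESULTS["No extra"][column]
    name = best_option(column)
    gain = gain_percent(RESULTS[name][column], baseline)
    print(f"{level}: best is {name} ({gain:.1f}% over no extra options)")

speedup = RESULTS["No extra"][0] / UNOPTIMIZED
print(f"-O2 alone is {speedup:.2f}x the unoptimized result")
```

The loop unrolling options give the only noticeable gain, and the gain is larger with -O2 than with -O3. The step from no optimization to -O2 dwarfs every extra option.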
 


Memory system benchmark


The CoreMark benchmark is set up mainly to test the processor part of our system. Stefan Kristiansson has written a testbench to test the efficiency of the memory system. It can be downloaded from his Git repository using the following command:

git clone git://git.chokladfabriken.org/membenchmark

After downloading the files (main.c and makefile), we use the following command to compile the program and build a bin file for loading into our system.

make BOARD=atlys

Here is the result from running the program on our Atlys board.