README

This file describes the experimental framework used in the technical report
"Sorting in the Presence of Branch Prediction and Caches", by Paul Biggar and
David Gregg, and the paper "An Experimental Study of Sorting and Branch
Prediction", by Paul Biggar, Nicholas Nash, Kevin Williams and David Gregg.


NB: Most of the scripts and programs used here are hard coded, and will only
work from the working directories specified below.

There are 4 sets of tests: testing the sorts to see if they work, hardware
performance counters, software branch predictors and simplescalar tests.

Most of the results are included in this package in their rawest form. It is
necessary to generate some graphs and process some results. Commands which must
be run to be able to create the docs are marked with a * below.


main.c:

	Enter the code directory. In main.c there are a list of sorts to be tested
	(look for 'test_sort'). The #defines TEST_MIN and TEST_MAX control how many
	keys are to be tested. The sorts are tested for an array of every size
	between the min and max, also using that size as a seed for the random
	number generator.

	The #define RANDOM_SIZE controls time_sort() which gives a quick tick count
	or how long it takes to run a sort. Not very official, at all. It also
	defines the size of array used for the software predictors, which is
	official.

	The #define RUN_VISUAL decides if the visual tests are being done. These
	were used in development, and aren't that interesting.

Hardware counters:

	In the code/ directory, call do_all_papiex or do_all_perfex. You need a
	patched kernel for this. See: http://icl.cs.utk.edu/papi/.

	Capture the output. Ctrl-C when bored, or needing results, as this will
	take a long time (3 days or so on Pentium 4 1.8Ghz):
		$ ./do_all_papiex > papiex_results
	or
		$ ./do_all_perfex > perfex_results

	If you use papiex (more accurate), then it needs to be converted to the
	other format:
		$ ../scripts/papi_convert papiex_results > perfex_results

	Then process the results to give you graphs:
		$ ../scripts/process_perfex_output perfex_results

	There's a little bit of hard coding in this script, so change $key_num to
	the number of times perfex actually ran. Also modify the input so that only
	completed work cycles are there. This means if the last sort you ran was
	selectsort, and it ran 256 times, but other sorts ran 257 times, remove
	the sort results for the 257th iteration from the results file.

	Additionally, this requires results from 'do_nothing', or else the output
	will be empty.

	This produces a file in the data/ directory called processed_cycles_data
	which is in gnuplot format. If you add more sorts to the framework, you'll
	need to edit the gnuplot scripts in docs/*/scripts/ with the
	column numbers that are printed out by process_perfex_output.

	To create the gnuplot images, enter the data/perfex_gnuplot_scripts/
	directory and:
*		$ gnuplot *

	The images appear in the data/plots/ directory.

Software branch predictors:

	To run the branch predictors, enter the code/ directory, and ensure that
	-D_USE_SOFTWARE_PREDICTOR is set in the CFLAGS line in the Makefile. Then:
		$ make
		$ ./fastsort

	
	The software counters will appear in data/counters/. To process these,
	from the data/ directory, run (after unbzipping some files):
*		$ ../scripts/process_counter counters/*

	Warnings from process_counter probably mean that the .bz2 files weren't
	unzipped.

	This will print out the maximum branches. If this isn't slightly lower than
	the y-axis in the gnuplot scripts, it will need to be changed (On line 8 of
	process_counter, then rerun it).

	This creates the gnuplot data files in data/counter_gnuplot_data and
	gnuplot scripts in the data/counter_gnuplot_generated_scripts/ directory.
	To create the images, enter the data/ directory and run:
*		$ for i in counter_gnuplot_generated_scripts/*; do gnuplot $i; done

	The images are put into the data/plots directory.
		

Simplescalar:

	If your simplescalar binaries are big-endian (if gcc is
	ssbig-na-sstrix-gcc), then edit the $CC variable in scripts/do_simple to
	reflect this.

	In the code/ directory, run:
		$ ./do_all_simple

	This will run do_simple 10 times per sort, using data/BIG as it's input.
	Simplescalar's output is captured by the scripts, edited and output into a
	more human readable format, in data/simple_logs. To convert this to gnuplot
	format, run:
*		$ ../scripts/do_all_convert

	To create the gnuplot scripts for these files, run:
*		$ ../scripts/do_all_plot

	Though this doesn't actually run gnuplot, despite its name.  do_all_convert
	creates gnuplot data files in data/converted_simple_logs/, and do_all_plot
	creates gnuplot scripts for each sort type and each sort, stored in
	data/simplescalar_generated_gnuplot_scripts.

	To create the images, enter the data/ directory and run:
*		$ gnuplot simplescalar_generated_gnuplot_scripts/*


Documents:

	The are three documents in docs/. tech_report/ is the technical report
	incorporating everything you see here. branch_paper/ is the first paper,
	mostly discussing branch prediction.

	The create the papers, run:
*		$ make again

	in the respective directories. All the papers use plots from the
	data/plots/ directory, and have sym-links to it. Some of the plots might be
	slightly and subtly different, so it's best to enter the /docs/*/scripts
	directory first, and run:
*		$ ./gnuplot_dir

	Running make will not check a dependency of the plots, so it will be necessary to:
*		$ make again
		or
*		$ make clean
*		$ make
	
	to resolve this.

	All the macros used to present the graphics are in report.tex or paper.tex.
	The code is in docs/*/code/ and has been edited for readability from that
	which is in code/.

	Some extra images will need to be created for the technical report. From
	data/:
*		$ ../scripts/process_two_point_counter counters/bubblesort
*		$ gnuplot counter_gnuplot_generated_scripts/two_point_counter_bubblesort
*		$ ../scripts/process_two_point_counter counters/improved\ bubblesort
*		$ gnuplot \
		  counter_gnuplot_generated_scripts/two_point_counter_improved_bubblesort

	All these tests were done on a Pentium 4, which is little-endian, and using
	a little-endian version of simple-scalar. If your machine is big-endian,
	you'll need to take some steps to get the same results.


Tool versions:

	The following are the versions of the tools used in the creation of this
	report.

		- Valgrind: valgrind-2.4.0

		- PapiEx: 0.99rc1, with perfctl patch 2.6.8-rc2, using papi 3.0.8.1, on
		  Linux 2.6.8.1
		
		- gcc: gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
		
		- perl: v5.8.7 built for i386-linux-thread-multi
		
		- gnuplot: gnuplot 4.0 patchlevel 0
		
		- SimpleScalar: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
			- sslittle-na-sstrix-gcc: gcc version 2.6.3
			- host: i386-redhat-linux

		- Pentium 4: 

			$ cat /proc/cpuinfo
			processor       : 0
			vendor_id       : GenuineIntel
			cpu family      : 15
			model           : 1
			model name      : Intel(R) Pentium(R) 4 CPU 1.60GHz
			stepping        : 2
			cpu MHz         : 1595.321
			cache size      : 256 KB
			fdiv_bug        : no
			hlt_bug         : no
			f00f_bug        : no
			coma_bug        : no
			fpu             : yes
			fpu_exception   : yes
			cpuid level     : 2
			wp              : yes
			flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
							  pge mca cmov pat pse36 clflush dts acpi mmx fxsr
							  sse sse2 ss tm
			bogomips        : 3185.04