-
Notifications
You must be signed in to change notification settings - Fork 1
/
README
210 lines (152 loc) · 7.35 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
This file describes the experimental framework used in the technical report
"Sorting in the Presence of Branch Prediction and Caches", by Paul Biggar and
David Gregg, and the paper "An Experimental Study of Sorting and Branch
Prediction", by Paul Biggar, Nicholas Nash, Kevin Williams and David Gregg.
NB: Most of the scripts and programs used here are hard coded, and will only
work from the working directories specified below.
There are 4 sets of tests: testing the sorts to see if they work, hardware
performance counters, software branch predictors and simplescalar tests.
Most of the results are included in this package in their rawest form. It is
necessary to generate some graphs and process some results. Commands which must
be run to be able to create the docs are marked with a * below.
main.c:
Enter the code directory. In main.c there are a list of sorts to be tested
(look for 'test_sort'). The #defines TEST_MIN and TEST_MAX control how many
keys are to be tested. The sorts are tested for an array of every size
between the min and max, also using that size as a seed for the random
number generator.
The #define RANDOM_SIZE controls time_sort() which gives a quick tick count
or how long it takes to run a sort. Not very official, at all. It also
defines the size of array used for the software predictors, which is
official.
The #define RUN_VISUAL decides if the visual tests are being done. These
were used in development, and aren't that interesting.
Hardware counters:
In the code/ directory, call do_all_papiex or do_all_perfex. You need a
patched kernel for this. See: http://icl.cs.utk.edu/papi/.
Capture the output. Ctrl-C when bored, or needing results, as this will
take a long time (3 days or so on Pentium 4 1.8Ghz):
$ ./do_all_papiex > papiex_results
or
$ ./do_all_perfex > perfex_results
If you use papiex (more accurate), then it needs to be converted to the
other format:
$ ../scripts/papi_convert papiex_results > perfex_results
Then process the results to give you graphs:
$ ../scripts/process_perfex_output perfex_results
There's a little bit of hard coding in this script, so change $key_num to
the number of times perfex actually ran. Also modify the input so that only
completed work cycles are there. This means if the last sort you ran was
selectsort, and it ran 256 times, but other sorts ran 257 times, remove
the sort results for the 257th iteration from the results file.
Additionally, this requires results from 'do_nothing', or else the output
will be empty.
This produces a file in the data/ directory called processed_cycles_data
which is in gnuplot format. If you add more sorts to the framework, you'll
need to edit the gnuplot scripts in docs/*/scripts/ with the
column numbers that are printed out by process_perfex_output.
To create the gnuplot images, enter the data/perfex_gnuplot_scripts/
directory and:
* $ gnuplot *
The images appear in the data/plots/ directory.
Software branch predictors:
To run the branch predictors, enter the code/ directory, and ensure that
-D_USE_SOFTWARE_PREDICTOR is set in the CFLAGS line in the Makefile. Then:
$ make
$ ./fastsort
The software counters will appear in data/counters/. To process these,
from the data/ directory, run (after unbzipping some files):
* $ ../scripts/process_counter counters/*
Warnings from process_counter probably mean that the .bz2 files weren't
unzipped.
This will print out the maximum branches. If this isn't slightly lower than
the y-axis in the gnuplot scripts, it will need to be changed (On line 8 of
process_counter, then rerun it).
This creates the gnuplot data files in data/counter_gnuplot_data and
gnuplot scripts in the data/counter_gnuplot_generated_scripts/ directory.
To create the images, enter the data/ directory and run:
* $ for i in counter_gnuplot_generated_scripts/*; do gnuplot $i; done
The images are put into the data/plots directory.
Simplescalar:
If your simplescalar binaries are big-endian (if gcc is
ssbig-na-sstrix-gcc), then edit the $CC variable in scripts/do_simple to
reflect this.
In the code/ directory, run:
$ ./do_all_simple
This will run do_simple 10 times per sort, using data/BIG as it's input.
Simplescalar's output is captured by the scripts, edited and output into a
more human readable format, in data/simple_logs. To convert this to gnuplot
format, run:
* $ ../scripts/do_all_convert
To create the gnuplot scripts for these files, run:
* $ ../scripts/do_all_plot
Though this doesn't actually run gnuplot, despite its name. do_all_convert
creates gnuplot data files in data/converted_simple_logs/, and do_all_plot
creates gnuplot scripts for each sort type and each sort, stored in
data/simplescalar_generated_gnuplot_scripts.
To create the images, enter the data/ directory and run:
* $ gnuplot simplescalar_generated_gnuplot_scripts/*
Documents:
The are three documents in docs/. tech_report/ is the technical report
incorporating everything you see here. branch_paper/ is the first paper,
mostly discussing branch prediction.
The create the papers, run:
* $ make again
in the respective directories. All the papers use plots from the
data/plots/ directory, and have sym-links to it. Some of the plots might be
slightly and subtly different, so it's best to enter the /docs/*/scripts
directory first, and run:
* $ ./gnuplot_dir
Running make will not check a dependency of the plots, so it will be necessary to:
* $ make again
or
* $ make clean
* $ make
to resolve this.
All the macros used to present the graphics are in report.tex or paper.tex.
The code is in docs/*/code/ and has been edited for readability from that
which is in code/.
Some extra images will need to be created for the technical report. From
data/:
* $ ../scripts/process_two_point_counter counters/bubblesort
* $ gnuplot counter_gnuplot_generated_scripts/two_point_counter_bubblesort
* $ ../scripts/process_two_point_counter counters/improved\ bubblesort
* $ gnuplot \
counter_gnuplot_generated_scripts/two_point_counter_improved_bubblesort
All these tests were done on a Pentium 4, which is little-endian, and using
a little-endian version of simple-scalar. If your machine is big-endian,
you'll need to take some steps to get the same results.
Tool versions:
The following are the versions of the tools used in the creation of this
report.
- Valgrind: valgrind-2.4.0
- PapiEx: 0.99rc1, with perfctl patch 2.6.8-rc2, using papi 3.0.8.1, on
Linux 2.6.8.1
- gcc: gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
- perl: v5.8.7 built for i386-linux-thread-multi
- gnuplot: gnuplot 4.0 patchlevel 0
- SimpleScalar: SimpleScalar/PISA Tool Set version 3.0 of August, 2003.
- sslittle-na-sstrix-gcc: gcc version 2.6.3
- host: i386-redhat-linux
- Pentium 4:
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Pentium(R) 4 CPU 1.60GHz
stepping : 2
cpu MHz : 1595.321
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr
sse sse2 ss tm
bogomips : 3185.04