ChangeLogP421.txt
2012-02-13
* src/components/net/linux-net.c: Repairing more coverity warnings.
2012-02-11
* src/windows-common.c: Fix an instance of "CPU's" missed yesterday.
* src/: papi_internal.c, threads.c: These changes fix two race
conditions that are probably the cause of the pthrtough
double-free error.
When freeing a thread, we remove and free all eventsets belonging
to that thread. This could race with the thread itself removing
the eventset, causing some ESI fields to be freed twice.
The problem was found by using the Valgrind 3.8 Helgrind tool
valgrind --tool=helgrind --free-is-write=yes ctests/pthrtough
In order for Helgrind to work, I had to temporarily modify PAPI
to use POSIX pthread mutexes for locking. Is there any reason we
don't use these all the time?
2012-02-10
* src/utils/: avail.c, component.c, event_chooser.c,
native_avail.c: Fix one more case of "CPU's" in the print header
code.
Also remove the extraneous "The following correspond to fields
in the PAPI_event_info_t structure." message.
* src/: testlib/papi_test.h, testlib/test_utils.c,
ctests/all_native_events.c, ctests/calibrate.c,
ctests/code2name.c, ctests/hwinfo.c: Fix one more case of "CPU's"
in the print header code.
Also remove the extraneous "The following correspond to fields
in the PAPI_event_info_t structure." message.
* src/buildbot_configure_with_components.sh: take infiniband out of
the buildbot test.
* src/: x86_cache_info.c, components/coretemp/linux-coretemp.c,
components/lmsensors/linux-lmsensors.c,
components/lustre/linux-lustre.c, components/net/linux-net.c,
utils/event_chooser.c: Fix coverity errors reported by Will
Cohen.
* src/: aix.c, any-proc-null.c, linux-common.c, papi.c, papi.h,
papivi.h, solaris-niagara2.c, solaris-ultra.c,
ctests/clockres_pthreads.c: Address Red Hat bug 785975. The
plural of CPU appears to be CPUs.
* src/Makefile.inc: Patch to clean up dependencies, allowing for
parallel makes. Patch courtesy of Will Cohen from Red Hat.
2012-02-09
* src/buildbot_configure_with_components.sh: Add infiniband and mx
components to buildbot component tests.
* src/components/net/tests/: net_values_by_code.c,
net_values_by_name.c: Apply patch suggested by Will Cohen to
check the return values of system() calls.
* src/components/lmsensors/linux-lmsensors.h: Added missing string
header
2012-02-08
* man/... : update man pages one more time for 4.2.1
release
* release_procedure.txt: Make sure generated html has papi group
id.
2012-02-07
* src/multiplex.c: Fix the @file matching multiple files warning.
* src/components/README: Cleanup doxygen errors.
* doc/Doxyfile-html: Fix a typo introduced by the last commit.
* doc/Doxyfile-html: Exclude linux-bgp.c from doxygen.
* doc/Doxyfile-html: Make sure the component README file gets
included in doxygen.
* src/components/coretemp_freebsd/coretemp_freebsd.c: Cleanup
doxygen warnings in freebsd coretemp component.
* src/papi.h: Cleanup some doxygen warnings related to the
groupings.
* src/components/example/example.c: fix doxygen warning in the
example component
* doc/Doxyfile-html: Remove some cruft from doxygen config file.
This addresses the warning about "dot" not being found at /sw/bin/dot.
* src/components/: infiniband/linux-infiniband.c,
infiniband/linux-infiniband.h, cuda/linux-cuda.c,
cuda/linux-cuda.h: Cleaned up some doxygen issues
* src/components/lmsensors/linux-lmsensors.c: Remove
long-forgotten debug output.
* src/papi_libpfm4_events.c: Fix minor doxygen typos.
* src/components/vmware/vmware.c: Add params for doxygen
* man/... : update man pages
2012-02-06
* doc/Doxyfile-man1: Fix a typo in a doxygen config file.
2012-02-03
* release_procedure.txt, doc/Doxyfile, doc/Doxyfile-everything,
doc/Doxyfile-html, doc/Doxyfile.utils, doc/Doxyfile-man1,
doc/Doxyfile-man3, doc/Makefile, doc/doxygen_procedure.txt:
Rework the doxygen configuration files.
* RELEASENOTES.txt: Update for the impending release.
* ChangeLogP421.txt, RELEASENOTES.txt: Updates for the impending
release.
2012-02-02
* src/: papi.c, papi.h: Minor tweaks for doxygen errors
2012-02-01
* src/components/lmsensors/: Rules.lmsensors, configure.in: Fixed
configure error message and rules link error for shared object
linking. Thanks Will Cohen.
* src/components/appio/Rules.appio: Correct pathing
* src/ctests/api.c: One minor fix to check for PAPI_ENOEVNT
when testing PAPI_flops. If PAPI_FP_OPS does not exist on the
processor (as on many of them), this test fails.
2012-01-31
* src/ctests/multiattach.c: Increase acceptance criteria for
cycles.
* src/Makefile.in, src/configure, src/configure.in, src/papi.h,
doc/Doxyfile, doc/Doxyfile-everything, doc/Doxyfile.utils,
papi.spec: Update version number to 4.2.1 in preparation for
release.
* src/ctests/prof_utils.c: Correct a warning on 32-bit builds about
casting caddr_t to (long long).
Specifically:
prof_utils.c:234: warning: cast from pointer to integer of different size
prof_utils.c:248: warning: cast from pointer to integer of different size
prof_utils.c:262: warning: cast from pointer to integer of different size
We first cast to unsigned long and then on to long long. (This
may be overkill, but it's for a printf format string.)
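A minimal sketch of the two-step cast described above. format_addr is a hypothetical helper, not the code in prof_utils.c. It takes a const void * because caddr_t is char * on Linux, and it widens to unsigned long long (rather than the signed long long the entry mentions) so that the argument matches the %llx conversion.

```c
#include <stdio.h>

/* Hypothetical helper: print a code address through a 64-bit format.
   Casting the pointer straight to long long triggers "cast from pointer
   to integer of different size" on 32-bit builds. Going through
   unsigned long first avoids it, because unsigned long is pointer-sized
   on both ILP32 and LP64 Linux; the second cast is a plain widening. */
static int format_addr(char *buf, size_t len, const void *addr)
{
    unsigned long as_ulong = (unsigned long)addr;            /* same width as the pointer */
    unsigned long long wide = (unsigned long long)as_ulong;  /* widen to 64 bits */
    return snprintf(buf, len, "%#llx", wide);
}
```

For example, formatting the address 0x1000 writes the string 0x1000 and returns 6, the number of characters written.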
2012-01-30
* release_procedure.txt: Add the correct path for doxygen on ICL
machines.
* src/papi_events.csv: Modify Intel Sandybridge PAPI_FP_OPS and
PAPI_FP_INS events to not count x87 fp instructions.
The problem is that the current predefines were made by adding 5
events. With the NMI watchdog stealing a counter and/or
hyperthreading reducing the number of available counters by half,
they just couldn't fit.
This now raises the potential for people using x87-compiled
floating point on Sandybridge and getting 0 FP_OPS. This is only
likely if running a 32-bit kernel and *not* compiling your code
with -msse.
A long-term solution might be trying to find a better set of FP
predefines for sandybridge.
* src/components/: lustre/linux-lustre.c, mx/linux-mx.c: Some
really minor cleanups to the lustre and mx components.
2012-01-28
* src/components/example/: example.c, tests/example_basic.c: Update
example component
Cleans up code, adds some more documentation, adds counter write
support.
2012-01-27
* src/papi_user_events.c: Minor cleanups for user events.
* src/libpfm4/: README, include/perfmon/pfmlib.h, lib/Makefile,
lib/pfmlib_amd64.c, lib/pfmlib_common.c, lib/pfmlib_priv.h: Fix
"conflicts" in git import of libpfm4.
* src/libpfm4/lib/: pfmlib_amd64_fam11h.c,
events/amd64_events_fam11h.h: Initial revision
2012-01-26
* src/papi_fwrappers.c: Escape the include directives in the
documentation.
(Cleans up doxygen warnings.)
* src/components/README: Adding vmware to component README
* src/components/vmware/: Makefile.vmware.in,
PAPI-VMwareComponentDocument.pdf, Rules.vmware,
VMwareComponentDocument.txt, configure, configure.in, vmware.c,
vmware.h: merge vmware branch to head
* src/perf_events.c: Set fast_counter_read back to 0 on x86/x86_64
perf_events, as currently rdpmc counter access is not supported.
There are patches floating around that enable this (although
performance is still a long way from perfctr) but they will not
likely be merged for a while now, and the perf_events substrate
will require a lot of extra code to support it once it does make
it into a shipping kernel.
* src/buildbot_configure_with_components.sh: Remove acpi from the
buildbot configure script.
2012-01-25
* src/components/mx/: Makefile.mx.in, Rules.mx, configure,
configure.in, linux-mx.c, linux-mx.h, tests/Makefile,
tests/mx_basic.c, tests/mx_elapsed.c, utils/fake_mx_counters.c,
utils/sample_output: Rewrite of the MX component:
+ Add tests
+ Modernize code
+ Remove the need to run ./configure in the mx directory
+ Add a fake mx_counters program that lets you test the component
on a machine without Myrinet installed
* src/components/: README, acpi/Rules.acpi,
acpi/linux-acpi-memory.c, acpi/linux-acpi.c, acpi/linux-acpi.h:
Remove the ACPI component.
It was one of the oldest components and needed a lot of cleanup
work, and it turns out that the main useful event it provided
(temperature) isn't available on modern machines/kernels
(coretemp should be used instead).
2012-01-23
* src/perf_events.c: Restored Phil's changes that I inadvertently
clobbered with my last commit :(
* src/perf_events.c: Remove a warning about an uninitialized
variable.
* src/utils/: component.c, event_info.c, native_avail.c: Update the
Doxygen comments on these utilities to have the command line
options listed in a list like the other utils.
* src/perf_events.c: More improvements to the read path for
multiplexed counters. The handling of bad kernel behavior is now
built in and no longer requires a #define.
Basically, there are situations when either enabled or running is
zero but not both. This could result in a divide by 0 in the
worst case, as was observed by Tushar Mohan in papiex. You could
trigger it by doing a read immediately after a start with
perf events, using a FORMAT_SCALE argument.
Now the logic goes, assuming multiplexing:
1) if (running == enabled) return the raw counter
2) if (running && enabled) scale the counter by enabled/running
3) else warn in debug mode and return the raw counter
Apparently we need a test case that does a read immediately after
a start. That's a hole.
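A sketch of the three-case read logic above as a standalone C function. mpx_scale is a hypothetical name, not the perf_events.c code. It assumes unsigned 64-bit inputs and a 64-bit GCC or Clang host, because it uses the compiler's unsigned __int128 extension to keep count * enabled from overflowing (the overflow problem noted in the 2011-11-10 entry). The stderr message stands in for PAPI's debug-mode warning.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: count, enabled and running are the value,
   time_enabled and time_running fields of one perf_event read. */
static uint64_t mpx_scale(uint64_t count, uint64_t enabled, uint64_t running)
{
    /* 1) The event was on the hardware the whole time it was enabled
          (this also covers enabled == running == 0): no scaling. */
    if (running == enabled)
        return count;

    /* 2) Both times are valid: extrapolate by enabled/running. The
          128-bit intermediate (GCC/Clang extension) keeps the product
          from overflowing; a result above 2^64 would still truncate. */
    if (running != 0 && enabled != 0)
        return (uint64_t)(((unsigned __int128)count * enabled) / running);

    /* 3) Exactly one of the times is zero, which a correct kernel should
          not report: warn and return the raw count instead of dividing
          by zero. */
    fprintf(stderr, "mpx_scale: enabled=%llu running=%llu, returning raw count\n",
            (unsigned long long)enabled, (unsigned long long)running);
    return count;
}
```

With the values from the report in this entry, mpx_scale(0, 214777, 0) falls into case 3 and returns 0 instead of raising a floating point exception.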
Tested on brutus, core2 2.6.36
Here's the original report.
-------------------
Model string and code : Intel(R) Pentium(R) M processor 1600MHz (9)
Linux thinkpad 2.6.38-02063808-generic #201106040910 SMP Sat Jun 4 10:51:30 UTC 2011 i686 GNU/Linux
PAPI Version: 4.2.0.0
I think I ran into a bug similar to what we ran with MIPS.
With the latest PAPI (from CVS), on an x86 (32-bit machine), when
using papiex with multiplex with anything more than two events, I
get a floating point exception in PAPI during the PAPI_read call.
On enabling debugging in the substrate, I think the problem is
the same (namely a division by zero, because some event had a
zero time of running):
libpapiex debug: 24625,0x0,papiex_thread_init_routine Starting counters with PAPI_start
SUBSTRATE:perf_events.c:pe_enable_counters:953:24625 ioctl(enable): ctx: 0x96a4bc8, fd: 3
SUBSTRATE:perf_events.c:pe_enable_counters:953:24625 ioctl(enable): ctx: 0x96a4bc8, fd: 5
libpapiex debug: 24625,0x0,papiex_thread_init_routine Calling PAPI_lock before critical section
libpapiex debug: 24625,0x0,papiex_thread_init_routine Released PAPI lock
libpapiex debug: 24625,0x0,papiex_start START POINT 0 LABEL
libpapiex debug: 24625,0x0,papiex_start Reading counters (PAPI_read) to get initial counts
SUBSTRATE:perf_events.c:_papi_pe_read:1147:24625 read: fd: 3, tid: 0, cpu: -1, ret: 56
SUBSTRATE:perf_events.c:_papi_pe_read:1148:24625 read: 2 1341021 1341021
SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[3] 33405 * tot_time_enabled 1341021) / tot_time_running 1341021
SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[5] 44552 * tot_time_enabled 1341021) / tot_time_running 1341021
SUBSTRATE:perf_events.c:_papi_pe_read:1147:24625 read: fd: 5, tid: 0, cpu: -1, ret: 40
SUBSTRATE:perf_events.c:_papi_pe_read:1148:24625 read: 1 214777 0
SUBSTRATE:perf_events.c:_papi_pe_read:1181:24625 (papi_pe_buffer[3] 0 * tot_time_enabled 214777) / tot_time_running 0
The above debug log is for three events: PAPI_TOT_CYC,
PAPI_TOT_INS and PAPI_L1_DCM. Multiplexing works with two events.
Adding a third (any event) gives this error. Basically, the
floating point exception kills the program, and PAPI_read never
returns.
I think I know why papiex always hits this bug: right after
starting the counters with PAPI_start, papiex does a PAPI_read
to store the initial values of the counters in a temporary
variable. These are then subtracted from the final counter
values. Should we put in a deliberate delay? Of course, the real
bug should be fixed in PAPI.
----
* src/utils/event_info.c: Major rewrite of the papi_xml_event_info
program:
+ Remove event code numbers, as they are not stable run-to-run
+ Add some Doxygen comments
+ Remove some wrong assumptions that could cause potential buffer
overflows
+ Improve usage information
2012-01-20
* src/components/lustre/: Rules.lustre, linux-lustre.c,
linux-lustre.h,
fake_proc/fs/lustre/llite/hpcdata-ffff81022a732800/read_ahead_stats,
fake_proc/fs/lustre/llite/hpcdata-ffff81022a732800/stats,
tests/Makefile, tests/lustre_basic.c: Finish the re-write of the
lustre component.
It would be nice if someone with access to a machine with a
lustre filesystem could test this for us.
* src/: papi_internal.c, components/lustre/linux-lustre.c: Update
the component initialization code so that it can handle a PAPI
ERROR return gracefully. Previously there was no way to indicate
initialization failure besides just setting num_native_events to
0.
2012-01-19
* src/components/lustre/: linux-lustre.c, linux-lustre.h: First
pass at cleaning up the lustre component.
It should now properly report no events when no lustre
filesystems are available.
2012-01-11
* src/papi_events.csv: Add AMD fam12h support to the events file.
Right now it is just an alias to the similar fam10h event list;
this can be split out if necessary once we find a tester with the
hardware.
* src/libpfm4/: README, docs/man3/pfm_get_event_next.3,
docs/man3/pfm_get_pmu_info.3, include/perfmon/perf_event.h,
include/perfmon/pfmlib.h, lib/Makefile, lib/pfmlib_amd64.c,
lib/pfmlib_amd64_priv.h, lib/pfmlib_common.c,
lib/pfmlib_perf_event.c, lib/pfmlib_priv.h,
lib/events/intel_coreduo_events.h, lib/events/perf_events.h,
perf_examples/Makefile, perf_examples/perf_util.c,
perf_examples/perf_util.h, perf_examples/self.c,
perf_examples/task_smpl.c, perf_examples/x86/bts_smpl.c: Fix
"merge" conflicts with libpfm4 merge.
* src/libpfm4/lib/: pfmlib_amd64_fam12h.c,
events/amd64_events_fam12h.h: Initial revision
* src/papi_libpfm4_events.c: Properly use the pfm_get_event_next()
iterator to find next event.
Without this, on AMD Fam10h some events are missed.
Some events are still missed due to a libpfm4 bug; this will be
fixed once I update the libpfm4 tree included with PAPI.
Note, enumeration fixes like this often break things, so please
test if possible.
* src/papi_events.csv: Update the coreduo (not core2) events. Most
notably the FP events were wrong.
This, along with a forthcoming libpfm4 update, makes all the
ctests pass on an old Yonah Core Duo laptop I have.
2012-01-05
* src/ctests/api.c: Make the api test actually test PAPI_flops() as
it claims to do, rather than PAPI_flips().
Patch thanks to: Emilio De Camargo Francesquini
* src/papi_hl.c: Fix some copy-and-paste documentation remnants in
the papi_hl.c file, mostly where it said FLIPS where it meant
FLOPS.
2012-01-04
* src/utils/native_avail.c: Update papi_native_avail to *not* print
the event codes, as these are not guaranteed to be stable from
run to run.
Also fix up the formatting and print some component info too.
Please try it and let me know if you don't like the new output.
* src/: configure, configure.in: Respect a FORCED option in
configure.
2011-12-22
* src/Rules.pfm4_pe: Remove perfmon.h from MISCHDRS.
2011-12-20
* src/: Rules.perfctr, Rules.perfctr-pfm, Rules.pfm, Rules.pfm4_pe,
Rules.pfm_pe, linux-lock.h, mb.h: Merry Christmas ARM users.
This patch fixes the SMP ARM issues reported by Harald Servat.
Also, adds proper header dependency checking in the Rules files.
People, when you add headers, please add them to the
dependency lines so everything gets rebuilt properly.
The new implementation of SMP locks is very pedantic; that is, it
is not the fastest, but it does use atomics and avoids kernel
intervention.
Passed on our 2 core ARM v7. All pthreads tests now pass, except
the ones that also fail in the single processor case usually due
to a missing event.
Samples:
mucci@panda:~/papi.head/src$ uname -a
Linux panda 3.0.0 #2 SMP Fri Jul 29 16:23:54 EDT 2011 armv7l GNU/Linux
mucci@panda:~/papi.head/src$ hostname
panda
mucci@panda:~/papi.head/src$ cat /proc/cpuinfo
Processor: ARMv7 Processor rev 2 (v7l)
processor: 0
BogoMIPS: 2007.19
processor: 1
BogoMIPS: 1965.18
Features: swp half thumb fastmult vfp edsp thumbee neon vfpv3
CPU implementer: 0x41
CPU architecture: 7
CPU variant: 0x1
CPU part: 0xc09
CPU revision: 2
Hardware: OMAP4 Panda board
Revision: 0020
Serial: 0000000000000000
mucci@panda:~/papi.head/src$ ./ctests/locks_pthreads
Creating 2 threads
10000 iterations took 13489 us.
Running 44480 iterations
Expected: 88960 Received: 88960
locks_pthreads.c PASSED
mucci@panda:~/papi.head/src$ ./ctests/pthrtough
Creating 2 threads for 1000 iterations each of: register create_eventset destroy_eventset unregister
pthrtough.c PASSED
mucci@panda:~/papi.head/src$ ./ctests/pthrtough2
Creating 2000 threads for 1 iterations each of: register create_eventset destroy_eventset unregister
Failed to create thread: 238
Continuing test with 237 threads.
pthrtough2.c PASSED
mucci@panda:~/papi.head/src$ ./ctests/thrspecific
Thread 0x40ae1470 started, specific data is at 0xbea9c6d4
Thread 0x40021000 started, specific data is at 0xbea9c6c4
Thread 0x4244d470 started, specific data is at 0xbea9c6c8
Thread 0x4138d470 started, specific data is at 0xbea9c6d0
Thread 0x41c4d470 started, specific data is at 0xbea9c6cc
Entry 0, Thread 0x41c4d470, Data Pointer 0xbea9c6cc, Value 4000000
Entry 1, Thread 0x40021000, Data Pointer 0xbea9c6c4, Value 500000
Entry 2, Thread 0x40ae1470, Data Pointer 0xbea9c6d4, Value 1000000
Entry 3, Thread 0x4244d470, Data Pointer 0xbea9c6c8, Value 8000000
Entry 4, Thread 0x4138d470, Data Pointer 0xbea9c6d0, Value 2000000
thrspecific.c PASSED
mucci@panda:~/papi.head/src$ ./ctests/krentel_pthreads
program_time = 6, threshold = 20000000, num_threads = 3
launched timer in thread 0
launched timer in thread 1
launched timer in thread 3
launched timer in thread 2
[1] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
[2] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
[0] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
[3] time = 1, count = 7, iter = 5, rate = 1400.0/Kiter
[1] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
[3] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
[2] time = 2, count = 25, iter = 16, rate = 1562.5/Kiter
[1] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
[2] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
[3] time = 3, count = 25, iter = 16, rate = 1562.5/Kiter
[1] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
[3] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
[2] time = 4, count = 25, iter = 16, rate = 1562.5/Kiter
[3] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
[2] time = 5, count = 25, iter = 16, rate = 1562.5/Kiter
[1] time = 5, count = 26, iter = 17, rate = 1529.4/Kiter
[2] time = 6, count = 25, iter = 16, rate = 1562.5/Kiter
[0] time = 6, count = 27, iter = 17, rate = 1588.2/Kiter
done
krentel_pthreads.c PASSED
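The ARM lock code itself (linux-lock.h, mb.h) is not reproduced in this entry, so the following is only a portable stand-in built on C11 <stdatomic.h>: a spin lock that never enters the kernel, plus a hypothetical demo_run driver shaped like the locks_pthreads run above (2 threads x 44480 iterations should give 88960). spin_lock_t, spin_lock, spin_unlock and demo_run are illustrative names, not PAPI's. Compile with -pthread.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical spin lock using only C11 atomics: no system calls, so the
   kernel is never involved, at the cost of burning CPU while waiting. */
typedef struct { atomic_flag held; } spin_lock_t;
#define SPIN_LOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spin_lock_t *l)
{
    /* Acquire ordering: the critical section cannot start before the
       flag has been taken. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin */
}

static void spin_unlock(spin_lock_t *l)
{
    /* Release ordering publishes the critical section's writes. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

#define DEMO_MAX_THREADS 16
static spin_lock_t demo_lock = SPIN_LOCK_INIT;
static long demo_counter;            /* protected by demo_lock */

static void *demo_worker(void *arg)
{
    int iters = *(const int *)arg;
    for (int i = 0; i < iters; i++) {
        spin_lock(&demo_lock);
        demo_counter++;
        spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Start nthreads workers that each increment the shared counter iters
   times; returns the final count (nthreads * iters if the lock works),
   or -1 on bad arguments or thread creation failure. */
static long demo_run(int nthreads, int iters)
{
    pthread_t tids[DEMO_MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > DEMO_MAX_THREADS)
        return -1;
    demo_counter = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, demo_worker, &iters) != 0)
            break;
    for (int i = 0; i < started; i++)   /* iters stays valid until all joins */
        pthread_join(tids[i], NULL);
    return started == nthreads ? demo_counter : -1;
}
```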
2011-12-15
* src/papi_libpfm_presets.c: Change PAPI_PERFMON_EVENT_FILE
environment variable name to PAPI_CSV_EVENT_FILE since it's not
just for perfmon anymore.
* src/: configure, configure.in: Open mouth, insert foot; fix
perfctr configure by not testing a library we have not built yet.
2011-12-14
* src/: configure, configure.in: Missed one more place where we
tested perfctr != "no"
* src/: configure, configure.in: Fix a typo in the perfctr section;
it was causing a machine to default to perfctr when it had no
performance interface (a CentOS VM image with a 2.6.18 kernel).
Also checks that we actually have perfctr if we specify
--with-perfctr.
2011-12-08
* src/components/cuda/: Makefile.cuda.in, Rules.cuda, configure,
configure.in, linux-cuda.c, linux-cuda.h: Added auto-detection of
CUDA version to the PAPI CUDA Component, because the interface
changed between CUDA/CUPTI 4.0 and 4.1. PAPI now supports both
CUDA versions without any exposure to the users. Configure step
is unchanged and no additional knowledge of which CUDA version is
installed is required.
2011-12-03
* src/components/appio/: CHANGES, README, Rules.appio, appio.c,
appio.h, tests/Makefile, tests/appio_list_events.c,
tests/appio_values_by_code.c, tests/appio_values_by_name.c: [no
log message]
2011-11-25
* src/linux-timer.c: Fix compilation warning if you specify
--with-walltime=gettimeofday
* src/linux-timer.c: Fix the build on Linux systems using mmtimer
* src/linux-common.c: Update the linux MHz detection code to use
bogoMIPS when there is no MHz field available in /proc/cpuinfo.
This gives roughly correct MHz on ARM, and the MIPS workaround
should also still work.
2011-11-23
* src/components/net/linux-net.c: Fix compile errors in a debug
message. (pathname didn't exist but we are working on
NET_PROC_FILE)
2011-11-22
* src/components/net/: linux-net.c, tests/net_values_by_code.c,
tests/net_values_by_name.c: Change the ping command in the net
tests to not use &> to redirect output to /dev/null.
&> is a bash extension; where /bin/sh is a strict POSIX shell
(such as dash), "cmd &> file" is parsed as "cmd &" followed by
"> file", so ping runs in the background and the test finishes
before ping can generate any packets.
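A sketch of a portable alternative, assuming the test launches its command through system() (which runs /bin/sh): append "> /dev/null 2>&1", which every POSIX shell understands, and check the status that system() returns. run_quiet is a hypothetical helper, not the net test code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd with stdout and stderr discarded, using
   only POSIX sh syntax. Returns the command's exit status, or -1 if it
   could not be run or did not exit normally. */
static int run_quiet(const char *cmd)
{
    char buf[512];
    int n = snprintf(buf, sizeof buf, "%s > /dev/null 2>&1", cmd);

    if (n < 0 || (size_t)n >= sizeof buf)
        return -1;                      /* command too long for the buffer */

    int status = system(buf);           /* blocks until the command finishes */
    if (status == -1 || !WIFEXITED(status))
        return -1;                      /* fork failed or killed by a signal */
    return WEXITSTATUS(status);
}
```

A net test could then call run_quiet("ping -c 5 localhost") and only read the counters after it returns 0, so the packets have actually been sent.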
* src/components/net/linux-net.c: Fix slight bug in the net
component, where a memset() had the wrong arguments. This made
for weird results in the case where we start/stop quickly enough
that we return the initial data.
* src/components/net/: CHANGES, Makefile.net.in, README, Rules.net,
configure, configure.in, linux-net.c, linux-net.h,
tests/Makefile, tests/net_list_events.c,
tests/net_values_by_code.c, tests/net_values_by_name.c: Replace
net component with updated version written by Jose Pedro
Oliveira
* Dynamically detects the network interfaces
(i.e. the ones listed in /proc/net/dev)
* No longer needs to fork/exec the external ifconfig command and
parse its output. It now reads the Linux kernel network
statistics directly from /proc/net/dev.
* Each network interface now has 16 events instead of 13
(all counters in /proc/net/dev).
* Adds support for PAPI_event_name_to_code()
* Adds a couple of small tests/examples
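A sketch of reading one interface line of /proc/net/dev, whose 16 numeric columns (8 receive, 8 transmit) correspond to the 16 events per interface listed above. parse_net_dev_line and NET_NUM_COUNTERS are hypothetical names; this is not the component's actual parser.

```c
#include <stdio.h>
#include <string.h>

#define NET_NUM_COUNTERS 16   /* 8 receive + 8 transmit columns */

/* Hypothetical parser for one /proc/net/dev data line, e.g.
   "  eth0: 1234 10 0 0 0 0 0 0 5678 20 0 0 0 0 0 0".
   Stores the interface name and 16 counters; returns 0 on success and
   -1 for header lines, oversized names, or missing fields. */
static int parse_net_dev_line(const char *line, char *name, size_t name_len,
                              unsigned long long counters[NET_NUM_COUNTERS])
{
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return -1;                      /* the two header lines have no ':' */

    /* Interface name: text before ':' with leading blanks skipped. */
    const char *start = line;
    while (start < colon && (*start == ' ' || *start == '\t'))
        start++;
    size_t len = (size_t)(colon - start);
    if (len == 0 || len >= name_len)
        return -1;
    memcpy(name, start, len);
    name[len] = '\0';

    /* Counters: %n reports how far each conversion consumed, so we can
       step through the 16 whitespace-separated fields. */
    const char *p = colon + 1;
    for (int i = 0; i < NET_NUM_COUNTERS; i++) {
        int used;
        if (sscanf(p, "%llu%n", &counters[i], &used) != 1)
            return -1;
        p += used;
    }
    return 0;
}
```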
2011-11-16
* doc/Doxyfile-everything: Fix the exclude libpfm/perfctr config.
2011-11-10
* src/perf_events.c: Only scale when running != enabled.
Now verified on ig, brutus and the malta
* src/perf_events.c: Further tuneups for mpx'ing.
Previous commit broke systems with valid return values from
perf_events for running & enabled. My attempt at scaling in long
long world caused an overflow which led to a negative number when
passed up the chain.
Also consolidated types... best way to avoid this stuff is to
start as the type you are ending as.
Now we use some better integer scaling, guaranteed within +/-0.5%
of the actual scaled value of enabled / running.
New results on brutus (multiplex1):
case1: Does PAPI_multiplex_init() not break regular operation?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case1: PAPI_TOT_CYC PAPI_FP_INS
case1: 2739865106 600002876
case2: Does setmpx/add work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case2: PAPI_TOT_CYC PAPI_FP_INS
case2: 2739678237 600002258
case3: Does add/setmpx work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case3: PAPI_TOT_CYC PAPI_FP_INS
case3: 2739847832 600002298
case4: Does add/setmpx/add work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case4: PAPI_TOT_CYC PAPI_FP_INS
case4: 2737832980 600013404
case5: Does setmpx/add/add/start/read work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
read @start counter[0]: 7106
read @stop counter[0]: 2740387017
difference counter[0]: 2740379911
read @start counter[1]: 0
read @stop counter[1]: 600017169
difference counter[1]: 600017169
multiplex1.c PASSED
2011-11-09
* src/components/cuda/linux-cuda.c: For the CUDA Component,
PAPI_read() now accumulates event values. This has to be
explicitly done in PAPI because CUPTI automatically resets all
counter values to 0 after a read. (PAPI_start()/stop() continues
to reset the values to 0)
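A minimal sketch of the accumulation, assuming (as the entry describes) that each device read returns the count since the previous read and then resets the device counter to 0. accum_counter_t, accum_start and accum_read are hypothetical names, not the CUDA component's API.

```c
#include <stdint.h>

/* Hypothetical per-event state for a counter source that resets its
   hardware value to 0 on every read. */
typedef struct {
    uint64_t total;   /* value reported to the caller since start */
} accum_counter_t;

/* Start (like PAPI_start()) begins again from 0. */
static void accum_start(accum_counter_t *c)
{
    c->total = 0;
}

/* raw is what the device returned for this read: because the device reset
   itself after the previous read, raw is a delta, so it is added to the
   running total instead of being reported directly. */
static uint64_t accum_read(accum_counter_t *c, uint64_t raw)
{
    c->total += raw;
    return c->total;
}
```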
* src/perf_events.c: Last of the multiplex fixes to perf events.
The root of all evil was this:
    counts[i] = ( uint64_t )
        ( ( double ) buffer[count_idx] *
          ( double ) buffer[get_total_time_enabled_idx( )] /
          ( double ) buffer[get_total_time_running_idx( )] );
In addition to the improper casting to uints (PAPI returns
int64s), using floating point arithmetic is a no-no. Plus this
resulted in divides by zero...
Before:
SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6cba, 0x0, 0x0, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x23, 0x0, 0x0, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6de72b5d, 0x8ae0fa80, 0x8ae0fa80, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1155:12218 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x4c4b46b, 0x8ae0fa80, 0x8ae0fa80, ret: 24
So the kernel is good, but there are errors in the multiplexed scaling.
case5: Does setmpx/add/add/start/read work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
read @start counter[0]: 9223372034707292159
read @stop counter[0]: 1843791732
difference counter[0]: -9223372032863500427
multiplex1.c FAILED
Line # 389
With fix:
SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6782, 0x0, 0x0, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x0, 0x0, 0x0, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 3, tid: 0, cpu: -1, buffer[0-2]: 0x6de725dc, 0x8ae0fa80, 0x8ae0fa80, ret: 24
SUBSTRATE:perf_events.c:_papi_pe_read:1151:12821 read: fd: 4, tid: 0, cpu: -1, buffer[0-2]: 0x4c4b400, 0x8ae0fa80, 0x8ae0fa80, ret: 24
read @start counter[0]: 26498
read @stop counter[0]: 1843865052
difference counter[0]: 1843838554
read @start counter[1]: 0
read @stop counter[1]: 80000000
difference counter[1]: 80000000
SUBSTRATE:perf_events.c:_papi_pe_update_control_state:1288:12821 Called with count == 0
SUBSTRATE:papi_libpfm4_events.c:_papi_libpfm_shutdown:1178:12821 shutdown
multiplex1.c PASSED
New code is vastly simpler and smaller, and checks for bad kernel
behavior:
    int64_t tot_time_running =
        papi_pe_buffer[get_total_time_running_idx( )];
    int64_t tot_time_enabled =
        papi_pe_buffer[get_total_time_enabled_idx( )];
#ifdef BRAINDEAD_MULTIPLEXING
    if (tot_time_enabled == 0)
        tot_time_enabled = 1;
    if (tot_time_running == 0)
        tot_time_running = 1;
#else
    /* If we are convinced this platform's kernel is fully operational,
       then this stuff will never happen. If it does, then
       BRAINDEAD_MULTIPLEXING needs to be enabled. */
    if ((tot_time_running == 0) && (papi_pe_buffer[count_idx])) {
        PAPIERROR("This platform has a kernel bug in multiplexing, count is %lld (not 0), but time running is 0.\n", papi_pe_buffer[count_idx]);
        return PAPI_EBUG;
    }
    if ((tot_time_enabled == 0) && (papi_pe_buffer[count_idx])) {
        PAPIERROR("This platform has a kernel bug in multiplexing, count is %lld (not 0), but time enabled is 0.\n", papi_pe_buffer[count_idx]);
        return PAPI_EBUG;
    }
#endif
    pe_ctl->counts[i] = (papi_pe_buffer[count_idx] * tot_time_enabled) /
        tot_time_running;
Also, renamed all instances of 'buffer' to papi_pe_buffer because
buffer is a global variable on MIPS/Linux/libc. Yikes!
(gdb) whatis buffer
type = struct utmp *
* src/ctests/multiplex1.c: Made sure that PAPI_TOT_CYC is the first
event added to multiplexing event set.
This will demonstrate the bug in perf_event multiplexing
arithmetic in case5 on MIPS and other perf_event subsystems that
likely have some breakage in the kernel's handling of
multiplexing. The common bug is that the perf_event subsystem
does not fill in the second and third elements of the 24 byte
read that gets returned from the kernel. These values are
time_enabled and time_running. MIPS as of 3.0.3 just fills this
in after a HZ tick has happened. Workarounds are pretty simple in
the low level layer...
A buggy output looks like this (3.0.3 MIPS/Linux Big Endian):
-bash-4.1$ ./ctests/multiplex1
case1: Does PAPI_multiplex_init() not break regular operation?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case1: PAPI_TOT_CYC PAPI_FP_INS
case1: 1843775252 80000000
case2: Does setmpx/add work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case2: PAPI_TOT_CYC PAPI_FP_INS
case2: 1843773254 80000037
case3: Does add/setmpx work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case3: PAPI_TOT_CYC PAPI_FP_INS
case3: 1843772919 80000037
case4: Does add/setmpx/add work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
case4: PAPI_TOT_CYC PAPI_FP_INS
case4: 1843773959 80000037
case5: Does setmpx/add/add/start/read work?
Added PAPI_TOT_CYC
Added PAPI_FP_INS
read @start counter[0]: 9223372034707292159
read @stop counter[0]: 1843784577
difference counter[0]: -9223372032863507582
multiplex1.c FAILED
Line # 389
Error: Difference in start and stop resulted in negative value!
2011-11-08
* src/components/cuda/: linux-cuda.c, linux-cuda.h: Updated CUDA
component for CUPTI 4.1 (RC1). Note: SetCudaDevice() should now
work with the latest CUDA 4.1 release.
2011-11-07
* src/components/coretemp/linux-coretemp.c: Update coretemp to
better handle sparse numbering of the inputs.
* doc/Doxyfile-everything: Exclude the libpfm* and perfctr-*
directories from consideration when generating Doxygen docs.
* src/: papi.h, components/acpi/linux-acpi.h,
components/coretemp_freebsd/coretemp_freebsd.c,
components/cuda/linux-cuda.h,
components/infiniband/linux-infiniband.h,
components/mx/linux-mx.h, components/net/linux-net.h: Place a
space in < your name here > to clean up doxygen warnings.
* src/perf_events.c: Only perf event systems that have FAST counter
reads and FAST hw timer access are x86...
* src/linux-common.c: MIPS clock and Linux fixup code
* src/components/example/example.c: A little more documentation on
which of the component vector function pointers are relevant.
* src/papi_vector.c: Tested the dummy get_{real,virt}_{cyc,usec}
functions on zeus, they appear to work.
* src/components/example/tests/example_multiple_components.c:
Another fix to properly skip the multiple component case if the CPU
component is not available.
* src/components/example/tests/example_multiple_components.c: Skip
the test if no CPU component is enabled, rather than failing.
2011-11-04
* src/components/example/example.c: Free example_native_table with
papi_free; glibc complained when we just called free(), since we
allocate it with papi_calloc.
* man/...: Version number bump, since the pages differ measurably
from those released in 4.2.0.
* doc/: Doxyfile, Doxyfile-everything, Doxyfile.utils: Bump version
number in the doxygen config files.
* src/components/example/example.c:
_papi_example_shutdown_substrate does not take any arguments.
* src/components/net/linux-net.c: Include ctype.h for isspace().
* release_procedure.txt: release_procedure now reflects the correct
version of doxygen to use.
* src/buildbot_configure_with_components.sh: Do not always
configure with no CPU counters; allow that option to be passed in.
This lets us use one script for both types of builds we test.
* delete_before_release.sh,
src/buildbot_configure_with_components.sh: Create a script for
buildbot to configure with several components.
Buildbot runs all command-line arguments through sanitization
before passing them to sh, so --with-configure="a b c" becomes
'--with-configure="a b c"', which is bad.
delete_before_release.sh has been instructed to remove this file.
* man/...: Rebuild the manpages with doxygen 1.7.4 to
remove the 's at the end of sentences.
The html output looks clean.
2011-11-03
* src/: multiplex.c, papi.c: Fix some gcc-4.6 compile warnings
complaining that retval was being set but not used.
* src/papi.c: Add some extra comments to the PAPI_num_cmp_hwctrs()
code that describe its limitations a bit better.
2011-11-02
* src/: ctests/overflow_allcounters.c, testlib/test_utils.c: Add
lots of debugging output to make the results of the
overflow_allcounters test a bit clearer.
* src/components/coretemp/tests/coretemp_pretty.c: coretemp_pretty
wasn't printing the description for fan inputs.
The result on an Apple MacBook Pro (running Linux) now looks like
this:
    Trying all coretemp events
    Found coretemp component at cid 2
    hwmon0.temp1_input value: 33.50 degrees C, applesmc module, label TB0T
    hwmon0.temp2_input value: 33.50 degrees C, applesmc module, label TB1T
    hwmon0.temp3_input value: 32.00 degrees C, applesmc module, label TB2T
    hwmon0.temp4_input value: 0.00 degrees C, applesmc module, label TB3T
    hwmon0.temp5_input value: 62.25 degrees C, applesmc module, label TC0D
    hwmon0.temp6_input value: 54.25 degrees C, applesmc module, label TC0F
    hwmon0.temp7_input value: 57.25 degrees C, applesmc module, label TC0P
    hwmon0.temp8_input value: 69.00 degrees C, applesmc module, label TG0D
    hwmon0.temp9_input value: 58.00 degrees C, applesmc module, label TG0F
    hwmon0.temp10_input value: 51.25 degrees C, applesmc module, label TG0H
    hwmon0.temp11_input value: 58.25 degrees C, applesmc module, label TG0P
    hwmon0.temp12_input value: 60.75 degrees C, applesmc module, label TG0T
    hwmon0.temp13_input value: 62.25 degrees C, applesmc module, label TN0D
    hwmon0.temp14_input value: 59.25 degrees C, applesmc module, label TN0P
    hwmon0.temp15_input value: 49.00 degrees C, applesmc module, label TTF0
    hwmon0.temp16_input value: 54.00 degrees C, applesmc module, label Th2H
    hwmon0.temp17_input value: 58.75 degrees C, applesmc module, label Tm0P
    hwmon0.temp18_input value: 31.50 degrees C, applesmc module, label Ts0P
    hwmon0.temp19_input value: 44.25 degrees C, applesmc module, label Ts0S
    hwmon0.fan1_input value: 1999 RPM, applesmc module, label Left side
    hwmon0.fan2_input value: 2003 RPM, applesmc module, label Right side
    coretemp_pretty.c PASSED
* src/components/coretemp/: linux-coretemp.c, linux-coretemp.h,
tests/coretemp_pretty.c: Make the coretemp code a bit pickier
about which events it supports. Add descriptions to the events.
Also add support for voltage (in*) events.
On an amd14h machine I have access to, coretemp_pretty now
prints:
    Trying all coretemp events
    Found coretemp component at cid 2
    hwmon0.in1_input value: 1.31 V, it8721 module, label ?
    hwmon0.in2_input value: 2.22 V, it8721 module, label ?
    hwmon0.in3_input value: 3.34 V, it8721 module, label +3.3V
    hwmon0.in4_input value: 1.02 V, it8721 module, label ?
    hwmon0.in5_input value: 1.52 V, it8721 module, label ?
    hwmon0.in6_input value: 1.13 V, it8721 module, label ?
    hwmon0.in7_input value: 3.26 V, it8721 module, label 3VSB
    hwmon0.in8_input value: 3.17 V, it8721 module, label Vbat
    hwmon0.temp1_input value: 28.00 degrees C, it8721 module, label ?
    hwmon0.temp2_input value: -128.00 degrees C, it8721 module, label ?
    hwmon0.temp3_input value: -128.00 degrees C, it8721 module, label ?
    hwmon0.fan1_input value: 0 RPM
    hwmon0.fan2_input value: 1320 RPM
    hwmon1.temp1_input value: 33.00 degrees C, jc42 module, label ?
    hwmon2.temp1_input value: 31.75 degrees C, jc42 module, label ?
    hwmon3.temp1_input value: 53.00 degrees C, radeon module, label ?
    hwmon4.temp1_input value: 53.12 degrees C, k10temp module, label ?
    coretemp_pretty.c PASSED
* src/components/coretemp/: linux-coretemp.c,
tests/coretemp_pretty.c: A cut-and-paste error slipped into the
last commit. Fixes a build issue.
* src/components/coretemp/: linux-coretemp.c, tests/Makefile,
tests/coretemp_pretty.c: Clean up coretemp with the same cleanups
done in the example component.
Add a new test, "coretemp_pretty" that prints coretemp results in
a more user-friendly way.
* man/:... Rebuild the man pages with a newer version of
doxygen. (Older versions of doxygen had a nasty bug in man
output.)
Also reworked the utilities documentation to remove pages for the
files. Thanks to Jose Pedre Oliveria for pointing this out.
* src/components/example/tests/: Makefile,
example_multiple_components.c: Add a test that makes sure you can
have active EventSets on multiple components at the same time.
* release_procedure.txt: Change PATH specification to include tcsh
syntax; other minor syntax corrections.
* src/components/example/example.c: More cleanups and documentation
for the example component.
2011-11-01
* src/components/example/example.c: Another major overhaul of the
example component: a lot more documentation, plus make it behave
much more like a real component would.
* doc/Doxyfile.utils: Turn off undocumented warnings for the utils
doxygen run.
* src/utils/: avail.c, command_line.c, cost.c, event_chooser.c,
multiplex_cost.c: Add spaces to the comments so doxygen doesn't