Reduce analysis cost after detected race #16

jprotze · 2017-02-14T13:03:25Z

Identifying duplicates in data race reports is quite expensive regarding time.
It should be possible to reduce this cost by applying only logging, but no analysis for memory accesses, that were identified as racy code.

The idea would be:

implement a variant of the Tsan-rtl entrypoints as tsan_read(), that only logs, but skips race analysis and call it something like tsan_read_log(). Important is a symetric function signature.
when a data race is detected, binary patch the originating tsan_read call to call tsan_read_log instead

This will reduce analysis cost to detect duplicates and reduce output about races between other threads.

A potential issue of this approach is, that TSan potentially will not report races that this memory access has with other code regions. That means, more iterations of fixing code and running TSan might be needed.

dongahn · 2017-02-14T15:50:01Z

Good thoughts! Yes this would be a faster alternative to the current race reduction angle. Two things concern me a bit, none of which is fundamental to the proposal but important regardless for archer at this stage.

the concept of duplicates really requires additiinal research and I suspect we need to create a user feedback loopor more use cases for this in the end. This is why I wanted to make it multilvel capapbility and this can be more easily done after seeing all races.
I think having no or very minor modifications to the Tsan runtime helps at this point to be able to support Archer on various envirronments. It seems the proposal demands a large patch to Tsan itself? unless i an mostaken.

Maybe we can learn about the well-knwon reduction criteria using the current work and then use that lessons to do high speed reduction like you proposed here (once that problem is understood)

Dong

SumedhArani · 2017-02-15T00:34:18Z

@jprotze , trying to understand your proposal here.

Is it that you're suggesting that we analyse the log rather than tsan analysing memory accesses for portions of code where it's earmarked as racy region?

I'd once worked on a project(LLVM pass) which generated a trace report of a multi threaded code. One of the issues that I faced at that time were that, because of race conditions, the trace report was distorted only to be handled afterwards. But isn't doing all this and then analysing this a significant cost?

I understand that it'll be faster but how significant will the gains be?
(I'm not familiar with runtime overheads associated with tsan. Just asking for my knowledge and understanding.)

Hope I've understood the proposal correctly and I'm asking the right questions.

Thank you,
Sumedh.

dongahn · 2017-02-15T00:40:42Z

@SumedhArani: I think the main distinction is on-the-fly duplicate detection vs. post mortem detection. As I mentioned above, I think this optimization can be done later once we understand what constitutes "duplicates" using your technique.

SumedhArani · 2017-02-15T01:01:28Z

Ohh! When you mentioned current work, I thought you ( @dongahn ) were referring to the work being done in the research field regarding reduction.

dongahn · 2017-02-15T01:22:30Z

No, that's your topic now :-)

simoatze · 2017-03-12T03:42:52Z

@dongahn @jprotze I think I solved the problem of overriding the weak symbol of __tsan_on_report and actually do the reduction without modifying the compiler-rt. I created a new branch called race_aggregation and I am making @SumedhArani as external collaborator so he can work directly on archer repo.
I'll try to give an explanation here:

@SumedhArani created a new static library redefining __tsan_on_report but when we tried to compile a program and link it against this new library tsan kept calling the weak __tsan_on_report.
So googling I found this article that says that we need to link in the following way:

-Wl,--whole-archive libarcher_static_report.a -Wl,--no-whole-archive

In a nutshell, --whole-archive forces the linker to scan through all its symbols (including those already resolved). If there is a strong symbol that was already resolved by a weak symbol (as in our case), the strong symbol will overrule the weak one.

I already modified the clang-archer wrapper to add this flag automatically.

Compiling in this way tsan calls our __tsan_on_report. To suppress original Tsan report we need to run like this:

TSAN_OPTIONS="log_path=" ./program

So, what @SumedhArani did was correct, we just needed the right flag for compiling, now that we found the solution he can actually integrate the race reduction on archer! :)

Also looking at the compiler-rt I saw that in the following module:

https://github.com/llvm-mirror/compiler-rt/blob/e257e18436359d8a0636752e25e6db0f898b468a/lib/tsan/rtl/tsan_debugging.cc

there are utility functions such as:

__tsan_get_report_data
__tsan_get_report_thread
etc.

Maybe we can use those to extract the information we need for the reduction from the pointer void *rep that we get from __tsan_on_report, and we don't need to include the header files of tsan in our project. @SumedhArani you should take a look and see if we can actually exploit those functions.
There is an example in one of the tsan tests here.

@SumedhArani as I said previously I created a new branch called race_aggregation because I wanted to use the newest archer version based on the last OpenMP runtime with OMPT support.
Please do the following:

Delete your ARCHER fork
Start working on https://github.com/PRUNERS/archer/tree/race_aggregation
Update your OpenMP runtime to https://github.com/OpenMPToolsInterface/LLVM-openmp - Branch: towards_tr4 (If I am not mistaken, you are currently using the branch 'align-to-tr').

Let me know if all this makes sense, we can talk on Slack if further clarification is needed.
I'll make you collaborator on archer in a sec.

dongahn · 2017-03-12T14:07:55Z

@simoatze: out of curiosity, was __tsan_on_report already a weak symbol? Or does this need to bring in our own locally modified TSan runtime library?

It seems this is are various hooks into "on events"

If this is these hooks, it seems we can indeed have good mileage!

I think the reduction features can be incorporated as an Archer option.

ARCHER_OPTIONS="group_by=pc"

This can be read by Archer init function which can also set and modify relevant TSAN options such as log_path. It seems it is okay to generate both TSan report and Archer summary report together though. Maybe, we first report

==== ARCHER SUMMARY
Blah

==== Original TSAN Report

In any case group_by can later be extended to cover other aggregation criteria.

simoatze · 2017-03-12T15:45:38Z

@dongahn: __tsan_on_report is already weak on tsan, but seems that weak symbols don't work in the same way with static library, that's why we need this workaround.

dongahn · 2017-03-12T15:46:47Z

Your approach seems great to me! Thank you for looking into this. I know you are super busy.

jprotze · 2017-03-12T16:18:27Z

Afaik, weak symbol just avoids that you get linker error for duplicate implementation when linking static libraries.
The linker uses the first implementation that is detected. So, to overwrite the symbol you would need to link your library before the tsan library.
The tsan runtime is linked with the -fsanitize=thread flag.

jprotze · 2017-03-12T16:20:42Z

But linking libarcher before tsan would break our fallback implementation of tsan annotations in libarcher.

simoatze · 2017-03-12T17:19:09Z

That's what I know too and I tried to change the order, it didn't work.

jprotze · 2017-03-13T12:00:07Z

I'm not sure, but I think, the posts of last 24h were on #11 ?

@SumedhArani My intention in this issue was to binary patch callq __tsan_read4 to "nop"s at runtime, after TSan reported a data race in this specific call to __tsan_read4.

  49ce5f:       48 89 c7                mov    %rax,%rdi
  49ce62:       e8 d9 44 fd ff          callq  471340 <__tsan_write4>
  49ce67:       48 8d 45 b0             lea    -0x50(%rbp),%rax

would become

  49ce5f:       48 89 c7                mov    %rax,%rdi
  49ce62:       00 00 00 00 00          
  49ce67:       48 8d 45 b0             lea    -0x50(%rbp),%rax

SumedhArani · 2017-03-13T15:03:36Z

I know the discussions on this thread have digressed and should have rather been on #13

Your intent for this issue was to not do logging rather than that of on the fly reduction. Right?

If I'm getting you correct, you've demonstrated above that instead of sending it to tsan reporting routine, we should rather have a "nop" at runtime and rather log it. Correct @jprotze ?

Thanks!

simoatze · 2017-03-13T15:05:42Z

@jprotze I see, how would you do that? With dyninst?

dongahn · 2017-03-13T16:22:43Z

Well... I don't understand. If there is a hook for "on report", why do we need to do a binary patch? What is the current library linking order and what order is not working and for what case again?

jprotze added the enhancement label May 18, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce analysis cost after detected race #16

Reduce analysis cost after detected race #16

jprotze commented Feb 14, 2017

dongahn commented Feb 14, 2017

SumedhArani commented Feb 15, 2017

dongahn commented Feb 15, 2017

SumedhArani commented Feb 15, 2017

dongahn commented Feb 15, 2017

simoatze commented Mar 12, 2017 •

edited

Loading

dongahn commented Mar 12, 2017

simoatze commented Mar 12, 2017

dongahn commented Mar 12, 2017

jprotze commented Mar 12, 2017

jprotze commented Mar 12, 2017

simoatze commented Mar 12, 2017

jprotze commented Mar 13, 2017

SumedhArani commented Mar 13, 2017

simoatze commented Mar 13, 2017

dongahn commented Mar 13, 2017 •

edited

Loading

Reduce analysis cost after detected race #16

Reduce analysis cost after detected race #16

Comments

jprotze commented Feb 14, 2017

dongahn commented Feb 14, 2017

SumedhArani commented Feb 15, 2017

dongahn commented Feb 15, 2017

SumedhArani commented Feb 15, 2017

dongahn commented Feb 15, 2017

simoatze commented Mar 12, 2017 • edited Loading

dongahn commented Mar 12, 2017

simoatze commented Mar 12, 2017

dongahn commented Mar 12, 2017

jprotze commented Mar 12, 2017

jprotze commented Mar 12, 2017

simoatze commented Mar 12, 2017

jprotze commented Mar 13, 2017

SumedhArani commented Mar 13, 2017

simoatze commented Mar 13, 2017

dongahn commented Mar 13, 2017 • edited Loading

simoatze commented Mar 12, 2017 •

edited

Loading

dongahn commented Mar 13, 2017 •

edited

Loading