
Understanding and Evaluating Dynamic Race Detection with Go

Andrew Chin

A Deep Dive Into the Go Compiler Codebase

Prerequisites

In order to follow along, the reader should have the following source code:

  1. LLVM source code (we use 9.0.0, but any recent version should be fine): https://releases.llvm.org/download.html

  2. Compiler-RT source code (a subproject in the LLVM family, also 9.0.0). This download should be placed into llvm/projects/, where llvm is the source code downloaded in step 1.

  3. Go source code (cloned from master on Jan 3, 2020, but any recent version should be fine): https://github.com/golang/go

In the rest of this document, we will refer to relative file paths within llvm and go. These refer to the directories you have just downloaded.


Race Conditions vs. Data Races

According to https://dl.acm.org/doi/10.1145/130616.130623:

  • "Two fundamentally different types of races, that capture different kinds of bugs in different classes of parallel programs, can occur.
    • General races cause nondeterministic execution and are failures in programs intended to be deterministic.
    • Data races cause nonatomic execution of critical sections and are failures in (nondeterministic) programs that access and update shared data in critical sections."

According to the C++11 Standard, chapter intro.multithread, paragraph 21 (https://github.com/google/sanitizers/wiki/ThreadSanitizerAboutRaces#Volatile):

  • "The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior."

Go Compiler

The Go Compiler's job is to take a group of source code files written in the Go programming language and transform them into executable binaries. It has four high-level phases.

  1. Parsing

  2. Type-Checking and AST Transformations

  3. Generic SSA

  4. Lower SSA and Machine Code

Compiler instrumentation occurs during phases 3 and 4 (during SSA construction and as a separate SSA pass).

The heart of the compiler is located at go/src/cmd/compile/internal/gc/main.go; everything branches from this file. To fully understand the compiler, read through it from top to bottom, although only the latter half is relevant for this research.


TSAN (ThreadSanitizer)

  • Should understand current approaches to race detection using static and dynamic analysis
  • Concepts: shadow memory, happens-before, vector clocks



TSAN in Go

How is the race detection library implemented in Go?

  • It's not! The runtime state machine and core interface are compiled from the original LLVM project and linked into the Go runtime as an external library.

What is implemented in the Go runtime then?

  • The Go compiler still needs to instrument emitted code for use by the LLVM race detection runtime. This includes standard instrumentation of reads and writes and inserting race functions into new Go-specific synchronization primitives such as channels.
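
As a rough illustration (ours, not literal compiler output), the instrumentation amounts to bracketing each instrumented function with racefuncenter/racefuncexit and inserting a race-runtime call in front of every memory access. The calls are added at the SSA level; they are shown here only as comments marking roughly where they land.

    package example

    var total int

    // With -race, the compiler brackets this function with racefuncenter /
    // racefuncexit and inserts raceread/racewrite calls before the accesses
    // to total.
    func addTotal(v int) {
    	// racefuncenter(caller pc)
    	// raceread(&total)  -- the load of total
    	// racewrite(&total) -- the store to total
    	total += v
    	// racefuncexit()
    }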

Source Code: LLVM

  • The source code for race detection in LLVM is well compartmentalized.

  • Runtime Library: llvm/projects/compiler-rt/lib/tsan/rtl/

    • racefuncenter/racefuncexit: inserted at the beginning and end (respectively) of functions containing instrumented memory accesses

      • used to restore stack traces
    • raceread/racereadrange

    • racewrite/racewriterange

    • ...

  • LLVM instrumentation: llvm/lib/Transforms/Instrumentation/

    • All instrumentation methods specifically for TSAN are defined in llvm/lib/Transforms/Instrumentation/ThreadSanitizer.cpp:

        Summary:
      
        1. Redundant accesses currently handled:
        	- read-before-write (within same BB, no calls between)
        	- not captured variables
      
        2. Patterns that should not survive after classic compiler optimizations are not handled:
        	- two reads from the same temp should be eliminated by CSE (common subexpression elimination)
        	- two writes should be eliminated by DSE (dead store elimination)
        	- etc. (out-of-scope for this project)
      
        3. Do not instrument known races/"benign races" that come from compiler instrumentation.
        	The user has no way of suppressing them.
      
      see code

        // Instrumenting some of the accesses may be proven redundant.
        // Currently handled:
        //  - read-before-write (within same BB, no calls between)
        //  - not captured variables
        //
        // We do not handle some of the patterns that should not survive
        // after the classic compiler optimizations.
        // E.g. two reads from the same temp should be eliminated by CSE,
        // two writes should be eliminated by DSE, etc.
        //
        // 'Local' is a vector of insns within the same BB (no calls between).
        // 'All' is a vector of insns that will be instrumented.
        void ThreadSanitizer::chooseInstructionsToInstrument(
        	SmallVectorImpl<Instruction *> &Local, SmallVectorImpl<Instruction *> &All,
        	const DataLayout &DL)
        {
        	SmallPtrSet<Value*, 8> WriteTargets;
        	// Iterate from the end.
        	for (Instruction *I : reverse(Local)) {
        		if (StoreInst *Store = dyn_cast<StoreInst>(I)) {
        			Value *Addr = Store->getPointerOperand();
        			if (!shouldInstrumentReadWriteFromAddress(I->getModule(), Addr))
        			continue;
        			WriteTargets.insert(Addr);
        		} else {
        			LoadInst *Load = cast<LoadInst>(I);
        			Value *Addr = Load->getPointerOperand();
        			if (!shouldInstrumentReadWriteFromAddress(I->getModule(), Addr))
        			continue;
        			if (WriteTargets.count(Addr)) {
        			// We will write to this temp, so no reason to analyze the read.
        			NumOmittedReadsBeforeWrite++;
        			continue;
        			}
        			if (addrPointsToConstantData(Addr)) {
        			// Addr points to some constant data -- it can not race with any writes.
        			continue;
        			}
        		}
        		Value *Addr = isa<StoreInst>(*I)
        			? cast<StoreInst>(I)->getPointerOperand()
        			: cast<LoadInst>(I)->getPointerOperand();
        		if (isa<AllocaInst>(GetUnderlyingObject(Addr, DL)) &&
        			!PointerMayBeCaptured(Addr, true, true)) {
        			// The variable is addressable but not captured, so it cannot be
        			// referenced from a different thread and participate in a data race
        			// (see llvm/Analysis/CaptureTracking.h for details).
        			NumOmittedNonCaptured++;
        			continue;
        		}
        		All.push_back(I);
        	}
        	Local.clear();
        }

      see code

        // Do not instrument known races/"benign races" that come from compiler
        // instrumentatin. The user has no way of suppressing them.
        static bool shouldInstrumentReadWriteFromAddress(const Module *M, Value *Addr) {
        	// Peel off GEPs and BitCasts.
        	Addr = Addr->stripInBoundsOffsets();
      
        	if (GlobalVariable *GV = dyn_cast<GlobalVariable>(Addr)) {
        		if (GV->hasSection()) {
        			StringRef SectionName = GV->getSection();
        			// Check if the global is in the PGO counters section.
        			auto OF = Triple(M->getTargetTriple()).getObjectFormat();
        			if (SectionName.endswith(
        					getInstrProfSectionName(IPSK_cnts, OF, /*AddSegmentInfo=*/false)))
        			return false;
        		}
      
        		// Check if the global is private gcov data.
        		if (GV->getName().startswith("__llvm_gcov") ||
        			GV->getName().startswith("__llvm_gcda"))
        			return false;
        	}
      
        	// Do not instrument acesses from different address spaces; we cannot deal
        	// with them.
        	if (Addr) {
        		Type *PtrTy = cast<PointerType>(Addr->getType()->getScalarType());
        		if (PtrTy->getPointerAddressSpace() != 0)
        			return false;
        	}
      
        	return true;
        }
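
      A hedged Go-level illustration of the read-before-write rule above (the pass itself works on LLVM IR): in bump, the load of *x is followed by a store to the same address in the same basic block with no call in between, so instrumenting only the write is enough; any conflicting write in another thread that races with the omitted read also races with the instrumented write.

        package example

        // The read of *x can be left uninstrumented: the following write to the
        // same address in the same basic block already catches any racing access.
        func bump(x *int) {
        	*x = *x + 1
        }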


Source Code: Go

  • Go package race for manually instrumenting code for the race detector: go/src/internal/race/race.go (also norace.go for when race detection is not enabled)

    see code

     // SOURCE: go/src/internal/race/race.go
    
     func Acquire(addr unsafe.Pointer) {
     	runtime.RaceAcquire(addr)
     }
    
     func Release(addr unsafe.Pointer) {
     	runtime.RaceRelease(addr)
     }
    
     func ReleaseMerge(addr unsafe.Pointer) {
     	runtime.RaceReleaseMerge(addr)
     }
    
     func Disable() {
     	runtime.RaceDisable()
     }
    
     func Enable() {
     	runtime.RaceEnable()
     }
    
     func Read(addr unsafe.Pointer) {
     	runtime.RaceRead(addr)
     }
    
     func Write(addr unsafe.Pointer) {
     	runtime.RaceWrite(addr)
     }
    
     func ReadRange(addr unsafe.Pointer, len int) {
     	runtime.RaceReadRange(addr, len)
     }
    
     func WriteRange(addr unsafe.Pointer, len int) {
     	runtime.RaceWriteRange(addr, len)
     }
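
    Note that internal/race is only importable from inside the standard library. As a hedged sketch of how standard-library code uses these annotations (loosely mirroring sync.WaitGroup; toysync and Flag are our own illustrative names), a primitive can tell the detector about the happens-before edge it provides:

      // Hypothetical standard-library style package; only compiles inside
      // the Go tree because internal/race is an internal package.
      package toysync

      import (
      	"internal/race"
      	"sync/atomic"
      	"unsafe"
      )

      // Flag is a toy one-shot signal.
      type Flag struct{ done uint32 }

      // Set publishes everything the caller did before Set to goroutines
      // that later observe the flag via Wait.
      func (f *Flag) Set() {
      	if race.Enabled {
      		race.ReleaseMerge(unsafe.Pointer(f))
      	}
      	atomic.StoreUint32(&f.done, 1)
      }

      // Wait spins until the flag is set, then acquires the release from Set,
      // so the detector knows the caller now happens-after Set.
      func (f *Flag) Wait() {
      	for atomic.LoadUint32(&f.done) == 0 {
      	}
      	if race.Enabled {
      		race.Acquire(unsafe.Pointer(f))
      	}
      }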

  • Race instrumentation API used by the Go compiler: go/src/runtime/race.go (also race0.go when race detection is not enabled)

    see code

     // SOURCE: go/src/runtime/race.go
    
     // Public race detection API, present iff build with -race.
    
     func RaceRead(addr unsafe.Pointer)
     func RaceWrite(addr unsafe.Pointer)
     func RaceReadRange(addr unsafe.Pointer, len int)
     func RaceWriteRange(addr unsafe.Pointer, len int)
    
     func RaceErrors() int {
     	var n uint64
     	racecall(&__tsan_report_count, uintptr(unsafe.Pointer(&n)), 0, 0, 0)
     	return int(n)
     }
    
     //go:nosplit
    
     // RaceAcquire/RaceRelease/RaceReleaseMerge establish happens-before relations
     // between goroutines. These inform the race detector about actual synchronization
     // that it can't see for some reason (e.g. synchronization within RaceDisable/RaceEnable
     // sections of code).
     // RaceAcquire establishes a happens-before relation with the preceding
     // RaceReleaseMerge on addr up to and including the last RaceRelease on addr.
     // In terms of the C memory model (C11 §5.1.2.4, §7.17.3),
     // RaceAcquire is equivalent to atomic_load(memory_order_acquire).
     func RaceAcquire(addr unsafe.Pointer) {
     	raceacquire(addr)
     }
    
     //go:nosplit
    
     // RaceRelease performs a release operation on addr that
     // can synchronize with a later RaceAcquire on addr.
     //
     // In terms of the C memory model, RaceRelease is equivalent to
     // atomic_store(memory_order_release).
     func RaceRelease(addr unsafe.Pointer) {
     	racerelease(addr)
     }
    
     //go:nosplit
    
     // RaceReleaseMerge is like RaceRelease, but also establishes a happens-before
     // relation with the preceding RaceRelease or RaceReleaseMerge on addr.
     //
     // In terms of the C memory model, RaceReleaseMerge is equivalent to
     // atomic_exchange(memory_order_release).
     func RaceReleaseMerge(addr unsafe.Pointer) {
     	racereleasemerge(addr)
     }
    
     //go:nosplit
    
     // RaceDisable disables handling of race synchronization events in the current goroutine.
     // Handling is re-enabled with RaceEnable. RaceDisable/RaceEnable can be nested.
     // Non-synchronization events (memory accesses, function entry/exit) still affect
     // the race detector.
     func RaceDisable() {
     	_g_ := getg()
     	if _g_.raceignore == 0 {
     		racecall(&__tsan_go_ignore_sync_begin, _g_.racectx, 0, 0, 0)
     	}
     	_g_.raceignore++
     }
    
     //go:nosplit
    
     // RaceEnable re-enables handling of race events in the current goroutine.
     func RaceEnable() {
     	_g_ := getg()
     	_g_.raceignore--
     	if _g_.raceignore == 0 {
     		racecall(&__tsan_go_ignore_sync_end, _g_.racectx, 0, 0, 0)
     	}
     }
    
     // Private interface for the runtime.
    
     const raceenabled = true
    
     // For all functions accepting callerpc and pc,
     // callerpc is a return PC of the function that calls this function,
     // pc is start PC of the function that calls this function.
     func raceReadObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) {
     	kind := t.kind & kindMask
     	if kind == kindArray || kind == kindStruct {
     		// for composite objects we have to read every address
     		// because a write might happen to any subobject.
     		racereadrangepc(addr, t.size, callerpc, pc)
     	} else {
     		// for non-composite objects we can read just the start
     		// address, as any write must write the first byte.
     		racereadpc(addr, callerpc, pc)
     	}
     }
    
     func raceWriteObjectPC(t *_type, addr unsafe.Pointer, callerpc, pc uintptr) {
     	kind := t.kind & kindMask
     	if kind == kindArray || kind == kindStruct {
     		// for composite objects we have to write every address
     		// because a write might happen to any subobject.
     		racewriterangepc(addr, t.size, callerpc, pc)
     	} else {
     		// for non-composite objects we can write just the start
     		// address, as any write must write the first byte.
     		racewritepc(addr, callerpc, pc)
     	}
     }
    
     //go:noescape
     func racereadpc(addr unsafe.Pointer, callpc, pc uintptr)
    
     //go:noescape
     func racewritepc(addr unsafe.Pointer, callpc, pc uintptr)
    
     type symbolizeCodeContext struct {
     	pc   uintptr
     	fn   *byte
     	file *byte
     	line uintptr
     	off  uintptr
     	res  uintptr
     }
    
     var qq = [...]byte{'?', '?', 0}
     var dash = [...]byte{'-', 0}
    
     const (
     	raceGetProcCmd = iota
     	raceSymbolizeCodeCmd
     	raceSymbolizeDataCmd
     )
    
     // Callback from C into Go, runs on g0.
     func racecallback(cmd uintptr, ctx unsafe.Pointer) {
     	switch cmd {
     	case raceGetProcCmd:
     		throw("should have been handled by racecallbackthunk")
     	case raceSymbolizeCodeCmd:
     		raceSymbolizeCode((*symbolizeCodeContext)(ctx))
     	case raceSymbolizeDataCmd:
     		raceSymbolizeData((*symbolizeDataContext)(ctx))
     	default:
     		throw("unknown command")
     	}
     }
    
     // raceSymbolizeCode reads ctx.pc and populates the rest of *ctx with
     // information about the code at that pc.
     //
     // The race detector has already subtracted 1 from pcs, so they point to the last
     // byte of call instructions (including calls to runtime.racewrite and friends).
     //
     // If the incoming pc is part of an inlined function, *ctx is populated
     // with information about the inlined function, and on return ctx.pc is set
     // to a pc in the logically containing function. (The race detector should call this
     // function again with that pc.)
     //
     // If the incoming pc is not part of an inlined function, the return pc is unchanged.
     func raceSymbolizeCode(ctx *symbolizeCodeContext) {
     	pc := ctx.pc
     	fi := findfunc(pc)
     	f := fi._Func()
     	if f != nil {
     		file, line := f.FileLine(pc)
     		if line != 0 {
     			if inldata := funcdata(fi, _FUNCDATA_InlTree); inldata != nil {
     				inltree := (*[1 << 20]inlinedCall)(inldata)
     				for {
     					ix := pcdatavalue(fi, _PCDATA_InlTreeIndex, pc, nil)
     					if ix >= 0 {
     						if inltree[ix].funcID == funcID_wrapper {
     							// ignore wrappers
     							// Back up to an instruction in the "caller".
     							pc = f.Entry() + uintptr(inltree[ix].parentPc)
     							continue
     						}
     						ctx.pc = f.Entry() + uintptr(inltree[ix].parentPc) // "caller" pc
     						ctx.fn = cfuncnameFromNameoff(fi, inltree[ix].func_)
     						ctx.line = uintptr(line)
     						ctx.file = &bytes(file)[0] // assume NUL-terminated
     						ctx.off = pc - f.Entry()
     						ctx.res = 1
     						return
     					}
     					break
     				}
     			}
     			ctx.fn = cfuncname(fi)
     			ctx.line = uintptr(line)
     			ctx.file = &bytes(file)[0] // assume NUL-terminated
     			ctx.off = pc - f.Entry()
     			ctx.res = 1
     			return
     		}
     	}
     	ctx.fn = &qq[0]
     	ctx.file = &dash[0]
     	ctx.line = 0
     	ctx.off = ctx.pc
     	ctx.res = 1
     }
    
     type symbolizeDataContext struct {
     	addr  uintptr
     	heap  uintptr
     	start uintptr
     	size  uintptr
     	name  *byte
     	file  *byte
     	line  uintptr
     	res   uintptr
     }
    
     func raceSymbolizeData(ctx *symbolizeDataContext) {
     	if base, span, _ := findObject(ctx.addr, 0, 0); base != 0 {
     		ctx.heap = 1
     		ctx.start = base
     		ctx.size = span.elemsize
     		ctx.res = 1
     	}
     }
    
     // Race runtime functions called via runtime·racecall.
     //go:linkname __tsan_init __tsan_init
     var __tsan_init byte
    
     //go:linkname __tsan_fini __tsan_fini
     var __tsan_fini byte
    
     //go:linkname __tsan_proc_create __tsan_proc_create
     var __tsan_proc_create byte
    
     //go:linkname __tsan_proc_destroy __tsan_proc_destroy
     var __tsan_proc_destroy byte
    
     //go:linkname __tsan_map_shadow __tsan_map_shadow
     var __tsan_map_shadow byte
    
     //go:linkname __tsan_finalizer_goroutine __tsan_finalizer_goroutine
     var __tsan_finalizer_goroutine byte
    
     //go:linkname __tsan_go_start __tsan_go_start
     var __tsan_go_start byte
    
     //go:linkname __tsan_go_end __tsan_go_end
     var __tsan_go_end byte
    
     //go:linkname __tsan_malloc __tsan_malloc
     var __tsan_malloc byte
    
     //go:linkname __tsan_free __tsan_free
     var __tsan_free byte
    
     //go:linkname __tsan_acquire __tsan_acquire
     var __tsan_acquire byte
    
     //go:linkname __tsan_release __tsan_release
     var __tsan_release byte
    
     //go:linkname __tsan_release_merge __tsan_release_merge
     var __tsan_release_merge byte
    
     //go:linkname __tsan_go_ignore_sync_begin __tsan_go_ignore_sync_begin
     var __tsan_go_ignore_sync_begin byte
    
     //go:linkname __tsan_go_ignore_sync_end __tsan_go_ignore_sync_end
     var __tsan_go_ignore_sync_end byte
    
     //go:linkname __tsan_report_count __tsan_report_count
     var __tsan_report_count byte
    
     // Mimic what cmd/cgo would do.
     //go:cgo_import_static __tsan_init
     //go:cgo_import_static __tsan_fini
     //go:cgo_import_static __tsan_proc_create
     //go:cgo_import_static __tsan_proc_destroy
     //go:cgo_import_static __tsan_map_shadow
     //go:cgo_import_static __tsan_finalizer_goroutine
     //go:cgo_import_static __tsan_go_start
     //go:cgo_import_static __tsan_go_end
     //go:cgo_import_static __tsan_malloc
     //go:cgo_import_static __tsan_free
     //go:cgo_import_static __tsan_acquire
     //go:cgo_import_static __tsan_release
     //go:cgo_import_static __tsan_release_merge
     //go:cgo_import_static __tsan_go_ignore_sync_begin
     //go:cgo_import_static __tsan_go_ignore_sync_end
     //go:cgo_import_static __tsan_report_count
    
     // These are called from race_amd64.s.
     //go:cgo_import_static __tsan_read
     //go:cgo_import_static __tsan_read_pc
     //go:cgo_import_static __tsan_read_range
     //go:cgo_import_static __tsan_write
     //go:cgo_import_static __tsan_write_pc
     //go:cgo_import_static __tsan_write_range
     //go:cgo_import_static __tsan_func_enter
     //go:cgo_import_static __tsan_func_exit
    
     //go:cgo_import_static __tsan_go_atomic32_load
     //go:cgo_import_static __tsan_go_atomic64_load
     //go:cgo_import_static __tsan_go_atomic32_store
     //go:cgo_import_static __tsan_go_atomic64_store
     //go:cgo_import_static __tsan_go_atomic32_exchange
     //go:cgo_import_static __tsan_go_atomic64_exchange
     //go:cgo_import_static __tsan_go_atomic32_fetch_add
     //go:cgo_import_static __tsan_go_atomic64_fetch_add
     //go:cgo_import_static __tsan_go_atomic32_compare_exchange
     //go:cgo_import_static __tsan_go_atomic64_compare_exchange
    
     // start/end of global data (data+bss).
     var racedatastart uintptr
     var racedataend uintptr
    
     // start/end of heap for race_amd64.s
     var racearenastart uintptr
     var racearenaend uintptr
    
     func racefuncenter(callpc uintptr)
     func racefuncenterfp(fp uintptr)
     func racefuncexit()
     func raceread(addr uintptr)
     func racewrite(addr uintptr)
     func racereadrange(addr, size uintptr)
     func racewriterange(addr, size uintptr)
     func racereadrangepc1(addr, size, pc uintptr)
     func racewriterangepc1(addr, size, pc uintptr)
     func racecallbackthunk(uintptr)
    
     // racecall allows calling an arbitrary function f from C race runtime
     // with up to 4 uintptr arguments.
     func racecall(fn *byte, arg0, arg1, arg2, arg3 uintptr)
    
     // checks if the address has shadow (i.e. heap or data/bss)
     //go:nosplit
     func isvalidaddr(addr unsafe.Pointer) bool {
     	return racearenastart <= uintptr(addr) && uintptr(addr) < racearenaend ||
     		racedatastart <= uintptr(addr) && uintptr(addr) < racedataend
     }
    
     //go:nosplit
     func raceinit() (gctx, pctx uintptr) {
     	// cgo is required to initialize libc, which is used by race runtime
     	if !iscgo {
     		throw("raceinit: race build must use cgo")
     	}
    
     	racecall(&__tsan_init, uintptr(unsafe.Pointer(&gctx)), uintptr(unsafe.Pointer(&pctx)), funcPC(racecallbackthunk), 0)
    
     	// Round data segment to page boundaries, because it's used in mmap().
     	start := ^uintptr(0)
     	end := uintptr(0)
     	if start > firstmoduledata.noptrdata {
     		start = firstmoduledata.noptrdata
     	}
     	if start > firstmoduledata.data {
     		start = firstmoduledata.data
     	}
     	if start > firstmoduledata.noptrbss {
     		start = firstmoduledata.noptrbss
     	}
     	if start > firstmoduledata.bss {
     		start = firstmoduledata.bss
     	}
     	if end < firstmoduledata.enoptrdata {
     		end = firstmoduledata.enoptrdata
     	}
     	if end < firstmoduledata.edata {
     		end = firstmoduledata.edata
     	}
     	if end < firstmoduledata.enoptrbss {
     		end = firstmoduledata.enoptrbss
     	}
     	if end < firstmoduledata.ebss {
     		end = firstmoduledata.ebss
     	}
     	size := alignUp(end-start, _PageSize)
     	racecall(&__tsan_map_shadow, start, size, 0, 0)
     	racedatastart = start
     	racedataend = start + size
    
     	return
     }
    
     var raceFiniLock mutex
    
     //go:nosplit
     func racefini() {
     	// racefini() can only be called once to avoid races.
     	// This eventually (via __tsan_fini) calls C.exit which has
     	// undefined behavior if called more than once. If the lock is
     	// already held it's assumed that the first caller exits the program
     	// so other calls can hang forever without an issue.
     	lock(&raceFiniLock)
     	racecall(&__tsan_fini, 0, 0, 0, 0)
     }
    
     //go:nosplit
     func raceproccreate() uintptr {
     	var ctx uintptr
     	racecall(&__tsan_proc_create, uintptr(unsafe.Pointer(&ctx)), 0, 0, 0)
     	return ctx
     }
    
     //go:nosplit
     func raceprocdestroy(ctx uintptr) {
     	racecall(&__tsan_proc_destroy, ctx, 0, 0, 0)
     }
    
     //go:nosplit
     func racemapshadow(addr unsafe.Pointer, size uintptr) {
     	if racearenastart == 0 {
     		racearenastart = uintptr(addr)
     	}
     	if racearenaend < uintptr(addr)+size {
     		racearenaend = uintptr(addr) + size
     	}
     	racecall(&__tsan_map_shadow, uintptr(addr), size, 0, 0)
     }
    
     //go:nosplit
     func racemalloc(p unsafe.Pointer, sz uintptr) {
     	racecall(&__tsan_malloc, 0, 0, uintptr(p), sz)
     }
    
     //go:nosplit
     func racefree(p unsafe.Pointer, sz uintptr) {
     	racecall(&__tsan_free, uintptr(p), sz, 0, 0)
     }
    
     //go:nosplit
     func racegostart(pc uintptr) uintptr {
     	_g_ := getg()
     	var spawng *g
     	if _g_.m.curg != nil {
     		spawng = _g_.m.curg
     	} else {
     		spawng = _g_
     	}
    
     	var racectx uintptr
     	racecall(&__tsan_go_start, spawng.racectx, uintptr(unsafe.Pointer(&racectx)), pc, 0)
     	return racectx
     }
    
     //go:nosplit
     func racegoend() {
     	racecall(&__tsan_go_end, getg().racectx, 0, 0, 0)
     }
    
     //go:nosplit
     func racectxend(racectx uintptr) {
     	racecall(&__tsan_go_end, racectx, 0, 0, 0)
     }
    
     //go:nosplit
     func racewriterangepc(addr unsafe.Pointer, sz, callpc, pc uintptr) {
     	_g_ := getg()
     	if _g_ != _g_.m.curg {
     		// The call is coming from manual instrumentation of Go code running on g0/gsignal.
     		// Not interesting.
     		return
     	}
     	if callpc != 0 {
     		racefuncenter(callpc)
     	}
     	racewriterangepc1(uintptr(addr), sz, pc)
     	if callpc != 0 {
     		racefuncexit()
     	}
     }
    
     //go:nosplit
     func racereadrangepc(addr unsafe.Pointer, sz, callpc, pc uintptr) {
     	_g_ := getg()
     	if _g_ != _g_.m.curg {
     		// The call is coming from manual instrumentation of Go code running on g0/gsignal.
     		// Not interesting.
     		return
     	}
     	if callpc != 0 {
     		racefuncenter(callpc)
     	}
     	racereadrangepc1(uintptr(addr), sz, pc)
     	if callpc != 0 {
     		racefuncexit()
     	}
     }
    
     //go:nosplit
     func raceacquire(addr unsafe.Pointer) {
     	raceacquireg(getg(), addr)
     }
    
     //go:nosplit
     func raceacquireg(gp *g, addr unsafe.Pointer) {
     	if getg().raceignore != 0 || !isvalidaddr(addr) {
     		return
     	}
     	racecall(&__tsan_acquire, gp.racectx, uintptr(addr), 0, 0)
     }
    
     //go:nosplit
     func raceacquirectx(racectx uintptr, addr unsafe.Pointer) {
     	if !isvalidaddr(addr) {
     		return
     	}
     	racecall(&__tsan_acquire, racectx, uintptr(addr), 0, 0)
     }
    
     //go:nosplit
     func racerelease(addr unsafe.Pointer) {
     	racereleaseg(getg(), addr)
     }
    
     //go:nosplit
     func racereleaseg(gp *g, addr unsafe.Pointer) {
     	if getg().raceignore != 0 || !isvalidaddr(addr) {
     		return
     	}
     	racecall(&__tsan_release, gp.racectx, uintptr(addr), 0, 0)
     }
    
     //go:nosplit
     func racereleasemerge(addr unsafe.Pointer) {
     	racereleasemergeg(getg(), addr)
     }
    
     //go:nosplit
     func racereleasemergeg(gp *g, addr unsafe.Pointer) {
     	if getg().raceignore != 0 || !isvalidaddr(addr) {
     		return
     	}
     	racecall(&__tsan_release_merge, gp.racectx, uintptr(addr), 0, 0)
     }
    
     //go:nosplit
     func racefingo() {
     	racecall(&__tsan_finalizer_goroutine, getg().racectx, 0, 0, 0)
     }
    
     // The declarations below generate ABI wrappers for functions
     // implemented in assembly in this package but declared in another
     // package.
    
     //go:linkname abigen_sync_atomic_LoadInt32 sync/atomic.LoadInt32
     func abigen_sync_atomic_LoadInt32(addr *int32) (val int32)
    
     //go:linkname abigen_sync_atomic_LoadInt64 sync/atomic.LoadInt64
     func abigen_sync_atomic_LoadInt64(addr *int64) (val int64)
    
     //go:linkname abigen_sync_atomic_LoadUint32 sync/atomic.LoadUint32
     func abigen_sync_atomic_LoadUint32(addr *uint32) (val uint32)
    
     //go:linkname abigen_sync_atomic_LoadUint64 sync/atomic.LoadUint64
     func abigen_sync_atomic_LoadUint64(addr *uint64) (val uint64)
    
     //go:linkname abigen_sync_atomic_LoadUintptr sync/atomic.LoadUintptr
     func abigen_sync_atomic_LoadUintptr(addr *uintptr) (val uintptr)
    
     //go:linkname abigen_sync_atomic_LoadPointer sync/atomic.LoadPointer
     func abigen_sync_atomic_LoadPointer(addr *unsafe.Pointer) (val unsafe.Pointer)
    
     //go:linkname abigen_sync_atomic_StoreInt32 sync/atomic.StoreInt32
     func abigen_sync_atomic_StoreInt32(addr *int32, val int32)
    
     //go:linkname abigen_sync_atomic_StoreInt64 sync/atomic.StoreInt64
     func abigen_sync_atomic_StoreInt64(addr *int64, val int64)
    
     //go:linkname abigen_sync_atomic_StoreUint32 sync/atomic.StoreUint32
     func abigen_sync_atomic_StoreUint32(addr *uint32, val uint32)
    
     //go:linkname abigen_sync_atomic_StoreUint64 sync/atomic.StoreUint64
     func abigen_sync_atomic_StoreUint64(addr *uint64, val uint64)
    
     //go:linkname abigen_sync_atomic_SwapInt32 sync/atomic.SwapInt32
     func abigen_sync_atomic_SwapInt32(addr *int32, new int32) (old int32)
    
     //go:linkname abigen_sync_atomic_SwapInt64 sync/atomic.SwapInt64
     func abigen_sync_atomic_SwapInt64(addr *int64, new int64) (old int64)
    
     //go:linkname abigen_sync_atomic_SwapUint32 sync/atomic.SwapUint32
     func abigen_sync_atomic_SwapUint32(addr *uint32, new uint32) (old uint32)
    
     //go:linkname abigen_sync_atomic_SwapUint64 sync/atomic.SwapUint64
     func abigen_sync_atomic_SwapUint64(addr *uint64, new uint64) (old uint64)
    
     //go:linkname abigen_sync_atomic_AddInt32 sync/atomic.AddInt32
     func abigen_sync_atomic_AddInt32(addr *int32, delta int32) (new int32)
    
     //go:linkname abigen_sync_atomic_AddUint32 sync/atomic.AddUint32
     func abigen_sync_atomic_AddUint32(addr *uint32, delta uint32) (new uint32)
    
     //go:linkname abigen_sync_atomic_AddInt64 sync/atomic.AddInt64
     func abigen_sync_atomic_AddInt64(addr *int64, delta int64) (new int64)
    
     //go:linkname abigen_sync_atomic_AddUint64 sync/atomic.AddUint64
     func abigen_sync_atomic_AddUint64(addr *uint64, delta uint64) (new uint64)
    
     //go:linkname abigen_sync_atomic_AddUintptr sync/atomic.AddUintptr
     func abigen_sync_atomic_AddUintptr(addr *uintptr, delta uintptr) (new uintptr)
    
     //go:linkname abigen_sync_atomic_CompareAndSwapInt32 sync/atomic.CompareAndSwapInt32
     func abigen_sync_atomic_CompareAndSwapInt32(addr *int32, old, new int32) (swapped bool)
    
     //go:linkname abigen_sync_atomic_CompareAndSwapInt64 sync/atomic.CompareAndSwapInt64
     func abigen_sync_atomic_CompareAndSwapInt64(addr *int64, old, new int64) (swapped bool)
    
     //go:linkname abigen_sync_atomic_CompareAndSwapUint32 sync/atomic.CompareAndSwapUint32
     func abigen_sync_atomic_CompareAndSwapUint32(addr *uint32, old, new uint32) (swapped bool)
    
     //go:linkname abigen_sync_atomic_CompareAndSwapUint64 sync/atomic.CompareAndSwapUint64
     func abigen_sync_atomic_CompareAndSwapUint64(addr *uint64, old, new uint64) (swapped bool)

  • Actual instrumentation is spread out and integrated into the SSA construction process (by actual instrumentation, we mean something like prefixing a load instruction with a call to the instrumentation function raceread, or a store with racewrite).

    • go/cmd/compile/internal/gc/racewalk.go:instrument(): marks a function for instrumentation during SSA construction, prepends racefuncenter to the function's entry, and appends racefuncexit to its exit (the comments in this file also contain an overview of the racewalk "pass")

      see code

       // SOURCE: go/cmd/compile/internal/gc/racewalk.go
      
       func instrument(fn *Node) {
       	if fn.Func.Pragma&Norace != 0 {
       		return
       	}
      
       	if !flag_race || !ispkgin(norace_inst_pkgs) {
       		fn.Func.SetInstrumentBody(true)
       	}
      
       	if flag_race {
       		lno := lineno
       		lineno = src.NoXPos
      
       		if thearch.LinkArch.Arch.Family != sys.AMD64 {
       			fn.Func.Enter.Prepend(mkcall("racefuncenterfp", nil, nil))
       			fn.Func.Exit.Append(mkcall("racefuncexit", nil, nil))
       		} else {
      
       			// nodpc is the PC of the caller as extracted by
       			// getcallerpc. We use -widthptr(FP) for x86.
       			// This only works for amd64. This will not
       			// work on arm or others that might support
       			// race in the future.
       			nodpc := nodfp.copy()
       			nodpc.Type = types.Types[TUINTPTR]
       			nodpc.Xoffset = int64(-Widthptr)
       			fn.Func.Dcl = append(fn.Func.Dcl, nodpc)
       			fn.Func.Enter.Prepend(mkcall("racefuncenter", nil, nil, nodpc))
       			fn.Func.Exit.Append(mkcall("racefuncexit", nil, nil))
       		}
       		lineno = lno
       	}
       }

    • go/cmd/compile/internal/gc/ssa.go:instrument(): Instrument memory loads and stores during SSA building

      see code

       // SOURCE: go/cmd/compile/internal/gc/ssa.go
      
       func (s *state) instrument(t *types.Type, addr *ssa.Value, wr bool) {
       	if !s.curfn.Func.InstrumentBody() {
       		return
       	}
      
       	w := t.Size()
       	if w == 0 {
       		return // can't race on zero-sized things
       	}
      
       	if ssa.IsSanitizerSafeAddr(addr) {
       		return
       	}
      
       	var fn *obj.LSym
       	needWidth := false
      
       	if flag_msan {
       		fn = msanread
       		if wr {
       			fn = msanwrite
       		}
       		needWidth = true
       	} else if flag_race && t.NumComponents(types.CountBlankFields) > 1 {
       		// for composite objects we have to write every address
       		// because a write might happen to any subobject.
       		// composites with only one element don't have subobjects, though.
       		fn = racereadrange
       		if wr {
       			fn = racewriterange
       		}
       		needWidth = true
       	} else if flag_race {
       		// for non-composite objects we can write just the start
       		// address, as any write must write the first byte.
       		fn = raceread
       		if wr {
       			fn = racewrite
       		}
       	} else {
       		panic("unreachable")
       	}
      
       	args := []*ssa.Value{addr}
       	if needWidth {
       		args = append(args, s.constInt(types.Types[TUINTPTR], w))
       	}
       	s.rtcall(fn, true, nil, args...)
       }

      This function is called at the beginning of the load, zero, move, and storeType helpers shown below (note that the raw store helper skips instrumentation); a small example follows the excerpt.

      see code

       // SOURCE: go/cmd/compile/internal/gc/ssa.go
      
       func (s *state) load(t *types.Type, src *ssa.Value) *ssa.Value {
       	s.instrument(t, src, false)
       	return s.rawLoad(t, src)
       }
      
       func (s *state) rawLoad(t *types.Type, src *ssa.Value) *ssa.Value {
       	return s.newValue2(ssa.OpLoad, t, src, s.mem())
       }
      
       // NO instrument here
       func (s *state) store(t *types.Type, dst, val *ssa.Value) {
       	s.vars[&memVar] = s.newValue3A(ssa.OpStore, types.TypeMem, t, dst, val, s.mem())
       }
      
       func (s *state) zero(t *types.Type, dst *ssa.Value) {
       	s.instrument(t, dst, true)
       	store := s.newValue2I(ssa.OpZero, types.TypeMem, t.Size(), dst, s.mem())
       	store.Aux = t
       	s.vars[&memVar] = store
       }
      
       func (s *state) move(t *types.Type, dst, src *ssa.Value) {
       	s.instrument(t, src, false)
       	s.instrument(t, dst, true)
       	store := s.newValue3I(ssa.OpMove, types.TypeMem, t.Size(), dst, src, s.mem())
       	store.Aux = t
       	s.vars[&memVar] = store
       }
      
       // do *left = right for type t.
       func (s *state) storeType(t *types.Type, left, right *ssa.Value, skip skipMask, leftIsStmt bool) {
       	s.instrument(t, left, true)
      
       	if skip == 0 && (!types.Haspointers(t) || ssa.IsStackAddr(left)) {
       		// Known to not have write barrier. Store the whole type.
       		s.vars[&memVar] = s.newValue3Apos(ssa.OpStore, types.TypeMem, t, left, right, s.mem(), leftIsStmt)
       		return
       	}
      
       	// store scalar fields first, so write barrier stores for
       	// pointer fields can be grouped together, and scalar values
       	// don't need to be live across the write barrier call.
       	// TODO: if the writebarrier pass knows how to reorder stores,
       	// we can do a single store here as long as skip==0.
       	s.storeTypeScalars(t, left, right, skip)
       	if skip&skipPtr == 0 && types.Haspointers(t) {
       		s.storeTypePtrs(t, left, right)
       	}
       }
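
      A hedged example of the composite/scalar distinction made by instrument() above (observable with go tool compile -S -race): copying a two-field struct should produce racereadrange/racewriterange calls covering the whole object, while storing a plain int64 should produce a single racewrite at its start address.

        package example

        type pair struct{ a, b int64 }

        var (
        	src, dst pair // composite: more than one component
        	n        int64
        )

        func copyPair() {
        	dst = src // expect racereadrange(&src, 16) and racewriterange(&dst, 16)
        }

        func setN(v int64) {
        	n = v // expect a single racewrite(&n)
        }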

    • go/cmd/compile/internal/ssa/rewrite.go:needRaceCleanup(): a special SSA optimization pass that removes unnecessary calls to racefuncenter/racefuncexit (they can safely be removed from leaf functions with no instrumented memory accesses); see the sketch below.
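
      As a hedged example, a leaf function like the one below contains no instrumented memory accesses and no calls, so the racefuncenter/racefuncexit pair inserted by racewalk serves no purpose and can be removed by this cleanup:

        package example

        // No loads or stores that need instrumentation (arguments and results
        // live in the local stack frame) and no calls: the enter/exit pair
        // added for the race detector is dead weight here.
        func add(a, b int) int {
        	return a + b
        }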

    • go/cmd/compile/internal/ssa/writebarrier.go:IsSanitizerSafeAddr(): provides rules for optimizing instruction instrumentation

      see code

       // SOURCE: go/cmd/compile/internal/ssa/writebarrier.go
      
       // IsSanitizerSafeAddr reports whether v is known to be an address
       // that doesn't need instrumentation.
       func IsSanitizerSafeAddr(v *Value) bool {
       	for v.Op == OpOffPtr || v.Op == OpAddPtr || v.Op == OpPtrIndex || v.Op == OpCopy {
       		v = v.Args[0]
       	}
       	switch v.Op {
       	case OpSP, OpLocalAddr:
       		// Stack addresses are always safe.
       		return true
       	case OpITab, OpStringPtr, OpGetClosurePtr:
       		// Itabs, string data, and closure fields are
       		// read-only once initialized.
       		return true
       	case OpAddr:
       		sym := v.Aux.(*obj.LSym)
       		// TODO(mdempsky): Find a cleaner way to
       		// detect this. It would be nice if we could
       		// test sym.Type==objabi.SRODATA, but we don't
       		// initialize sym.Type until after function
       		// compilation.
       		if strings.HasPrefix(sym.Name, `""..stmp_`) {
       			return true
       		}
       	}
       	return false
       }


Atomic Instructions

  • Package runtime/internal/atomic provides a library of atomic primitives in Go.

  • Function headers are defined in go/src/sync/atomic/doc.go, and assembly thunks are defined in go/src/sync/atomic/asm.s, which call functions defined in go/src/runtime/internal/atomic/asm_<arch>.s.

  • When a program is compiled with -race, these declarations are instead satisfied by the assembly defined in go/src/runtime/race_<arch>.s (go/src/sync/atomic/race.s exists only so that these functions can be implemented in assembly in another package).

For example, for CompareAndSwapInt32:

see code

// SOURCE: go/src/runtime/race_amd64.s

// CompareAndSwap
TEXT	sync∕atomic·CompareAndSwapInt32(SB), NOSPLIT, $0-0
	MOVQ	$__tsan_go_atomic32_compare_exchange(SB), AX
	CALL	racecallatomic<>(SB)
	RET

...

// Generic atomic operation implementation.
// AX already contains target function.
// Calls linked TSAN RTL (llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interface_atomic.cc).
TEXT	racecallatomic<>(SB), NOSPLIT, $0-0
	// Trigger SIGSEGV early.
	MOVQ	16(SP), R12
	MOVL	(R12), R13
	// Check that addr is within [arenastart, arenaend) or within [racedatastart, racedataend).
	CMPQ	R12, runtime·racearenastart(SB)
	JB	racecallatomic_data
	CMPQ	R12, runtime·racearenaend(SB)
	JB	racecallatomic_ok
racecallatomic_data:
	CMPQ	R12, runtime·racedatastart(SB)
	JB	racecallatomic_ignore
	CMPQ	R12, runtime·racedataend(SB)
	JAE	racecallatomic_ignore
racecallatomic_ok:
	// Addr is within the good range, call the atomic function.
	get_tls(R12)
	MOVQ	g(R12), R14
	MOVQ	g_racectx(R14), RARG0	// goroutine context
	MOVQ	8(SP), RARG1	// caller pc
	MOVQ	(SP), RARG2	// pc
	LEAQ	16(SP), RARG3	// arguments
	JMP	racecall<>(SB)	// does not return
racecallatomic_ignore:
	// Addr is outside the good range.
	// Call __tsan_go_ignore_sync_begin to ignore synchronization during the atomic op.
	// An attempt to synchronize on the address would cause crash.
	MOVQ	AX, R15	// remember the original function
	MOVQ	$__tsan_go_ignore_sync_begin(SB), AX
	get_tls(R12)
	MOVQ	g(R12), R14
	MOVQ	g_racectx(R14), RARG0	// goroutine context
	CALL	racecall<>(SB)
	MOVQ	R15, AX	// restore the original function
	// Call the atomic function.
	MOVQ	g_racectx(R14), RARG0	// goroutine context
	MOVQ	8(SP), RARG1	// caller pc
	MOVQ	(SP), RARG2	// pc
	LEAQ	16(SP), RARG3	// arguments
	CALL	racecall<>(SB)
	// Call __tsan_go_ignore_sync_end.
	MOVQ	$__tsan_go_ignore_sync_end(SB), AX
	MOVQ	g_racectx(R14), RARG0	// goroutine context
	JMP	racecall<>(SB)
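
Because the sync/atomic entry points are routed into the TSAN atomic hooks this way, the detector both checks the atomic accesses themselves and honors the happens-before edges they create. As a hedged sketch, the following program is race-free: the plain write to data is ordered before the plain read by the atomic store/load pair (mapped to __tsan_go_atomic32_store and __tsan_go_atomic32_load), so go run -race should report nothing.

    package main

    import (
    	"fmt"
    	"sync/atomic"
    )

    var (
    	data  int
    	ready uint32
    )

    func main() {
    	go func() {
    		data = 42                     // plain write, published by the store below
    		atomic.StoreUint32(&ready, 1) // release-like atomic store
    	}()
    	for atomic.LoadUint32(&ready) == 0 { // acquire-like atomic load
    	}
    	fmt.Println(data) // ordered after the write via the atomic pair: no race
    }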


Optimizations

The following is a list of SSA optimization passes performed by the Go compiler, in order of execution.

SOURCE: go/src/cmd/compile/internal/ssa/compile.go

- number lines

- early phielim
	* phi elimination - a phi is redundant if its arguments are all equal. For the purposes of counting, ignore the phi itself

- early copyelim

- early deadcode // remove generated dead code to avoid doing pointless work during opt

- short circuit

- decompose args

- decompose user

- pre-opt deadcode

- opt // NB: some generic rules know the name of the opt pass. TODO: split required rules and optimizing rules

- zero arg cse // required to merge OpSB values

- opt deadcode // remove any blocks orphaned during opt

- generic cse

- phiopt

- gcse deadcode // clean out after cse and phiopt

- nilcheckelim

- prove

- fuse plain

- decompose builtin

- softfloat

- late opt // TODO: split required rules and optimizing rules

- dead auto elim

- generic deadcode // remove dead stores, which otherwise mess up store chain

- check bce

- branchelim

- fuse

- dse // dead-store elimination (a dead store is unconditionally followed by another store to the same location, with no intervening load)

- writebarrier // expand write barrier ops

- insert resched checks // insert resched checks in loops.

- lower

- lowered deadcode for cse // deadcode immediately before CSE avoids CSE making dead values live again

- lowered cse

- elim unread autos

- lowered deadcode

- checkLower

- late phielim

- late copyelim

- tighten // move values closer to their uses

- late deadcode

- critical // remove critical edges

- phi tighten // place rematerializable phi args near uses to reduce value lifetimes

- likelyadjust

- layout // schedule blocks

- schedule // schedule values

- late nilcheck

- flagalloc // allocate flags register

- regalloc // allocate int & float registers + stack slots

- loop rotate

- stackframe

- trim // remove empty blocks

Using Compiler Tools

Useful flags for the Go compiler (for full documentation, see https://golang.org/cmd/compile/):

-N
	Disable optimizations.
-S
	Print assembly listing to standard output (code only).
-S -S
	Print assembly listing to standard output (code and data).
-l
	Disable inlining.
-m
	Print optimization decisions.
-race
	Compile with race detector enabled.
  • Emitting Instrumented ASM for Race Detection:

    • Clang (C/C++ sources): clang -S -masm=intel -fsanitize=thread <source>.c outputs <source>.s

    • Go Compiler: go tool compile -S -race <source>.go > <source>.s writes the assembly listing to <source>.s

      • to view SSA, prefix the command above with GOSSAFUNC=<func_name>. This generates ssa.html, which provides an interactive look at each phase of compilation, from source code to AST to the SSA form after each optimization pass.

Discussion

Instrumentation in LLVM Clang vs. in Go

The two instrumentation passes are almost identical in terms of optimizations. This makes sense because the Go race detector is a port of TSAN to Go and is maintained by the same key individuals. However, the Go race detector does not yet remove instrumentation of uncaptured pointers (golang/go#19054 (comment)). Accesses through uncaptured pointers are guaranteed to be race-free because the pointer never escapes the current function. Here, a pointer counts as captured if any part of it is returned from the function or stored in memory reachable from elsewhere.
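
A hedged illustration of the distinction: in f below the address of x never leaves the function, so no other goroutine can ever reach x and its accesses cannot race; in g the address is stored into a package-level variable, so the pointer is captured and the accesses must remain instrumented.

    package example

    var sink *int

    // Not captured: &x is neither returned nor stored anywhere that outlives f.
    func f() int {
    	x := 0
    	p := &x
    	*p = 1
    	return *p
    }

    // Captured: &x is published through a global, so another goroutine could
    // reach x and race with the write below.
    func g() {
    	x := 0
    	sink = &x
    	x = 2
    }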