Question: Potential false negatives in Go race detection due to shadow eviction and thread slot recycling #173049

@kolkov

Hi TSAN team,

We're developing a pure-Go race detector (racedetector) and, during comparative testing, discovered behavioral differences that may indicate false negatives in TSAN's Go integration.

Observed Behavior

When testing concurrent Go programs, our detector reports races that TSAN misses. The discrepancy is reproducible and GOMAXPROCS-dependent.

Test case:

package main

import "sync"

//go:noinline
func update(ptr *int) {
    *ptr++
}

func main() {
    var shared int
    var wg sync.WaitGroup
    for range 2 {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for range 100 {
                update(&shared)
            }
        }()
    }
    wg.Wait()
}

Results:

  • Our detector: reports a race on the shared variable
  • Go's -race (TSAN): no race reported

Potential Root Causes (from source analysis)

We analyzed compiler-rt/lib/tsan and identified several mechanisms that may contribute:

1. Shadow Memory Eviction (tsan_defs.h:58)

constexpr uptr kShadowCnt = 4;  // 4 shadow cells per 8 bytes

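With only four shadow cells per 8-byte word, a fifth recorded access has to overwrite an existing cell. Under high contention, an older access record may therefore be evicted before the conflicting access that would have exposed the race arrives.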
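
To make the eviction concern concrete, here is a minimal toy model of a bounded shadow cell. It is a sketch under stated assumptions, not TSAN's actual data layout: the `access` and `shadowCell` types, the round-robin eviction policy, and the `syncedWith` shortcut for happens-before are all hypothetical. Goroutine 1 writes, four readers that are synchronized with goroutine 1 fill the cell, and goroutine 2 then reads without any synchronization.

```go
package main

import "fmt"

// access is a simplified record of one memory access.
// Hypothetical type for illustration; TSAN encodes this differently.
type access struct {
	goroutine int
	isWrite   bool
}

// shadowCell is a toy stand-in for the shadow slots of one 8-byte word.
type shadowCell struct {
	slots    []access
	capacity int
	next     int // round-robin eviction index (assumption; the real policy may differ)
}

// record compares the new access with every stored one, then stores it.
// syncedWith lists goroutines whose earlier accesses are ordered before
// this one; it replaces real vector clocks in this toy model.
// It returns true if a conflicting, unordered access is still stored.
func (c *shadowCell) record(a access, syncedWith map[int]bool) bool {
	race := false
	for _, old := range c.slots {
		conflict := old.goroutine != a.goroutine && (old.isWrite || a.isWrite)
		if conflict && !syncedWith[old.goroutine] {
			race = true
		}
	}
	if len(c.slots) < c.capacity {
		c.slots = append(c.slots, a)
	} else {
		c.slots[c.next] = a // overwrite an older record
		c.next = (c.next + 1) % c.capacity
	}
	return race
}

// simulate replays the scenario and reports whether the final,
// unsynchronized read by goroutine 2 is flagged as a race.
func simulate(capacity int) bool {
	cell := &shadowCell{capacity: capacity}
	cell.record(access{goroutine: 1, isWrite: true}, nil)
	for g := 3; g <= 6; g++ {
		// Readers 3..6 are ordered after goroutine 1's write.
		cell.record(access{goroutine: g}, map[int]bool{1: true})
	}
	return cell.record(access{goroutine: 2}, nil)
}

func main() {
	fmt.Println("capacity 4: race detected =", simulate(4))
	fmt.Println("capacity 8: race detected =", simulate(8))
}
```

In this model the write is evicted by the fourth reader when the capacity is 4, so the later read has nothing to conflict with; with a capacity of 8 the write survives and the race is flagged. Whether real workloads hit this pattern often enough to matter is exactly what we are asking.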

2. Thread Slot Limit (tsan_defs.h:57)

constexpr uptr kThreadSlotCount = 256;

Programs with >256 goroutines experience slot recycling, potentially losing happens-before information.
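
A possible probe for this hypothesis, as a minimal sketch: keep more than 256 goroutines alive at once and let each perform one unsynchronized increment. The `probe` helper and the count of 300 are hypothetical, chosen only to exceed kThreadSlotCount. Whether this actually forces slot recycling depends on how the runtime maps goroutines to slots, which this sketch does not verify.

```go
package main

import (
	"fmt"
	"sync"
)

//go:noinline
func update(ptr *int) {
	*ptr++
}

// probe starts n goroutines, holds them all alive on a channel,
// then releases them to perform one unsynchronized increment each.
// The data race on shared is intentional.
func probe(n int) int {
	var shared int
	var wg sync.WaitGroup
	start := make(chan struct{})
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start         // wait until every goroutine has been created
			update(&shared) // racy write, no synchronization between workers
		}()
	}
	close(start)
	wg.Wait()
	return shared
}

func main() {
	// Because of the race, the final value may be lower than 300.
	fmt.Println("final value:", probe(300))
}
```

Building this with -race and comparing the report against a run with fewer than 256 goroutines would show whether the goroutine count changes what is detected.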

3. Conditional Fork Happens-Before (tsan_rtl_thread.cpp:140-145)

if (!thr->ignore_sync) {
    thr->clock.ReleaseStore(&arg.sync);
}

When ignore_sync is active, the release is skipped, so a child thread may not inherit the parent's vector clock.
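
The effect of skipping the release can be illustrated with a toy vector-clock model. This is a sketch only: `vclock`, `fork`, and `parentWriteOrdered` are hypothetical names, and the model collapses TSAN's release/acquire pair at thread creation into a single merge step.

```go
package main

import "fmt"

// vclock maps a thread ID to the latest logical time observed from it.
type vclock map[int]int

// join merges other into v with an element-wise maximum (an acquire).
func (v vclock) join(other vclock) {
	for id, t := range other {
		if t > v[id] {
			v[id] = t
		}
	}
}

// happensBefore reports whether an event by thread tid at time t
// is already covered by the observer's clock.
func happensBefore(tid, t int, observer vclock) bool {
	return observer[tid] >= t
}

// fork models thread creation. When ignoreSync is true the parent's
// clock is not passed on, mirroring the guarded ReleaseStore above.
func fork(parent vclock, parentID, childID int, ignoreSync bool) vclock {
	child := vclock{childID: 1}
	if !ignoreSync {
		child.join(parent) // child inherits everything the parent has seen
	}
	parent[parentID]++ // the parent moves on to a new epoch
	return child
}

// parentWriteOrdered reports whether a write made by the parent before
// the fork is ordered before the child's first access.
func parentWriteOrdered(ignoreSync bool) bool {
	const parentID, childID = 0, 1
	parent := vclock{parentID: 1}
	writeTime := parent[parentID] // the parent writes at its current time
	child := fork(parent, parentID, childID, ignoreSync)
	return happensBefore(parentID, writeTime, child)
}

func main() {
	fmt.Println("ignore_sync off: ordered =", parentWriteOrdered(false))
	fmt.Println("ignore_sync on:  ordered =", parentWriteOrdered(true))
}
```

In this model, the child's clock covers the parent's earlier write only when the release is performed; with ignore_sync on, the two accesses look concurrent to the detector.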

4. Go Runtime ignore_sync (tsan_go.cpp:279-283)

void __tsan_go_ignore_sync_begin(ThreadState *thr) { ... }
void __tsan_go_ignore_sync_end(ThreadState *thr) { ... }

The Go runtime can disable synchronization tracking for its internal operations, but this may inadvertently affect race detection in user code.

Questions

  1. Are these known limitations documented somewhere?
  2. Is the ignore_sync behavior during goroutine creation intentional?
  3. Would increasing kShadowCnt or kThreadSlotCount be beneficial for Go workloads?
  4. Would you be interested in reviewing our implementation for comparison?

Our Implementation Differences

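| Aspect       | TSAN          | Our Detector              |
|--------------|---------------|---------------------------|
| Thread slots | 256           | 65,536                    |
| Shadow cells | 4 per 8 bytes | 1 per address (CAS-based) |
| Fork HB      | Conditional   | Unconditional             |
| ignore_sync  | Yes           | No                        |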

References

We're not claiming TSAN has bugs; these may be intentional performance tradeoffs. We'd appreciate any guidance on expected behavior.

Thank you for TSAN — it's an invaluable tool that inspired our work.
