Description
The following sequence of events can lead to an out-of-bounds access attempt in the runtime:
- A SIGPROF comes in on a thread while the G on that thread is in _Gsyscall. The sigprof handler calls gentraceback, which saves a local copy of the G's stkbar slice. At this point the G has no stack barriers, so this slice is empty.
- On another thread, the GC concurrently scans the stack of the goroutine being profiled (it considers it stopped because it's in _Gsyscall) and installs stack barriers.
- Back on the sigprof thread, gentraceback comes across a stack barrier in the stack and tries to look it up in its (zero-length) copy of the G's old stkbar slice, which results in an out-of-bounds access.
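To make the sequence above concrete, here is a deterministic toy model of it. This is only a sketch: the g and stkbar types, the barrierPC sentinel, and the walk and installBarrier functions are simplified stand-ins invented for illustration, not the runtime's actual definitions. The recover is there only so the failure can be observed in an ordinary program. In the real signal-handler context the fault is not recoverable this way.

```go
package main

import "fmt"

// stkbar is a simplified, hypothetical stand-in for a stack barrier record.
type stkbar struct {
	savedLRPtr uintptr // stack slot whose return PC was overwritten
	savedLRVal uintptr // the original return PC stored in that slot
}

// g is a toy goroutine: a "stack" of return PCs plus its installed barriers.
type g struct {
	stack  []uintptr
	stkbar []stkbar
}

// barrierPC is a made-up sentinel that marks a slot holding a stack barrier.
const barrierPC uintptr = 0xdead

// installBarrier models the GC overwriting a return PC with a barrier.
func (gp *g) installBarrier(slot int) {
	gp.stkbar = append(gp.stkbar, stkbar{uintptr(slot), gp.stack[slot]})
	gp.stack[slot] = barrierPC
}

// walk models a traceback that resolves barriers using a previously saved
// snapshot of the barrier slice rather than the live one.
func walk(gp *g, snapshot []stkbar) (pcs []uintptr, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("traceback failed: %v", r)
		}
	}()
	next := 0
	for _, pc := range gp.stack {
		if pc == barrierPC {
			pc = snapshot[next].savedLRVal // out of range if the snapshot is stale
			next++
		}
		pcs = append(pcs, pc)
	}
	return pcs, nil
}

func main() {
	gp := &g{stack: []uintptr{0x10, 0x20, 0x30}}
	snapshot := gp.stkbar        // step 1: profiler copies the (empty) slice
	gp.installBarrier(1)         // step 2: GC installs a barrier concurrently
	_, err := walk(gp, snapshot) // step 3: walk hits a barrier it cannot resolve
	fmt.Println("stale snapshot:", err)
}
```

Passing the live gp.stkbar instead of the stale snapshot lets the same walk succeed, which is the essence of the race: the copy and the stack it describes can disagree.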
Because of the particularly prickly context, this double faults and turns into a "panic: fatal error: malloc deadlock".
I can reproduce this ~1 in 10 runs by applying https://go-review.googlesource.com/12674 and running
cd $GOROOT/src/runtime/pprof
go test -c
stress ./pprof.test -test.v -test.short
This should have nothing to do with CL 12674, but applying the CL makes it easy to reproduce (presumably because of some effect on timings).
I'm not sure what the solution to this is. We already have a few cases where we just give up when we're walking the stack for a profile. We could do that here, too, if gentraceback encounters a stack barrier it wasn't expecting. Alternatively, we could make sigprof pickier about Gs in _Gsyscall, though I'm not sure how exactly.
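The "just give up" option could look roughly like the following sketch. It is a toy model, not runtime code: walkOrGiveUp, barrierPC, and the flat slice of saved PCs are hypothetical names and simplifications chosen for illustration. The idea is that the walk checks its barrier copy before indexing it and returns a truncated trace when it meets a barrier it has no record of.

```go
package main

import "fmt"

// barrierPC is a made-up sentinel that marks a slot holding a stack barrier.
const barrierPC uintptr = 0xdead

// walkOrGiveUp walks a toy stack of return PCs. savedPCs is the walker's
// copy of the original PCs hidden behind barriers, in stack order. If the
// walk meets a barrier that the copy does not cover, it stops and reports
// ok=false along with the frames collected so far, instead of indexing
// past the end of the copy.
func walkOrGiveUp(stack, savedPCs []uintptr) (pcs []uintptr, ok bool) {
	next := 0
	for _, pc := range stack {
		if pc == barrierPC {
			if next >= len(savedPCs) {
				return pcs, false // unexpected barrier: give up on this sample
			}
			pc = savedPCs[next]
			next++
		}
		pcs = append(pcs, pc)
	}
	return pcs, true
}

func main() {
	stack := []uintptr{0x10, barrierPC, 0x30}

	// Stale (empty) copy: the walk stops at the barrier.
	pcs, ok := walkOrGiveUp(stack, nil)
	fmt.Println(pcs, ok)

	// Up-to-date copy: the walk resolves the barrier and completes.
	pcs, ok = walkOrGiveUp(stack, []uintptr{0x20})
	fmt.Println(pcs, ok)
}
```

The trade-off in this sketch is that an affected profile sample is truncated or dropped rather than crashing the process, which matches the existing cases where the stack walk for a profile simply gives up.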