runtime: race between stack shrinking and channel send/recv leads to bad sudog values #40641

@mknyszek

Internally we've seen a rare crash arise in the runtime since Go 1.14. The error message is typically "sudog with non-nil elem", stemming from a call to releaseSudog from chansend or chanrecv.

The issue here is a race between a mark worker and a channel operation. Consider the following sequence of events, where GW is a mark worker G, GS is a G trying to send on a channel, and GR is a G trying to receive on that same channel.

  1. GW wants to suspend GS to scan its stack. It calls suspendG.
  2. GS is about to gopark in chansend. It calls into gopark, and changes its status to _Gwaiting BEFORE calling its unlockf, which sets gp.activeStackChans.
  3. GW observes _Gwaiting and returns from suspendG. It continues into scanstack, where it checks whether it's safe to shrink the stack. In this case it is, so it starts shrinking. It reads gp.activeStackChans, sees false, and therefore begins adjusting sudog pointers without synchronization. It reads the elem pointer of the sudog that GS enqueued in chansend, but has not yet written the adjusted pointer back.
  4. GS continues on its merry way, sets gp.activeStackChans, and parks. From this point on, it doesn't really matter when this happens.
  5. GR comes in and wants to chanrecv on the channel. It grabs the channel lock, reads from the sudog's elem field, and clears it. GR readies GS.
  6. GW then writes the adjusted elem pointer back to the sudog, overwriting the nil that GR stored, and continues on its merry way.
  7. Sometime later, GS wakes up because it was readied by GR, and tries to release the sudog, which has a non-nil elem field.
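
The lost update in steps 3 through 7 can be replayed deterministically in ordinary Go. This is only a sketch: `toySudog` and `replayRace` are hypothetical names that model just the `elem` field, the three Gs are played by consecutive statements in one function, and none of it is runtime code.

```go
package main

import "fmt"

// toySudog is a hypothetical stand-in for the runtime's sudog.
// Only the elem pointer is modeled.
type toySudog struct {
	elem *int
}

// replayRace runs steps 3, 5, 6 and 7 in the problematic order and
// reports whether the sudog still has a non-nil elem when GS releases it.
func replayRace() bool {
	oldSlot := new(int) // slot holding the send value on GS's old stack
	newSlot := new(int) // the same slot on the shrunken (copied) stack
	*oldSlot = 42
	sg := &toySudog{elem: oldSlot}

	// Step 3: GW reads elem and computes the adjusted pointer,
	// but has not written it back yet.
	var adjusted *int
	if sg.elem == oldSlot {
		*newSlot = *oldSlot
		adjusted = newSlot
	}

	// Step 5: GR, holding the channel lock, takes the value and clears elem.
	received := *sg.elem
	sg.elem = nil
	_ = received

	// Step 6: GW writes back the adjusted pointer, undoing GR's clear.
	sg.elem = adjusted

	// Step 7: GS wakes up and checks elem the way releaseSudog would.
	return sg.elem != nil
}

func main() {
	fmt.Println("sudog has non-nil elem at release:", replayRace())
}
```

In this replay the final check reports true: GR's clear is lost because GW's read and write of `elem` straddle it without holding the channel lock.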

The fix here, I believe, is to set gp.activeStackChans before unlockf is called, and in particular before the G's status changes to _Gwaiting. Doing this ensures that the value is updated before any worker that could shrink GS's stack observes a G status in suspendG that lets it proceed. This could alternatively be fixed by changing the G status after unlockf is called, but I worry that will break a lot of things.
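
The ordering argument behind the proposed fix can be illustrated outside the runtime. The following is a minimal sketch, assuming a hypothetical `toyG` with two atomically accessed fields. It is not the actual runtime change, and the names `parkFixed`, `observe` and `allObserved` are invented for illustration. The flag is published before the status, so an observer that sees the waiting status must also see the flag.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

const (
	running uint32 = iota
	waiting
)

// toyG is a hypothetical model of a G. Both fields are accessed atomically.
type toyG struct {
	status           uint32
	activeStackChans uint32 // 1 once sudogs pointing into the stack are queued
}

// parkFixed models the proposed ordering: set the flag first,
// then publish the waiting status.
func (g *toyG) parkFixed() {
	atomic.StoreUint32(&g.activeStackChans, 1)
	atomic.StoreUint32(&g.status, waiting)
}

// observe models the worker: wait until the G looks parked,
// then report whether the flag is visible.
func (g *toyG) observe() bool {
	for atomic.LoadUint32(&g.status) != waiting {
		runtime.Gosched()
	}
	return atomic.LoadUint32(&g.activeStackChans) == 1
}

// allObserved repeats the experiment and reports whether every
// observer saw the flag after seeing the waiting status.
func allObserved(trials int) bool {
	for i := 0; i < trials; i++ {
		g := &toyG{status: running}
		done := make(chan bool)
		go func() { done <- g.observe() }()
		g.parkFixed()
		if !<-done {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println("flag always visible once waiting:", allObserved(1000))
}
```

Swapping the two stores in `parkFixed` reproduces the ordering from step 2, where an observer can see the waiting status while the flag is still unset.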

CC @aclements @prattmic
