-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Description
It's been brought to my attention by the Fuchsia team that in cases where the Go heap is very discontiguous, the documented invariant for (*pageAlloc).searchAddr may be violated, leading to errant segfaults.
Consider the following case.
n = 0 1 2 ... i-1 i i+1 ...
summary[0][n].max() = 0 0 0 ... 0 1 100 ...
Suppose searchAddr points into the region described by summary i-1, and on the last page allocation, it was exhausted. An allocation then comes in looking for, say, 12 consecutive pages. (*pageAlloc).find will look at summary i, and skip over it, but will set the "free window" to the first page of the region described by summary i, which need not be mapped.
In a contiguous heap we're unlikely to run into this case because the first page of i is likely to be mapped, so the summary would be there.
Most operating systems, even when not following the address hints provided to sysReserve in the runtime, tend to keep the heap contiguous. Fuchsia specifically goes out of its way to map memory discontinuously unless a flag (ZX_VM_COMPACT) is passed to the equivalent function.
This is a real problem, even for existing systems. Note that the above scenario need not happen for summary level 0, it can happen at any level (say there's a small hole in the address space due to a region being reserved by the OS or something and we allocate virtual address space around it).
Fixing this problem is not straight-forward without some kind of performance consequences (though it's unclear to me how big those consequences will be).
Fundamentally, this issue arises because we try to update searchAddr "on the way" to allocating memory which may not be the first free page. One simplifying solution is to simply not do that. It means that there will need to be more iteration when allocating, and we're less likely to take the fast path, but it will be correct.
Another option is to actually find the first piece of mapped memory in that region, which would be possible by iterating over (*pageAlloc).inUse (inUse will be large, but the search could be turned into a binary search as per the comment in addrRanges.findSucc). We could also find it by walking down the summaries, but this is unnecessary and is basically like doing the find operation twice, which is a bit wasteful.
A third option is to remove the invariant on searchAddr altogether and just be a lot more strict in all the places where we try to use searchAddr to access summaries directly (mostly on fast paths in the page allocator and scavenger). The trouble with this is that the only efficient way to verify that searchAddr is valid (short of walking the tree, which we're hoping to avoid in these cases!) is to check inUse, but that's not exactly efficient either. In this case, it would likely be better to just always move searchAddr to the right location on the slow path by checking inUse.
My vote is for option 2. It's not terribly complicated, and the change is local to just the find operation. Option 1 might be preferred anyway just because it simplifies find a bunch, though.
This issue should not be considered a blocking issue for Go 1.15 because the bug was technically introduced in Go 1.14, but it should be fixed and probably backported too, since it can cause failures with no workaround.