Commit 481c6df

io: reduce intermediate allocations in ReadAll and have a smaller final result
Currently, io.ReadAll allocates a significant amount of intermediate memory as it grows its result slice to the size of the input data. This CL aims to reduce the allocated memory. Geomean benchstat results comparing the existing io.ReadAll to this CL for a variety of input sizes:

                 │   old   │ io.ReadAll (new) vs base │
  sec/op           132.2µ    66.32µ    -49.83%
  B/op             645.4Ki   324.6Ki   -49.70%
  final-capacity   178.3k    151.3k    -15.10%
  excess-ratio     1.216     1.033     -15.10%

The corresponding full benchstat results are below. The input data sizes are a blend of random sizes, power-of-2 sizes, and power-of-10 sizes.

This CL reduces the intermediate bytes allocated in io.ReadAll by reading via a set of slices of exponentially growing size, and then copying into a final perfectly-sized slice at the end.

The current memory allocations impact real uses. For example, #50774 includes two real-world reports of ~60% more bytes allocated via io.ReadAll compared to an alternate approach, as well as a separate report of ~5x more bytes allocated than the input data size of ~5MiB.

Separately, bytes.Buffer.ReadFrom uses a 2x growth strategy, which can usually beat the pre-existing io.ReadAll on total bytes allocated, but sometimes not (depending on the alignment between the exact input data size and the growth). That said, bytes.Buffer.ReadFrom usually ends up with more excess memory in a larger final result than the current io.ReadAll (often significantly more). If we compare bytes.Buffer.ReadFrom to this CL, we also see better geomean overall results with this CL:

                 │ bytes.Buffer │ io.ReadAll (new) │
  sec/op           104.6µ         66.32µ    -36.60%
  B/op             466.9Ki        324.6Ki   -30.48%
  final-capacity   247.4k         151.3k    -38.84%
  excess-ratio     1.688          1.033     -38.84%

(Full corresponding benchstat results comparing this CL vs. bytes.Buffer are at https://go.dev/play/p/eqwk2BkaSwJ.)

One challenge with almost any change of growth strategy for something widely used is that there can be a subset of users who benefited more from the old growth approach (e.g., because their data size aligned particularly well with the old growth), even if the majority of users on average benefit from the new approach. To help mitigate that, this CL somewhat follows the old read pattern in its early stages.

Here are the full benchstat results comparing the existing io.ReadAll vs. this CL. The standard metrics are included, plus the final result capacity and an excess capacity ratio, which is the final capacity of the result divided by the input data size (1.0 means no excess memory in the result, though due to size class rounding the ratio is usually above 1.0 unless the input data size exactly matches a size class). We consider smaller reported excess capacity to be better for most uses, given it means the final allocation puts less pressure on the GC (both in cases where it almost immediately becomes garbage in user code, and in cases where the final result is held for multiple GC cycles).

The input data sizes used in the benchmarks:

- Six powers of 10.
- Six powers of 2.
- Ten random sizes between 1KiB and 100MiB (chosen uniformly on a log scale).
- size=300 (so that we have something below 512, which is the initial read size).
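[Editor's note: the final-cap and excess-ratio columns below are custom benchmark metrics; the CL's actual benchmark is in a separate commit, per the note near the end of this message. A rough sketch of how comparable numbers can be produced with testing.B.ReportMetric follows; the readAllBuffer helper and this harness are illustrative, not the CL's benchmark.]

package readall_test

import (
	"bytes"
	"fmt"
	"io"
	"testing"
)

// readAllBuffer is the bytes.Buffer-based alternative discussed above.
func readAllBuffer(r io.Reader) ([]byte, error) {
	var buf bytes.Buffer
	_, err := buf.ReadFrom(r) // like ReadAll, ReadFrom treats EOF as success
	return buf.Bytes(), err
}

func BenchmarkReadAll(b *testing.B) {
	impls := []struct {
		name string
		fn   func(io.Reader) ([]byte, error)
	}{
		{"io.ReadAll", io.ReadAll},
		{"bytes.Buffer", readAllBuffer},
	}
	for _, impl := range impls {
		for _, size := range []int{300, 512, 4096, 100000} {
			data := make([]byte, size)
			b.Run(fmt.Sprintf("%s/size=%d", impl.name, size), func(b *testing.B) {
				b.ReportAllocs()
				var finalCap int
				for i := 0; i < b.N; i++ {
					out, err := impl.fn(bytes.NewReader(data))
					if err != nil {
						b.Fatal(err)
					}
					finalCap = cap(out)
				}
				b.ReportMetric(float64(finalCap), "final-cap")
				b.ReportMetric(float64(finalCap)/float64(size), "excess-ratio")
			})
		}
	}
}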
goos: linux
goarch: amd64
pkg: io
cpu: AMD EPYC 7B13

                          │     old      │        io.ReadAll (new)        │
                          │    sec/op    │    sec/op      vs base         │
ReadAll/size=300-16          113.0n ± 0%   115.4n ± 2%    +2.08% (p=0.005 n=20)
ReadAll/size=512-16          295.0n ± 2%   288.7n ± 1%    -2.14% (p=0.006 n=20)
ReadAll/size=1000-16         549.2n ± 1%   492.8n ± 1%   -10.28% (p=0.000 n=20)
ReadAll/size=4096-16         3.193µ ± 1%   2.277µ ± 1%   -28.70% (p=0.000 n=20)
ReadAll/size=6648-16         4.318µ ± 1%   3.100µ ± 1%   -28.21% (p=0.000 n=20)
ReadAll/size=10000-16        7.771µ ± 1%   4.629µ ± 1%   -40.43% (p=0.000 n=20)
ReadAll/size=12179-16        7.724µ ± 1%   5.066µ ± 1%   -34.42% (p=0.000 n=20)
ReadAll/size=16384-16       13.664µ ± 1%   7.309µ ± 1%   -46.51% (p=0.000 n=20)
ReadAll/size=32768-16        24.07µ ± 2%   14.52µ ± 2%   -39.67% (p=0.000 n=20)
ReadAll/size=65536-16        43.14µ ± 2%   24.00µ ± 2%   -44.37% (p=0.000 n=20)
ReadAll/size=80000-16        57.12µ ± 2%   31.28µ ± 2%   -45.24% (p=0.000 n=20)
ReadAll/size=100000-16       75.08µ ± 2%   38.18µ ± 3%   -49.15% (p=0.000 n=20)
ReadAll/size=118014-16       76.06µ ± 1%   50.03µ ± 3%   -34.22% (p=0.000 n=20)
ReadAll/size=131072-16      103.99µ ± 1%   52.31µ ± 2%   -49.70% (p=0.000 n=20)
ReadAll/size=397601-16       518.1µ ± 6%   204.2µ ± 2%   -60.58% (p=0.000 n=20)
ReadAll/size=626039-16       934.9µ ± 3%   398.7µ ± 7%   -57.35% (p=0.000 n=20)
ReadAll/size=1000000-16     1800.3µ ± 8%   651.4µ ± 6%   -63.82% (p=0.000 n=20)
ReadAll/size=1141838-16     2236.3µ ± 5%   710.2µ ± 5%   -68.24% (p=0.000 n=20)
ReadAll/size=2414329-16      4.517m ± 3%   1.471m ± 3%   -67.43% (p=0.000 n=20)
ReadAll/size=5136407-16      8.547m ± 3%   2.060m ± 1%   -75.90% (p=0.000 n=20)
ReadAll/size=10000000-16    13.303m ± 4%   3.767m ± 4%   -71.68% (p=0.000 n=20)
ReadAll/size=18285584-16    23.414m ± 2%   6.790m ± 5%   -71.00% (p=0.000 n=20)
ReadAll/size=67379426-16     55.93m ± 4%   24.50m ± 5%   -56.20% (p=0.000 n=20)
ReadAll/size=100000000-16    84.61m ± 5%   33.84m ± 5%   -60.00% (p=0.000 n=20)
geomean                      132.2µ        66.32µ        -49.83%

                          │     old       │        io.ReadAll (new)        │
                          │     B/op      │     B/op       vs base         │
ReadAll/size=300-16           512.0 ± 0%     512.0 ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=512-16         1.375Ki ± 0%   1.250Ki ± 0%   -9.09% (p=0.000 n=20)
ReadAll/size=1000-16        2.750Ki ± 0%   2.125Ki ± 0%  -22.73% (p=0.000 n=20)
ReadAll/size=4096-16        17.00Ki ± 0%   10.12Ki ± 0%  -40.44% (p=0.000 n=20)
ReadAll/size=6648-16        23.75Ki ± 0%   15.75Ki ± 0%  -33.68% (p=0.000 n=20)
ReadAll/size=10000-16       45.00Ki ± 0%   23.88Ki ± 0%  -46.94% (p=0.000 n=20)
ReadAll/size=12179-16       45.00Ki ± 0%   25.88Ki ± 0%  -42.50% (p=0.000 n=20)
ReadAll/size=16384-16       82.25Ki ± 0%   36.88Ki ± 0%  -55.17% (p=0.000 n=20)
ReadAll/size=32768-16      150.25Ki ± 0%   78.88Ki ± 0%  -47.50% (p=0.000 n=20)
ReadAll/size=65536-16       278.3Ki ± 0%   134.9Ki ± 0%  -51.53% (p=0.000 n=20)
ReadAll/size=80000-16       374.3Ki ± 0%   190.9Ki ± 0%  -49.00% (p=0.000 n=20)
ReadAll/size=100000-16      502.3Ki ± 0%   214.9Ki ± 0%  -57.22% (p=0.000 n=20)
ReadAll/size=118014-16      502.3Ki ± 0%   286.9Ki ± 0%  -42.88% (p=0.000 n=20)
ReadAll/size=131072-16      670.3Ki ± 0%   294.9Ki ± 0%  -56.01% (p=0.000 n=20)
ReadAll/size=397601-16     1934.3Ki ± 0%   919.8Ki ± 0%  -52.45% (p=0.000 n=20)
ReadAll/size=626039-16      3.092Mi ± 0%   1.359Mi ± 0%  -56.04% (p=0.000 n=20)
ReadAll/size=1000000-16     4.998Mi ± 0%   2.086Mi ± 0%  -58.27% (p=0.000 n=20)
ReadAll/size=1141838-16     6.334Mi ± 0%   2.219Mi ± 0%  -64.98% (p=0.000 n=20)
ReadAll/size=2414329-16    12.725Mi ± 0%   4.789Mi ± 0%  -62.37% (p=0.000 n=20)
ReadAll/size=5136407-16     25.28Mi ± 0%   10.44Mi ± 0%  -58.71% (p=0.000 n=20)
ReadAll/size=10000000-16    49.84Mi ± 0%   21.92Mi ± 0%  -56.02% (p=0.000 n=20)
ReadAll/size=18285584-16    97.88Mi ± 0%   35.99Mi ± 0%  -63.23% (p=0.000 n=20)
ReadAll/size=67379426-16    375.2Mi ± 0%   158.0Mi ± 0%  -57.91% (p=0.000 n=20)
ReadAll/size=100000000-16   586.7Mi ± 0%   235.9Mi ± 0%  -59.80% (p=0.000 n=20)
geomean                     645.4Ki        324.6Ki       -49.70%

                          │     old     │       io.ReadAll (new)       │
                          │  final-cap  │  final-cap     vs base       │
ReadAll/size=300-16          512.0 ± 0%   512.0 ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=512-16          896.0 ± 0%   512.0 ± 0%   -42.86% (p=0.000 n=20)
ReadAll/size=1000-16        1.408k ± 0%  1.024k ± 0%   -27.27% (p=0.000 n=20)
ReadAll/size=4096-16        5.376k ± 0%  4.096k ± 0%   -23.81% (p=0.000 n=20)
ReadAll/size=6648-16        6.912k ± 0%  6.784k ± 0%    -1.85% (p=0.000 n=20)
ReadAll/size=10000-16       12.29k ± 0%  10.24k ± 0%   -16.67% (p=0.000 n=20)
ReadAll/size=12179-16       12.29k ± 0%  12.29k ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=16384-16       21.76k ± 0%  16.38k ± 0%   -24.71% (p=0.000 n=20)
ReadAll/size=32768-16       40.96k ± 0%  32.77k ± 0%   -20.00% (p=0.000 n=20)
ReadAll/size=65536-16       73.73k ± 0%  65.54k ± 0%   -11.11% (p=0.000 n=20)
ReadAll/size=80000-16       98.30k ± 0%  81.92k ± 0%   -16.67% (p=0.000 n=20)
ReadAll/size=100000-16      131.1k ± 0%  106.5k ± 0%   -18.75% (p=0.000 n=20)
ReadAll/size=118014-16      131.1k ± 0%  122.9k ± 0%    -6.25% (p=0.000 n=20)
ReadAll/size=131072-16      172.0k ± 0%  131.1k ± 0%   -23.81% (p=0.000 n=20)
ReadAll/size=397601-16      442.4k ± 0%  401.4k ± 0%    -9.26% (p=0.000 n=20)
ReadAll/size=626039-16      704.5k ± 0%  630.8k ± 0%   -10.47% (p=0.000 n=20)
ReadAll/size=1000000-16     1.114M ± 0%  1.008M ± 0%    -9.56% (p=0.000 n=20)
ReadAll/size=1141838-16     1.401M ± 0%  1.147M ± 0%   -18.13% (p=0.000 n=20)
ReadAll/size=2414329-16     2.753M ± 0%  2.417M ± 0%   -12.20% (p=0.000 n=20)
ReadAll/size=5136407-16     5.399M ± 0%  5.145M ± 0%    -4.70% (p=0.000 n=20)
ReadAll/size=10000000-16    10.56M ± 0%  10.00M ± 0%    -5.28% (p=0.000 n=20)
ReadAll/size=18285584-16    20.65M ± 0%  18.29M ± 0%   -11.42% (p=0.000 n=20)
ReadAll/size=67379426-16    78.84M ± 0%  67.39M ± 0%   -14.53% (p=0.000 n=20)
ReadAll/size=100000000-16   123.2M ± 0%  100.0M ± 0%   -18.82% (p=0.000 n=20)
geomean                     178.3k       151.3k        -15.10%

                          │     old      │       io.ReadAll (new)        │
                          │ excess-ratio │  excess-ratio    vs base      │
ReadAll/size=300-16         1.707 ± 0%   1.707 ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=512-16         1.750 ± 0%   1.000 ± 0%   -42.86% (p=0.000 n=20)
ReadAll/size=1000-16        1.408 ± 0%   1.024 ± 0%   -27.27% (p=0.000 n=20)
ReadAll/size=4096-16        1.312 ± 0%   1.000 ± 0%   -23.78% (p=0.000 n=20)
ReadAll/size=6648-16        1.040 ± 0%   1.020 ± 0%    -1.92% (p=0.000 n=20)
ReadAll/size=10000-16       1.229 ± 0%   1.024 ± 0%   -16.68% (p=0.000 n=20)
ReadAll/size=12179-16       1.009 ± 0%   1.009 ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=16384-16       1.328 ± 0%   1.000 ± 0%   -24.70% (p=0.000 n=20)
ReadAll/size=32768-16       1.250 ± 0%   1.000 ± 0%   -20.00% (p=0.000 n=20)
ReadAll/size=65536-16       1.125 ± 0%   1.000 ± 0%   -11.11% (p=0.000 n=20)
ReadAll/size=80000-16       1.229 ± 0%   1.024 ± 0%   -16.68% (p=0.000 n=20)
ReadAll/size=100000-16      1.311 ± 0%   1.065 ± 0%   -18.76% (p=0.000 n=20)
ReadAll/size=118014-16      1.111 ± 0%   1.041 ± 0%    -6.30% (p=0.000 n=20)
ReadAll/size=131072-16      1.312 ± 0%   1.000 ± 0%   -23.78% (p=0.000 n=20)
ReadAll/size=397601-16      1.113 ± 0%   1.010 ± 0%    -9.25% (p=0.000 n=20)
ReadAll/size=626039-16      1.125 ± 0%   1.008 ± 0%   -10.40% (p=0.000 n=20)
ReadAll/size=1000000-16     1.114 ± 0%   1.008 ± 0%    -9.52% (p=0.000 n=20)
ReadAll/size=1141838-16     1.227 ± 0%   1.004 ± 0%   -18.17% (p=0.000 n=20)
ReadAll/size=2414329-16     1.140 ± 0%   1.001 ± 0%   -12.19% (p=0.000 n=20)
ReadAll/size=5136407-16     1.051 ± 0%   1.002 ± 0%    -4.66% (p=0.000 n=20)
ReadAll/size=10000000-16    1.056 ± 0%   1.000 ± 0%    -5.30% (p=0.000 n=20)
ReadAll/size=18285584-16    1.129 ± 0%   1.000 ± 0%   -11.43% (p=0.000 n=20)
ReadAll/size=67379426-16    1.170 ± 0%   1.000 ± 0%   -14.53% (p=0.000 n=20)
ReadAll/size=100000000-16   1.232 ± 0%   1.000 ± 0%   -18.83% (p=0.000 n=20)
geomean                     1.216        1.033        -15.10%

                          │ io.ReadAll  │       io.ReadAll (new)       │
                          │  allocs/op  │  allocs/op     vs base       │
ReadAll/size=300-16          1.000 ± 0%   1.000 ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=512-16          2.000 ± 0%   3.000 ± 0%   +50.00% (p=0.000 n=20)
ReadAll/size=1000-16         3.000 ± 0%   4.000 ± 0%   +33.33% (p=0.000 n=20)
ReadAll/size=4096-16         7.000 ± 0%   9.000 ± 0%   +28.57% (p=0.000 n=20)
ReadAll/size=6648-16         8.000 ± 0%  10.000 ± 0%   +25.00% (p=0.000 n=20)
ReadAll/size=10000-16        10.00 ± 0%   11.00 ± 0%   +10.00% (p=0.000 n=20)
ReadAll/size=12179-16        10.00 ± 0%   11.00 ± 0%   +10.00% (p=0.000 n=20)
ReadAll/size=16384-16        12.00 ± 0%   13.00 ± 0%    +8.33% (p=0.000 n=20)
ReadAll/size=32768-16        14.00 ± 0%   15.00 ± 0%    +7.14% (p=0.000 n=20)
ReadAll/size=65536-16        16.00 ± 0%   16.00 ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=80000-16        17.00 ± 0%   17.00 ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=100000-16       18.00 ± 0%   17.00 ± 0%    -5.56% (p=0.000 n=20)
ReadAll/size=118014-16       18.00 ± 0%   18.00 ± 0%        ~ (p=1.000 n=20) ¹
ReadAll/size=131072-16       19.00 ± 0%   18.00 ± 0%    -5.26% (p=0.000 n=20)
ReadAll/size=397601-16       23.00 ± 0%   22.00 ± 0%    -4.35% (p=0.000 n=20)
ReadAll/size=626039-16       25.00 ± 0%   23.00 ± 0%    -8.00% (p=0.000 n=20)
ReadAll/size=1000000-16      27.00 ± 0%   24.00 ± 0%   -11.11% (p=0.000 n=20)
ReadAll/size=1141838-16      28.00 ± 0%   24.00 ± 0%   -14.29% (p=0.000 n=20)
ReadAll/size=2414329-16      31.00 ± 0%   26.00 ± 0%   -16.13% (p=0.000 n=20)
ReadAll/size=5136407-16      34.00 ± 0%   28.00 ± 0%   -17.65% (p=0.000 n=20)
ReadAll/size=10000000-16     37.00 ± 0%   30.00 ± 0%   -18.92% (p=0.000 n=20)
ReadAll/size=18285584-16     40.00 ± 0%   31.00 ± 0%   -22.50% (p=0.000 n=20)
ReadAll/size=67379426-16     46.00 ± 0%   35.00 ± 0%   -23.91% (p=0.000 n=20)
ReadAll/size=100000000-16    48.00 ± 0%   36.00 ± 0%   -25.00% (p=0.000 n=20)
geomean                      14.89        14.65         -1.65%

Finally, the read size in this CL currently grows exponentially at a 1.5x growth rate. The old approach had its read size grow at a ~1.25x growth rate once the reads are larger. We could consider, for example, using a 1.25x read size growth rate here as well. There are perhaps some mild trade-offs. One benefit might be something like a ~5% smaller peak live heap contribution (at the end, when copying into the final result) if we used a 1.25x read growth instead of 1.5x read growth.

That said, for some systems, larger read sizes can trigger higher throughput behavior further down the stack or elsewhere in a system, such as via better read-ahead behavior, larger transfer sizes, etc. I've observed this effect in various real-world systems, including distributed systems as well as, for example, with spinning platters (which are still widely used, including backing various "Internet scale" systems). When the effect exists, it is usually substantial. Therefore, my guess is it is better to get to larger read sizes faster, which is one reason the CL is using a 1.5x read size growth rate instead of 1.25x. Also, for something like peak live heap contribution, we are already getting substantial wins in this CL for total heap bytes allocated, so maybe that is OK.

(I have the actual benchmark in a separate commit, which I can send later, or I can update this CL if preferred.)

Fixes #50774
Updates #74299

Change-Id: I65eabf1d83a00fbdbe42e4c697116955f8251740
Reviewed-on: https://go-review.googlesource.com/c/go/+/722500
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
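[Editor's note: as an aside on the growth-rate discussion above, here is a small standalone sketch (not part of this CL) that prints the read-chunk progression under the CL's 1.5x rate versus the hypothetical 1.25x alternative mentioned in the message. The float-based growth here is a generalization of the CL's integer next += next / 2.]

package main

import "fmt"

func main() {
	// Compare read-chunk progressions: larger growth reaches big read
	// sizes sooner; smaller growth tracks the input size more tightly.
	for _, rate := range []float64{1.5, 1.25} {
		next := 256  // starting read size for the chunked phase, as in the CL
		total := 512 // the initial 512-byte buffer
		chunks := 1
		fmt.Printf("%.2fx: 512", rate)
		for total < 1<<20 { // until ~1 MiB of input is covered
			fmt.Printf(" %d", next)
			total += next
			chunks++
			next = int(float64(next) * rate)
		}
		fmt.Printf(" => %d chunks, %d bytes allocated\n", chunks, total)
	}
}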
1 parent cec4d43 commit 481c6df

1 file changed (+28, -4 lines)

src/io/io.go

@@ -707,20 +707,44 @@ func (c nopCloserWriterTo) WriteTo(w Writer) (n int64, err error) {
 // defined to read from src until EOF, it does not treat an EOF from Read
 // as an error to be reported.
 func ReadAll(r Reader) ([]byte, error) {
+	// Build slices of exponentially growing size,
+	// then copy into a perfectly-sized slice at the end.
 	b := make([]byte, 0, 512)
+	// Starting with next equal to 256 (instead of say 512 or 1024)
+	// allows less memory usage for small inputs that finish in the
+	// early growth stages, but we grow the read sizes quickly such that
+	// it does not materially impact medium or large inputs.
+	next := 256
+	chunks := make([][]byte, 0, 4)
+	// Invariant: finalSize = sum(len(c) for c in chunks)
+	var finalSize int
 	for {
 		n, err := r.Read(b[len(b):cap(b)])
 		b = b[:len(b)+n]
 		if err != nil {
 			if err == EOF {
 				err = nil
 			}
-			return b, err
+			if len(chunks) == 0 {
+				return b, err
+			}
+
+			// Build our final right-sized slice.
+			finalSize += len(b)
+			final := append([]byte(nil), make([]byte, finalSize)...)[:0]
+			for _, chunk := range chunks {
+				final = append(final, chunk...)
+			}
+			final = append(final, b...)
+			return final, err
 		}
 
-		if len(b) == cap(b) {
-			// Add more capacity (let append pick how much).
-			b = append(b, 0)[:len(b)]
+		if cap(b)-len(b) < cap(b)/16 {
+			// Move to the next intermediate slice.
+			chunks = append(chunks, b)
+			finalSize += len(b)
+			b = append([]byte(nil), make([]byte, next)...)[:0]
+			next += next / 2
 		}
 	}
 }
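[Editor's note: a side note on the append([]byte(nil), make([]byte, next)...)[:0] idiom in the new code. Unlike make([]byte, 0, next), it lets append round the allocation up to the runtime's next malloc size class and exposes that rounded capacity through cap, so subsequent reads can use bytes that would otherwise be wasted. A minimal sketch follows; the exact capacity printed is an assumption about the current runtime's size classes.]

package main

import "fmt"

func main() {
	// 864 is the fourth read size under the CL's 1.5x growth
	// (256, 384, 576, 864) and is not a malloc size class.
	next := 864
	exact := make([]byte, 0, next)
	rounded := append([]byte(nil), make([]byte, next)...)[:0]
	fmt.Println(cap(exact))   // 864: cap is exactly what was requested
	fmt.Println(cap(rounded)) // typically 896: size-class rounding is exposed
}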
