Commit 481c6df
committed
io: reduce intermediate allocations in ReadAll and have a smaller final result
Currently, io.ReadAll allocates a significant amount of intermediate
memory as it grows its result slice to the size of the input data.
This CL aims to reduce the allocated memory. Geomean benchstat results
comparing existing io.ReadAll to this CL for a variety of input sizes:
│ old | new vs base │
sec/op 132.2µ 66.32µ -49.83%
B/op 645.4Ki 324.6Ki -49.70%
final-capacity 178.3k 151.3k -15.10%
excess-ratio 1.216 1.033 -15.10%
The corresponding full benchstat results are below. The input data sizes
are a blend of random sizes, power-of-2 sizes, and power-of-10 sizes.
This CL reduces intermediate bytes allocated in io.ReadAll by reading
via a set of slices of exponentially growing size, and then copying
into a final perfectly-sized slice at the end.
The current memory allocations impact real uses. For example, in #50774
two real-world reports were ~60% more bytes allocated via io.ReadAll
compared to an alternate approach, and also a separate report of
~5x more bytes allocated than the input data size of ~5MiB.
Separately, bytes.Buffer.ReadFrom uses a 2x growth strategy, which
usually can beat the pre-existing io.ReadAll on total bytes allocated
but sometimes not (depending on alignment between exact input data size
and growth). That said, bytes.Buffer.ReadFrom usually ends up with
more excess memory used in a larger final result than the current
io.ReadAll (often significantly more).
If we compare bytes.Buffer.ReadFrom to this CL, we also see
better geomean overall results reported with this CL:
│ bytes.Buffer | io.ReadAll (new) │
sec/op 104.6µ 66.32µ -36.60%
B/op 466.9Ki 324.6Ki -30.48%
final-capacity 247.4k 151.3k -38.84%
excess-ratio 1.688 1.033 -38.84%
(Full corresponding benchstat results comparing this CL vs. bytes.Buffer
are at https://go.dev/play/p/eqwk2BkaSwJ).
One challenge with almost any change of growth strategy for something
widely used is there can be a subset of users that benefited more from
the old growth approach (e.g., based on their data size aligning
particularly well with the old growth), even if the majority of users
on average benefit from the new growth approach.
To help mitigate that, this CL somewhat follows the old read pattern
in its early stages.
Here are the full benchstat results comparing the existing
io.ReadAll vs. this CL. The standard metrics are included, plus
the final result capacity and an excess capacity ratio, which is
the final capacity of the result divided by the input data size (so 1.0
is no excess memory present in the result, though due to
size class rounding the ratio is usually above 1.0 unless the
input data size exactly matches a size class).
We consider smaller reported excess capacity to be better for most
uses given it means the final allocation puts less pressure on the GC
(both in cases when it will almost immediately be garbage in user code,
or if for example the final result is held for multiple GC cycles).
The input data sizes used in the benchmarks:
- Six powers of 10.
- Six powers of 2.
- Ten random sizes between 1KiB and 100MiB (chosen uniformly
on a log scale).
- size=300 (so that we have something below 512, which is the
initial read size).
goos: linux
goarch: amd64
pkg: io
cpu: AMD EPYC 7B13
│ old │ io.ReadAll (new) │
│ sec/op │ sec/op vs base │
ReadAll/size=300-16 113.0n ± 0% 115.4n ± 2% +2.08% (p=0.005 n=20)
ReadAll/size=512-16 295.0n ± 2% 288.7n ± 1% -2.14% (p=0.006 n=20)
ReadAll/size=1000-16 549.2n ± 1% 492.8n ± 1% -10.28% (p=0.000 n=20)
ReadAll/size=4096-16 3.193µ ± 1% 2.277µ ± 1% -28.70% (p=0.000 n=20)
ReadAll/size=6648-16 4.318µ ± 1% 3.100µ ± 1% -28.21% (p=0.000 n=20)
ReadAll/size=10000-16 7.771µ ± 1% 4.629µ ± 1% -40.43% (p=0.000 n=20)
ReadAll/size=12179-16 7.724µ ± 1% 5.066µ ± 1% -34.42% (p=0.000 n=20)
ReadAll/size=16384-16 13.664µ ± 1% 7.309µ ± 1% -46.51% (p=0.000 n=20)
ReadAll/size=32768-16 24.07µ ± 2% 14.52µ ± 2% -39.67% (p=0.000 n=20)
ReadAll/size=65536-16 43.14µ ± 2% 24.00µ ± 2% -44.37% (p=0.000 n=20)
ReadAll/size=80000-16 57.12µ ± 2% 31.28µ ± 2% -45.24% (p=0.000 n=20)
ReadAll/size=100000-16 75.08µ ± 2% 38.18µ ± 3% -49.15% (p=0.000 n=20)
ReadAll/size=118014-16 76.06µ ± 1% 50.03µ ± 3% -34.22% (p=0.000 n=20)
ReadAll/size=131072-16 103.99µ ± 1% 52.31µ ± 2% -49.70% (p=0.000 n=20)
ReadAll/size=397601-16 518.1µ ± 6% 204.2µ ± 2% -60.58% (p=0.000 n=20)
ReadAll/size=626039-16 934.9µ ± 3% 398.7µ ± 7% -57.35% (p=0.000 n=20)
ReadAll/size=1000000-16 1800.3µ ± 8% 651.4µ ± 6% -63.82% (p=0.000 n=20)
ReadAll/size=1141838-16 2236.3µ ± 5% 710.2µ ± 5% -68.24% (p=0.000 n=20)
ReadAll/size=2414329-16 4.517m ± 3% 1.471m ± 3% -67.43% (p=0.000 n=20)
ReadAll/size=5136407-16 8.547m ± 3% 2.060m ± 1% -75.90% (p=0.000 n=20)
ReadAll/size=10000000-16 13.303m ± 4% 3.767m ± 4% -71.68% (p=0.000 n=20)
ReadAll/size=18285584-16 23.414m ± 2% 6.790m ± 5% -71.00% (p=0.000 n=20)
ReadAll/size=67379426-16 55.93m ± 4% 24.50m ± 5% -56.20% (p=0.000 n=20)
ReadAll/size=100000000-16 84.61m ± 5% 33.84m ± 5% -60.00% (p=0.000 n=20)
geomean 132.2µ 66.32µ -49.83%
│ old │ io.ReadAll (new) │
│ B/op │ B/op vs base │
ReadAll/size=300-16 512.0 ± 0% 512.0 ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=512-16 1.375Ki ± 0% 1.250Ki ± 0% -9.09% (p=0.000 n=20)
ReadAll/size=1000-16 2.750Ki ± 0% 2.125Ki ± 0% -22.73% (p=0.000 n=20)
ReadAll/size=4096-16 17.00Ki ± 0% 10.12Ki ± 0% -40.44% (p=0.000 n=20)
ReadAll/size=6648-16 23.75Ki ± 0% 15.75Ki ± 0% -33.68% (p=0.000 n=20)
ReadAll/size=10000-16 45.00Ki ± 0% 23.88Ki ± 0% -46.94% (p=0.000 n=20)
ReadAll/size=12179-16 45.00Ki ± 0% 25.88Ki ± 0% -42.50% (p=0.000 n=20)
ReadAll/size=16384-16 82.25Ki ± 0% 36.88Ki ± 0% -55.17% (p=0.000 n=20)
ReadAll/size=32768-16 150.25Ki ± 0% 78.88Ki ± 0% -47.50% (p=0.000 n=20)
ReadAll/size=65536-16 278.3Ki ± 0% 134.9Ki ± 0% -51.53% (p=0.000 n=20)
ReadAll/size=80000-16 374.3Ki ± 0% 190.9Ki ± 0% -49.00% (p=0.000 n=20)
ReadAll/size=100000-16 502.3Ki ± 0% 214.9Ki ± 0% -57.22% (p=0.000 n=20)
ReadAll/size=118014-16 502.3Ki ± 0% 286.9Ki ± 0% -42.88% (p=0.000 n=20)
ReadAll/size=131072-16 670.3Ki ± 0% 294.9Ki ± 0% -56.01% (p=0.000 n=20)
ReadAll/size=397601-16 1934.3Ki ± 0% 919.8Ki ± 0% -52.45% (p=0.000 n=20)
ReadAll/size=626039-16 3.092Mi ± 0% 1.359Mi ± 0% -56.04% (p=0.000 n=20)
ReadAll/size=1000000-16 4.998Mi ± 0% 2.086Mi ± 0% -58.27% (p=0.000 n=20)
ReadAll/size=1141838-16 6.334Mi ± 0% 2.219Mi ± 0% -64.98% (p=0.000 n=20)
ReadAll/size=2414329-16 12.725Mi ± 0% 4.789Mi ± 0% -62.37% (p=0.000 n=20)
ReadAll/size=5136407-16 25.28Mi ± 0% 10.44Mi ± 0% -58.71% (p=0.000 n=20)
ReadAll/size=10000000-16 49.84Mi ± 0% 21.92Mi ± 0% -56.02% (p=0.000 n=20)
ReadAll/size=18285584-16 97.88Mi ± 0% 35.99Mi ± 0% -63.23% (p=0.000 n=20)
ReadAll/size=67379426-16 375.2Mi ± 0% 158.0Mi ± 0% -57.91% (p=0.000 n=20)
ReadAll/size=100000000-16 586.7Mi ± 0% 235.9Mi ± 0% -59.80% (p=0.000 n=20)
geomean 645.4Ki 324.6Ki -49.70%
│ old │ io.ReadAll (new) │
│ final-cap │ final-cap vs base │
ReadAll/size=300-16 512.0 ± 0% 512.0 ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=512-16 896.0 ± 0% 512.0 ± 0% -42.86% (p=0.000 n=20)
ReadAll/size=1000-16 1.408k ± 0% 1.024k ± 0% -27.27% (p=0.000 n=20)
ReadAll/size=4096-16 5.376k ± 0% 4.096k ± 0% -23.81% (p=0.000 n=20)
ReadAll/size=6648-16 6.912k ± 0% 6.784k ± 0% -1.85% (p=0.000 n=20)
ReadAll/size=10000-16 12.29k ± 0% 10.24k ± 0% -16.67% (p=0.000 n=20)
ReadAll/size=12179-16 12.29k ± 0% 12.29k ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=16384-16 21.76k ± 0% 16.38k ± 0% -24.71% (p=0.000 n=20)
ReadAll/size=32768-16 40.96k ± 0% 32.77k ± 0% -20.00% (p=0.000 n=20)
ReadAll/size=65536-16 73.73k ± 0% 65.54k ± 0% -11.11% (p=0.000 n=20)
ReadAll/size=80000-16 98.30k ± 0% 81.92k ± 0% -16.67% (p=0.000 n=20)
ReadAll/size=100000-16 131.1k ± 0% 106.5k ± 0% -18.75% (p=0.000 n=20)
ReadAll/size=118014-16 131.1k ± 0% 122.9k ± 0% -6.25% (p=0.000 n=20)
ReadAll/size=131072-16 172.0k ± 0% 131.1k ± 0% -23.81% (p=0.000 n=20)
ReadAll/size=397601-16 442.4k ± 0% 401.4k ± 0% -9.26% (p=0.000 n=20)
ReadAll/size=626039-16 704.5k ± 0% 630.8k ± 0% -10.47% (p=0.000 n=20)
ReadAll/size=1000000-16 1.114M ± 0% 1.008M ± 0% -9.56% (p=0.000 n=20)
ReadAll/size=1141838-16 1.401M ± 0% 1.147M ± 0% -18.13% (p=0.000 n=20)
ReadAll/size=2414329-16 2.753M ± 0% 2.417M ± 0% -12.20% (p=0.000 n=20)
ReadAll/size=5136407-16 5.399M ± 0% 5.145M ± 0% -4.70% (p=0.000 n=20)
ReadAll/size=10000000-16 10.56M ± 0% 10.00M ± 0% -5.28% (p=0.000 n=20)
ReadAll/size=18285584-16 20.65M ± 0% 18.29M ± 0% -11.42% (p=0.000 n=20)
ReadAll/size=67379426-16 78.84M ± 0% 67.39M ± 0% -14.53% (p=0.000 n=20)
ReadAll/size=100000000-16 123.2M ± 0% 100.0M ± 0% -18.82% (p=0.000 n=20)
geomean 178.3k 151.3k -15.10%
│ old │ io.ReadAll (new) │
│ excess-ratio │ excess-ratio vs base │
ReadAll/size=300-16 1.707 ± 0% 1.707 ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=512-16 1.750 ± 0% 1.000 ± 0% -42.86% (p=0.000 n=20)
ReadAll/size=1000-16 1.408 ± 0% 1.024 ± 0% -27.27% (p=0.000 n=20)
ReadAll/size=4096-16 1.312 ± 0% 1.000 ± 0% -23.78% (p=0.000 n=20)
ReadAll/size=6648-16 1.040 ± 0% 1.020 ± 0% -1.92% (p=0.000 n=20)
ReadAll/size=10000-16 1.229 ± 0% 1.024 ± 0% -16.68% (p=0.000 n=20)
ReadAll/size=12179-16 1.009 ± 0% 1.009 ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=16384-16 1.328 ± 0% 1.000 ± 0% -24.70% (p=0.000 n=20)
ReadAll/size=32768-16 1.250 ± 0% 1.000 ± 0% -20.00% (p=0.000 n=20)
ReadAll/size=65536-16 1.125 ± 0% 1.000 ± 0% -11.11% (p=0.000 n=20)
ReadAll/size=80000-16 1.229 ± 0% 1.024 ± 0% -16.68% (p=0.000 n=20)
ReadAll/size=100000-16 1.311 ± 0% 1.065 ± 0% -18.76% (p=0.000 n=20)
ReadAll/size=118014-16 1.111 ± 0% 1.041 ± 0% -6.30% (p=0.000 n=20)
ReadAll/size=131072-16 1.312 ± 0% 1.000 ± 0% -23.78% (p=0.000 n=20)
ReadAll/size=397601-16 1.113 ± 0% 1.010 ± 0% -9.25% (p=0.000 n=20)
ReadAll/size=626039-16 1.125 ± 0% 1.008 ± 0% -10.40% (p=0.000 n=20)
ReadAll/size=1000000-16 1.114 ± 0% 1.008 ± 0% -9.52% (p=0.000 n=20)
ReadAll/size=1141838-16 1.227 ± 0% 1.004 ± 0% -18.17% (p=0.000 n=20)
ReadAll/size=2414329-16 1.140 ± 0% 1.001 ± 0% -12.19% (p=0.000 n=20)
ReadAll/size=5136407-16 1.051 ± 0% 1.002 ± 0% -4.66% (p=0.000 n=20)
ReadAll/size=10000000-16 1.056 ± 0% 1.000 ± 0% -5.30% (p=0.000 n=20)
ReadAll/size=18285584-16 1.129 ± 0% 1.000 ± 0% -11.43% (p=0.000 n=20)
ReadAll/size=67379426-16 1.170 ± 0% 1.000 ± 0% -14.53% (p=0.000 n=20)
ReadAll/size=100000000-16 1.232 ± 0% 1.000 ± 0% -18.83% (p=0.000 n=20)
geomean 1.216 1.033 -15.10%
│ io.ReadAll │ io.ReadAll (new) │
│ allocs/op │ allocs/op vs base │
ReadAll/size=300-16 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=512-16 2.000 ± 0% 3.000 ± 0% +50.00% (p=0.000 n=20)
ReadAll/size=1000-16 3.000 ± 0% 4.000 ± 0% +33.33% (p=0.000 n=20)
ReadAll/size=4096-16 7.000 ± 0% 9.000 ± 0% +28.57% (p=0.000 n=20)
ReadAll/size=6648-16 8.000 ± 0% 10.000 ± 0% +25.00% (p=0.000 n=20)
ReadAll/size=10000-16 10.00 ± 0% 11.00 ± 0% +10.00% (p=0.000 n=20)
ReadAll/size=12179-16 10.00 ± 0% 11.00 ± 0% +10.00% (p=0.000 n=20)
ReadAll/size=16384-16 12.00 ± 0% 13.00 ± 0% +8.33% (p=0.000 n=20)
ReadAll/size=32768-16 14.00 ± 0% 15.00 ± 0% +7.14% (p=0.000 n=20)
ReadAll/size=65536-16 16.00 ± 0% 16.00 ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=80000-16 17.00 ± 0% 17.00 ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=100000-16 18.00 ± 0% 17.00 ± 0% -5.56% (p=0.000 n=20)
ReadAll/size=118014-16 18.00 ± 0% 18.00 ± 0% ~ (p=1.000 n=20) ¹
ReadAll/size=131072-16 19.00 ± 0% 18.00 ± 0% -5.26% (p=0.000 n=20)
ReadAll/size=397601-16 23.00 ± 0% 22.00 ± 0% -4.35% (p=0.000 n=20)
ReadAll/size=626039-16 25.00 ± 0% 23.00 ± 0% -8.00% (p=0.000 n=20)
ReadAll/size=1000000-16 27.00 ± 0% 24.00 ± 0% -11.11% (p=0.000 n=20)
ReadAll/size=1141838-16 28.00 ± 0% 24.00 ± 0% -14.29% (p=0.000 n=20)
ReadAll/size=2414329-16 31.00 ± 0% 26.00 ± 0% -16.13% (p=0.000 n=20)
ReadAll/size=5136407-16 34.00 ± 0% 28.00 ± 0% -17.65% (p=0.000 n=20)
ReadAll/size=10000000-16 37.00 ± 0% 30.00 ± 0% -18.92% (p=0.000 n=20)
ReadAll/size=18285584-16 40.00 ± 0% 31.00 ± 0% -22.50% (p=0.000 n=20)
ReadAll/size=67379426-16 46.00 ± 0% 35.00 ± 0% -23.91% (p=0.000 n=20)
ReadAll/size=100000000-16 48.00 ± 0% 36.00 ± 0% -25.00% (p=0.000 n=20)
geomean 14.89 14.65 -1.65%
Finally, the read size in this CL currently grows exponentially
at a 1.5x growth rate. The old approach had its read size grow at
a ~1.25x growth rate once the reads are larger. We could consider
for example using a 1.25x read size growth rate here as well.
There are perhaps some ~mild trade-offs. One benefit might
be something like a ~5% smaller peak live heap contribution
(at the end, when copying into the final result) if we used
a 1.25x read growth instead of 1.5x read growth.
That said, for some systems, larger read sizes can trigger
higher throughput behavior further down the stack or
elsewhere in a system, such as via better read-ahead behavior,
larger transfer sizes, etc.
I've observed this effect in various real-world systems,
including distributed systems as well as for example with
spinning platters (which are still widely used, including
backing various "Internet scale" systems). When the effect
exists, it is usually substantial.
Therefore, my guess is it is better to get to larger read
sizes faster, which is one reason the CL is using 1.5x
read size growth rate instead of 1.25x. Also, for something
like peak live heap contribution, we are already getting
substantial wins in this CL for total heap bytes allocated,
so maybe that is OK.
(I have the actual benchmark in a separate commit, which I can
send later, or I can update this CL if preferred).
Fixes #50774
Updates #74299
Change-Id: I65eabf1d83a00fbdbe42e4c697116955f8251740
Reviewed-on: https://go-review.googlesource.com/c/go/+/722500
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>1 parent cec4d43 commit 481c6df
1 file changed
+28
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
707 | 707 | | |
708 | 708 | | |
709 | 709 | | |
| 710 | + | |
| 711 | + | |
710 | 712 | | |
| 713 | + | |
| 714 | + | |
| 715 | + | |
| 716 | + | |
| 717 | + | |
| 718 | + | |
| 719 | + | |
| 720 | + | |
711 | 721 | | |
712 | 722 | | |
713 | 723 | | |
714 | 724 | | |
715 | 725 | | |
716 | 726 | | |
717 | 727 | | |
718 | | - | |
| 728 | + | |
| 729 | + | |
| 730 | + | |
| 731 | + | |
| 732 | + | |
| 733 | + | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
| 738 | + | |
| 739 | + | |
719 | 740 | | |
720 | 741 | | |
721 | | - | |
722 | | - | |
723 | | - | |
| 742 | + | |
| 743 | + | |
| 744 | + | |
| 745 | + | |
| 746 | + | |
| 747 | + | |
724 | 748 | | |
725 | 749 | | |
726 | 750 | | |
0 commit comments