Skip to content

Avoid SmallSetBase heap allocation in default constructor #4884

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Squareys
Copy link

@Squareys Squareys commented Aug 6, 2022

Hi @kripken & @sbc100 !

As described in #4165 (comment), this is the change that wraps the std::set in the SmallSetBase into a std::optional to avoid the heap allocation that creates the red-black-tree root on MSVC STL.

I made one drive-by change in a separate commit that avoids a copy per element in the for-each of the std::initializer_list constructor of SmallSetBase.

Best,
Jonathan from Wonderland Engine.

@Squareys Squareys force-pushed the small-set-heap-allocation branch 3 times, most recently from 0afc9bd to abf24ad Compare August 6, 2022 20:55
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

This sounds good to me. Please let's split out the effects.h part though, as suggested in the comments, and then let's benchmark this - I'd like to verify it isn't a significant regression on non-MSVC (but I'm pretty sure it's not).

@Squareys
Copy link
Author

Squareys commented Aug 8, 2022

Thanks for the review! I'll have time to apply the requested changes and split the changes from effects.h tomorrow evening CEST.

@Squareys Squareys force-pushed the small-set-heap-allocation branch from abf24ad to 6c98021 Compare August 9, 2022 14:16
@Squareys
Copy link
Author

Squareys commented Aug 9, 2022

@kripken I applied the feedback and split off the changes to effects.h 👍 Opening a PR for that in a few moments, benchmarking a few alternatives to 10

@Squareys
Copy link
Author

Squareys commented Aug 9, 2022

@kripken Please consider the benchmark and analysis here, on the other PR before merging this.

While it's a net-positive, it does pessimize the code without the EffectsAnalyzer change, which seems to be a special case that benefits massively from avoiding the allocation, possibly allocated especially often.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comment

// flexible additional storage.
// Because std::set allocates the root node of its red-black tree on the
// heap, we wrap it std::optional to defer that allocation until we start
// using the flexible storage.
Copy link
Member

@kripken kripken Aug 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is a little specific to the MSVC STL I think. How about this?

Flexible additional storage. We use std::optional here to reduce the overhead in the
common case, where the flexible storage is never needed. With std::optional we just
have a pointer taking up space, as opposed to a full std::set which is larger. Also, on
MSVC's STL at least the std::set constructor does some work, which we want to avoid.
@Squareys Squareys force-pushed the small-set-heap-allocation branch from 6c98021 to 2a05446 Compare September 28, 2022 13:18
@Squareys
Copy link
Author

@kripken Changed the comment and rebased. Sorry for the crazy long delay!

I'll be rebasing #4885 next, which is waiting for this to land.

@kripken
Copy link
Member

kripken commented Sep 28, 2022

Looks like there are some errors on CI.

@Squareys Squareys force-pushed the small-set-heap-allocation branch from 2a05446 to ff55997 Compare November 23, 2022 11:37
@Squareys
Copy link
Author

Rebased onto latest main and fixed the failing test case.

Sorry for the crazy slow iteration cycle times on this, this project was done during a once-per-month "Workflow Improvement Day".

@Squareys
Copy link
Author

Note, while rebasing #4885, I noticed, that SmallSet::erase() was returning void instead of size_t like std::set does. I added a commit to update that, such that the code which depends on it in effects analyzer compiles.

@Squareys Squareys force-pushed the small-set-heap-allocation branch 2 times, most recently from c7c7201 to 7725c70 Compare November 23, 2022 12:51
@kripken
Copy link
Member

kripken commented Nov 23, 2022

Code lgtm, also ran a quick local benchmark and looks good. Once tests pass this will merge.

@kripken kripken enabled auto-merge (squash) November 23, 2022 18:11
@kripken
Copy link
Member

kripken commented Nov 23, 2022

Looks like there is a test error though.

@Squareys
Copy link
Author

Looks like there is a test error though.

Indeed, will find time to have another look before end of the year.

auto-merge was automatically disabled March 28, 2023 15:35

Head branch was pushed to by a user without write access

@Squareys Squareys force-pushed the small-set-heap-allocation branch from 7725c70 to b150937 Compare March 28, 2023 15:35
@Squareys
Copy link
Author

end of the year.

Oh well... sorry this is taking so long.

It's not a test failure, but likely a clang or GCC compiler difference, so I'd appreciate if you could approve the workflow as the previous logs expired.
I just rebased to trigger the workflow run.

@kripken
Copy link
Member

kripken commented Mar 29, 2023

I re-ran the workflow now. (It seems like it needs to be approved each time for first-time contributors, unfortunately.)

Signed-off-by: Squareys <squareys@googlemail.com>
Signed-off-by: Squareys <squareys@googlemail.com>
…t does

Signed-off-by: Squareys <squareys@googlemail.com>
@Squareys Squareys force-pushed the small-set-heap-allocation branch from b150937 to fe060b8 Compare March 29, 2023 20:55
Copy link
Author

@Squareys Squareys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kripken I fixed the test and commented on the two changes I made in the process.

// Otherwise we use the fixed storage. Note that if we grow and shrink then
// we will stay in flexible mode until we reach a size of zero, at which
// point we return to fixed mode. This is intentional, to avoid a lot of
// movement in switching between fixed and flexible mode.
return flexible.empty();
return !flexible || flexible->empty();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test failure was caused by this condition being faulty.

@@ -165,23 +172,23 @@ class SmallSetBase {
// We need to add an item but no fixed storage remains to grow. Switch
// to flexible.
assert(fixed.used == N);
assert(flexible.empty());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up removing this assert, because it would have been updated to !flexible || flexible->empty(), i.e. equivalent to usingFixed(), which is tested in the condition above. Happy to put it back in, if you believe it's still useful.

FlexibleSet flexible;
// Flexible additional storage. We use std::optional here to reduce the
// overhead in the common case, where the flexible storage is never needed.
// With std::optional we just have a pointer taking up space, as opposed to a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rereading this now, this doesn't seem true at least on linux, as sizeof tells me that the full storage is used. That is, the gnu C++ STL seems to implement an optional as enough space for the item, and a tag whether it is present or not.

Given that, I think we might have been wrong in our previous discussion here. That is, using std::optional here increases memory size, so it may have downsides, and we should probably be more careful.

If our goal is to avoid a problem with the MSVC STL, then a workaround specific to there might make more sense actually.

What I'd suggest immediately though is to perhaps go look at what LLVM's SmallSet does, as a lot of thought has gone into that. We should probably do what they are doing.

Copy link
Author

@Squareys Squareys Apr 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, LLVM's SmallSet doesn't seem to address the heap allocation on MSVC.

I believe the measurements you had made were on Linux? It seemed benefits from using avoiding the std::set construction outweighed the size increase there, too? (At least when used in EffectsAnalyzer, and way less significant than on windows... I suppose it's not making heap allocations in the STL implementation there).

Effectively, we could, of course, avoid the entire PR, if we'd use another STL, as you had suggested in this comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants