Cache schema / channel IDs in Python Writer #1467
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changelog
Add caching to
register_schemaandregister_channelmethods on the Python Writer class.Docs
I don't know that there is an issue for this. Probably there should be one made since this is a behavioural change.
Description
The Rust crate for
mcaphas this handy feature where channel and schema IDs will be cached via a map when "registered" or added to a writer. What this means in practice is that if you add the same schema contents (or conversely channel contents) multiple times to a writer, then you will always get the same schema ID / channel ID back (and no record will be written, since the original record was already written to the MCAP).This is useful, as you seldom want to write the same channel information across multiple IDs. If you don't have this feature, you have to cache this data yourself. Unfortunately, while Rust does the "right" thing (right being defined as according to "what did I expect to happen"), the Python library does not. It forces users to cache the IDs in their own application.
For the purpose of this change, I more or less just cached based off of everything passed to either the
register_schemaorregister_channelfunctions. The intent was to be analogous to the behaviour in the Rust crate, but in a Pythonic way.Note that I couldn't just use
functools.cachebecause these were methods which acceptedself. As such I needed to make a small wrapper. The wrapper itself is something you can probably find across dozens of stackoverflow answers, e.g.:https://stackoverflow.com/a/68052994
That being said, I think the change is relatively straightforward and so it would be hard to modify this or rewrite it from scratch in any meaningful way.
Ultimately, this is a behavioural breaking change. Users who are already tracking IDs for schemas and channels will probably notice that they are doing so needlessly if this is pushed. Users that aren't will notice that they are no longer writing the same channel / schema records multiple times. I mostly figured that the other mcap libraries in other languages default to this behaviour, so it was a bit odd that it wasn't present here; however, I could see how making a breaking change like this could be annoying, since you can't choose not to cache with this commit applied.
I would say that if that is the case though, then you are probably doing something a bit outside of the norm. 🤔 I don't know if I can try to argue for such a scenario, because it seems strange to want more channel records that would otherwise cause issues.
NOTE: I noticed that cached IDs was a problem after I did not cache them myself in a python script, and resulted in running over the limit for total number of schemas / channel IDs (which is
u16::MAX). 😬 So this change also helps you avoid exhausting your schema / channel IDs by doing something stupid.I added the unit test
test_schema_channel_cachingwhich utilizesgenerate_mcap_schemas_channels(also new, although a bit copy-and-pasted) to validate this. Seems to work as expected on my end.I don't know how to display this but prior to this change you would have to manage schema / channel IDs externally lest you write the schema and channel to the MCAP file multiple times.