Cache schema / channel IDs in Python Writer #1467

ThatGeoGuy · 2025-10-09T23:11:22Z

Changelog

Add caching to register_schema and register_channel methods on the Python Writer class.

Docs

I don't know that there is an issue for this. Probably there should be one made since this is a behavioural change.

Description

The Rust crate for mcap has this handy feature where channel and schema IDs will be cached via a map when "registered" or added to a writer. What this means in practice is that if you add the same schema contents (or conversely channel contents) multiple times to a writer, then you will always get the same schema ID / channel ID back (and no record will be written, since the original record was already written to the MCAP).

This is useful, as you seldom want to write the same channel information across multiple IDs. If you don't have this feature, you have to cache this data yourself. Unfortunately, while Rust does the "right" thing (right being defined as according to "what did I expect to happen"), the Python library does not. It forces users to cache the IDs in their own application.

For the purpose of this change, I more or less just cached based off of everything passed to either the register_schema or register_channel functions. The intent was to be analogous to the behaviour in the Rust crate, but in a Pythonic way.

Note that I couldn't just use functools.cache because these were methods which accepted self. As such I needed to make a small wrapper. The wrapper itself is something you can probably find across dozens of stackoverflow answers, e.g.:

https://stackoverflow.com/a/68052994

That being said, I think the change is relatively straightforward and so it would be hard to modify this or rewrite it from scratch in any meaningful way.

Ultimately, this is a behavioural breaking change. Users who are already tracking IDs for schemas and channels will probably notice that they are doing so needlessly if this is pushed. Users that aren't will notice that they are no longer writing the same channel / schema records multiple times. I mostly figured that the other mcap libraries in other languages default to this behaviour, so it was a bit odd that it wasn't present here; however, I could see how making a breaking change like this could be annoying, since you can't choose not to cache with this commit applied.

I would say that if that is the case though, then you are probably doing something a bit outside of the norm. 🤔 I don't know if I can try to argue for such a scenario, because it seems strange to want more channel records that would otherwise cause issues.

NOTE: I noticed that cached IDs was a problem after I did not cache them myself in a python script, and resulted in running over the limit for total number of schemas / channel IDs (which is u16::MAX). 😬 So this change also helps you avoid exhausting your schema / channel IDs by doing something stupid.

I added the unit test test_schema_channel_caching which utilizes generate_mcap_schemas_channels (also new, although a bit copy-and-pasted) to validate this. Seems to work as expected on my end.

Before	After
I don't know how to display this but prior to this change you would have to manage schema / channel IDs externally lest you write the schema and channel to the MCAP file multiple times.

ThatGeoGuy · 2025-10-09T23:17:51Z

Some considerations: I realize that I wrote this after finding a stackoverflow post, but I should probably update it to use functools.cached_property.

However, this requires a minimum version of Python 3.8 (as opposed to what I did, which replicates this needlessly for Python 3.9+). 🤔

What is the earliest version of Python supported by the mcap library? There is likely a way to rewrite this with the lru_cache equivalent (which was added in 3.2).

ThatGeoGuy · 2025-10-09T23:20:38Z

Actually, negate that previous comment. I am confusing myself. cached_property won't cache according to the arguments passed in, it computes the value exactly once (because it is a property).

Anyways, I can modify the function to use lru_cache with a max size of None.

Cache schema / channel IDs in Python Writer

bb653eb

ThatGeoGuy requested review from gasmith, james-rms and jtbandes as code owners October 9, 2025 23:11

ThatGeoGuy force-pushed the main branch from 4ea2ea4 to bb653eb Compare October 9, 2025 23:19

fixup! Cache schema / channel IDs in Python Writer

50041cb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache schema / channel IDs in Python Writer #1467

Cache schema / channel IDs in Python Writer #1467

Uh oh!

ThatGeoGuy commented Oct 9, 2025

ThatGeoGuy commented Oct 9, 2025

ThatGeoGuy commented Oct 9, 2025

Labels

1 participant

Cache schema / channel IDs in Python Writer #1467

Are you sure you want to change the base?

Cache schema / channel IDs in Python Writer #1467

Uh oh!

Conversation

ThatGeoGuy commented Oct 9, 2025

Changelog

Docs

Description

ThatGeoGuy commented Oct 9, 2025

ThatGeoGuy commented Oct 9, 2025

Labels

1 participant