@chenat9 chenat9 commented Jan 21, 2026

Summary

This PR introduces support for ByteDance Volcengine TOS object storage as a remote backend for the MCAP CLI, specifically enabling tos:// URI reads.

Motivation

At ByteDance, we extensively use MCAP files stored on TOS for our robotics and VLA (Vision-Language-Action) models. Currently, our internal workflows and tools often require downloading full MCAP files locally before processing. This approach incurs significant bandwidth costs and latency.

To optimize this and avoid maintaining a hard fork of the CLI, we aim to upstream our internal integration to the community. This allows users to inspect and process MCAP files directly from TOS without intermediate downloading.

Implementation Details

  • Added a tos:// scheme handler to the CLI's remote file reader.
  • The implementation limits dependencies and strictly follows the existing patterns established by the S3 and GCS integrations to ensure consistency.
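As a rough illustration of the first thing a tos:// scheme handler has to do, the sketch below splits a URI into a bucket and an object key using Go's standard net/url package. The helper name parseTOSURI and its error messages are hypothetical and are not taken from this PR's diff; the actual handler may be structured differently.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseTOSURI is a hypothetical helper (not from the PR) that splits a
// tos://bucket/key URI into its bucket name and object key.
func parseTOSURI(raw string) (bucket, key string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "tos" {
		return "", "", fmt.Errorf("unsupported scheme %q, want tos", u.Scheme)
	}
	// For tos://bucket/path/to/file.mcap, Host holds the bucket and
	// Path holds the key with a leading slash.
	bucket = u.Host
	key = strings.TrimPrefix(u.Path, "/")
	if bucket == "" || key == "" {
		return "", "", fmt.Errorf("URI %q must have the form tos://bucket/key", raw)
	}
	return bucket, key, nil
}

func main() {
	bucket, key, err := parseTOSURI("tos://bucket/path/to/file.mcap")
	if err != nil {
		panic(err)
	}
	fmt.Println(bucket, key)
}
```

Rejecting an empty bucket or key up front keeps malformed URIs from reaching the storage client, where the resulting error would likely be harder to interpret.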

Future Roadmap

This PR serves as the initial minimal contribution to establish the workflow. We have developed several advanced optimizations internally for AI training scenarios, which we plan to contribute in subsequent PRs:

  • High-Performance Access: Integration with TOS accelerator (SSD layer) to provide higher IOPS for training clusters.
  • Smart Caching: Optimizing range requests to accelerate random reads (critical for AI data shuffling).
  • Write Support: Implementing append-only write support for handling file merging directly on object storage.
  • Network Optimization: High-speed network access (similar to AWS CRT).
  • HDFS Support: Remote read implementation for HDFS protocols.

Verification

  • Verified locally by building the CLI and running mcap info tos://bucket/path/to/file.mcap.
  • Ensured go.mod dependencies are minimal.

First time contributor here, happy to join the community! Looking forward to your review.

Signed-off-by: chencunge <chencunge@bytedance.com>
chencunge and others added 2 commits January 21, 2026 19:05
Signed-off-by: chencunge <chencunge@bytedance.com>
Signed-off-by: chencunge <chencunge@bytedance.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

)
}

opts := []tos.ClientOption{tos.WithRegion(region)}

Empty region option unconditionally passed to SDK

Medium Severity

The error message states "TOS endpoint or region must be configured", indicating that providing only an endpoint should be valid. However, tos.WithRegion(region) is unconditionally added to the options slice, even when region is an empty string. When only endpoint is configured (and region is empty), this passes WithRegion("") to the SDK, which may cause the client to fail with an invalid region error or exhibit unexpected behavior. The region option should only be added when region != "".
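One possible shape for the suggested fix is sketched below. To keep the example runnable without the SDK, clientOption and withRegion are hypothetical stand-ins for tos.ClientOption and tos.WithRegion, and buildClientOptions is an illustrative name that does not appear in the PR. Only the conditional append is the point; the endpoint handling in the real code is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// clientOption is a hypothetical stand-in for tos.ClientOption so that
// this sketch compiles without the SDK.
type clientOption struct {
	name  string
	value string
}

// withRegion stands in for tos.WithRegion.
func withRegion(region string) clientOption {
	return clientOption{name: "region", value: region}
}

// buildClientOptions (illustrative name) returns the option slice and
// adds the region option only when a region was actually configured.
func buildClientOptions(endpoint, region string) ([]clientOption, error) {
	if endpoint == "" && region == "" {
		return nil, errors.New("TOS endpoint or region must be configured")
	}
	var opts []clientOption
	if region != "" {
		opts = append(opts, withRegion(region))
	}
	return opts, nil
}

func main() {
	// Endpoint-only configuration: no region option should be produced.
	// The endpoint URL is a placeholder, not a real TOS endpoint.
	opts, err := buildClientOptions("https://tos.example.invalid", "")
	fmt.Println(len(opts), err)
}
```

With this guard, an endpoint-only configuration never hands an empty region string to the client constructor.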

jneless commented Jan 22, 2026

Great work, Cunge, and thanks for contributing.

I would like to provide more context for the MCAP community regarding TOS and MCAP scenarios.
See: https://docs.google.com/document/d/1DFqkflCGO6gqZKQM1DprMzQv30HeuoNRclvG9orWaYs/edit?usp=sharing

It may be hard for the community to review and test this in person.

If the review needs more testing, do not hesitate to email jialin.li@bytedance.com.

I would be very happy to provide an AK/SK pair and demo endpoint files.
