Collaborator

@OussamaSaoudi OussamaSaoudi commented Oct 21, 2025

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR adds support for Coordinated Commits v2 (CCv2) tables to the Delta Kernel benchmarking framework, enabling performance testing of tables using staged commits with Unity Catalog coordination.
Key changes:

  • Add a flag to table_info.json indicating whether the table is a CCv2 table.
  • Add a CCv2Context class that encapsulates the CCv2 infrastructure (in-memory UC coordinator, committer, staged commit management).
  • Add a CCv2Info class that represents the mapping from commit version to staged commit path.
  • Extend WorkloadRunner, ReadMetadataRunner, and WriteRunner to support reading from and writing to CCv2 tables.
  • Fix a bug in WorkloadOutputFormat where individual iteration results were used instead of aggregated results, causing duplicate or incorrect metrics.
  • Extend the cleanup logic to also clean the staged commit directory.

How was this patch tested?

  • Added basic_ccv2 test table with complete CCv2 structure (backfilled and staged commits). Added read and write workload specs for basic_ccv2 table.

Does this PR introduce any user-facing changes?

No.

@OussamaSaoudi OussamaSaoudi force-pushed the ccv2_tables branch 3 times, most recently from a419409 to 824aa16 Compare October 29, 2025 22:34
@OussamaSaoudi OussamaSaoudi marked this pull request as ready for review October 29, 2025 22:42
@OussamaSaoudi OussamaSaoudi changed the title [WIP] Ccv2 benchmarks Oct 29, 2025
Collaborator

@allisonport-db allisonport-db left a comment

Just a few comments, looks great!

Option.apply(stagedCommit.getVersion()), // commitVersion
Option.apply(fileStatus.getSize()), // commitFileSize
Option.apply(fileStatus.getModificationTime()), // commitFileModTime
Option.apply(System.currentTimeMillis()), // commitTimestamp
Collaborator

I'm not familiar with this: is it supposed to match the ICT, or just the time the commit was made to the catalog?

Collaborator Author

Hmm, good point. For correctness this should probably be the ICT, but loading each commit file just to get the ICT may become expensive 🤔

I'll put a note.

Collaborator

We could just add the timestamp to ccv2_info.json?
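
To make the suggestion concrete, here is a minimal sketch of keeping a timestamp next to each staged commit path, so the benchmark can look it up without opening the commit file. The class and field names (Ccv2InfoSketch, StagedCommitEntry, timestampMillis) are hypothetical; the actual ccv2_info.json schema and the CCv2Info class are not shown in this thread.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.TreeMap;

/** Hypothetical entry: one staged commit plus the timestamp recorded for it. */
final class StagedCommitEntry {
  final String path;
  final long timestampMillis;

  StagedCommitEntry(String path, long timestampMillis) {
    this.path = path;
    this.timestampMillis = timestampMillis;
  }
}

/** Hypothetical stand-in for CCv2Info: maps commit version to its staged commit entry. */
final class Ccv2InfoSketch {
  private final Map<Long, StagedCommitEntry> byVersion = new TreeMap<>();

  void put(long version, String path, long timestampMillis) {
    byVersion.put(version, new StagedCommitEntry(path, timestampMillis));
  }

  /** Returns the recorded timestamp for a version, or empty if the version is unknown. */
  OptionalLong timestampFor(long version) {
    StagedCommitEntry entry = byVersion.get(version);
    return entry == null ? OptionalLong.empty() : OptionalLong.of(entry.timestampMillis);
  }
}
```

With something like this, the commitTimestamp argument could come from the stored value rather than System.currentTimeMillis(), assuming the stored value is the ICT.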

Comment on lines 184 to 187
String stagedCommitsDir = Paths.get(tableRoot, "_delta_log", "_staged_commits").toString();

String commitUuid = UUID.randomUUID().toString();
String stagedCommitFileName = String.format("%020d.%s.json", version, commitUuid);
Collaborator

Can we use the methods in FileNames for this?
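
For reference, a self-contained sketch of the path layout the snippet above builds by hand, pulled into one helper so the format string lives in a single place. StagedCommitPaths is a hypothetical name and uses only the JDK; it is not the kernel's FileNames utility, whose exact method signatures are not shown in this thread.

```java
import java.nio.file.Paths;
import java.util.UUID;

/** Hypothetical helper (not part of Delta Kernel) that centralizes staged commit paths. */
final class StagedCommitPaths {
  private StagedCommitPaths() {}

  /** Returns "<version zero-padded to 20 digits>.<uuid>.json", as in the reviewed snippet. */
  static String fileName(long version, String commitUuid) {
    return String.format("%020d.%s.json", version, commitUuid);
  }

  /** Returns <tableRoot>/_delta_log/_staged_commits/<fileName>. */
  static String path(String tableRoot, long version, String commitUuid) {
    return Paths.get(tableRoot, "_delta_log", "_staged_commits", fileName(version, commitUuid))
        .toString();
  }

  /** Same as path(...), but generates a fresh random UUID for the commit. */
  static String newPath(String tableRoot, long version) {
    return path(tableRoot, version, UUID.randomUUID().toString());
  }
}
```

If FileNames already exposes an equivalent method, calling it directly would be preferable to a local helper like this one.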

public CCv2Info() {}

/**
* Constructor for creating CCv2Info.
Collaborator

Do we have any requirements on the ordering (or anything else) of logTail? Might be worth calling out here.

Collaborator Author

StagedCommit contains its version, so it's fine for this to be unordered. I'll put in a comment to clarify that.
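
As an illustration of why the input order does not matter, here is a minimal sketch in which a consumer sorts the tail by the version each entry carries. Commit is a hypothetical stand-in for StagedCommit, and LogTailOrdering is not a class from this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LogTailOrdering {
  private LogTailOrdering() {}

  /** Hypothetical stand-in for StagedCommit: carries its own version. */
  static final class Commit {
    final long version;
    final String path;

    Commit(long version, String path) {
      this.version = version;
      this.path = path;
    }
  }

  /** Returns a new list sorted by ascending version; the input list is left untouched. */
  static List<Commit> sortedByVersion(List<Commit> logTail) {
    List<Commit> copy = new ArrayList<>(logTail);
    copy.sort(Comparator.comparingLong(c -> c.version));
    return copy;
  }
}
```

Because each entry is self-describing, callers can pass the tail in any order and the consumer can still recover version order when it needs it.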
