[kernel-spark] Implement latestOffset() with rate limiting for dsv2 streaming #5409
base: master
Conversation
Hi @huan233usc, could you please add @huan233usc, @gengliangwang, @jerrypeng, @tdas to the reviewers list?
```java
this.streamingHelper = new StreamingHelper(tablePath, hadoopConf);

// Initialize snapshot at source init to get table ID, similar to DeltaSource.scala
this.snapshotAtSourceInit = TableManager.loadSnapshot(tablePath).build(engine);
```
Just call streamingHelper.loadLatestSnapshot(); under the hood it will call TableManager.loadSnapshot(tablePath).build(engine).
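A minimal sketch of what that suggestion could look like, reusing the streamingHelper field from the diff above (field names assumed from that snippet):

```java
// Sketch only: reuse the helper instead of calling TableManager directly.
// Per the comment above, StreamingHelper.loadLatestSnapshot() wraps
// TableManager.loadSnapshot(tablePath).build(engine).
this.snapshotAtSourceInit = streamingHelper.loadLatestSnapshot();
```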
Done.
```java
@Override
public Offset latestOffset(Offset startOffset, ReadLimit limit) {
  // For the first batch, initialOffset() should be called before latestOffset().
  // if startOffset is null: no data is available to read.
```
Will there be a case where startOffset is set to null?
Spark will call initialOffset() first to obtain this startOffset for batch 0 -- so it could be null when initialOffset() returns null (i.e. the table has no data).
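An illustrative sketch of that contract (not the PR's actual implementation; computeEndOffset is a hypothetical helper):

```java
@Override
public Offset latestOffset(Offset startOffset, ReadLimit limit) {
  if (startOffset == null) {
    // initialOffset() returned null for batch 0, i.e. the table has no data to read yet.
    return null;
  }
  // Otherwise scan forward from startOffset, applying the ReadLimit for rate limiting.
  return computeEndOffset(startOffset, limit); // hypothetical helper
}
```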
```java
// TODO(#5318): Check read-incompatible schema changes during stream start
IndexedFile lastFile = lastFileChange.get();
return ScalaUtils.toJavaOptional(
    DeltaSource.buildOffsetFromIndexedFile(
```
DeltaSource.buildOffsetFromIndexedFile seems to always return an offset, although it returns Option[Offset].
Can we update the signature of DeltaSource.buildOffsetFromIndexedFile? I just want to minimize interaction with null if it is not really necessary.
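To illustrate the point with a toy sketch (String stands in for Offset and java.util.Optional for the Scala Option; none of this is Delta code): a wrapper that is never empty only adds an unwrap step at every call site.

```java
import java.util.Optional;

// Toy illustration of the review comment: when a method can never return an empty
// result, returning the value directly removes the unwrap/null handling at call sites.
class OffsetReturnSketch {
  static Optional<String> buildOffsetBefore(long version) {
    return Optional.of("offset-v" + version); // always present in practice
  }

  static String buildOffsetAfter(long version) {
    return "offset-v" + version; // same value, no wrapper
  }

  public static void main(String[] args) {
    String before = buildOffsetBefore(5L).orElse(null); // caller still handles "empty"
    String after = buildOffsetAfter(5L);                 // caller uses the value directly
    System.out.println(before + " / " + after);
  }
}
```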
Done.
🥞 Stacked PR
Use this link to review incremental changes.
Which Delta project/connector is this regarding?
Description
We add an implementation of latestOffset(startOffset, limit) and getDefaultReadLimit() to complete the SupportsAdmissionControl implementation. We also refactored a few DeltaSource.scala methods, making them static so they can be called from SparkMicrobatchStream.java.
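For context, here is a minimal sketch of the two SupportsAdmissionControl methods mentioned above, using Spark's ReadLimit API. This is a general illustration, not this PR's code; the maxFilesPerTrigger field and scanWithLimit helper are hypothetical.

```java
import org.apache.spark.sql.connector.read.streaming.MicroBatchStream;
import org.apache.spark.sql.connector.read.streaming.Offset;
import org.apache.spark.sql.connector.read.streaming.ReadLimit;
import org.apache.spark.sql.connector.read.streaming.SupportsAdmissionControl;

// Illustrative sketch of a SupportsAdmissionControl implementation, not the PR's code.
abstract class AdmissionControlSketch implements MicroBatchStream, SupportsAdmissionControl {
  // Hypothetical option: cap how many files are admitted per micro-batch.
  private final Integer maxFilesPerTrigger;

  AdmissionControlSketch(Integer maxFilesPerTrigger) {
    this.maxFilesPerTrigger = maxFilesPerTrigger;
  }

  @Override
  public ReadLimit getDefaultReadLimit() {
    // With no cap configured, admit everything that is available.
    return maxFilesPerTrigger == null
        ? ReadLimit.allAvailable()
        : ReadLimit.maxFiles(maxFilesPerTrigger);
  }

  @Override
  public Offset latestOffset(Offset startOffset, ReadLimit limit) {
    // startOffset is whatever initialOffset() returned for batch 0 (possibly null).
    if (startOffset == null) {
      return null; // no data available to read yet
    }
    // Walk the Delta log forward from startOffset, stopping once `limit` is exhausted.
    return scanWithLimit(startOffset, limit); // hypothetical helper
  }

  // Hypothetical helper that applies the ReadLimit while scanning the log.
  protected abstract Offset scanWithLimit(Offset startOffset, ReadLimit limit);
}
```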
How was this patch tested?
Parameterized tests verifying parity between DSv1 (DeltaSource) and DSv2 (SparkMicroBatchStream).
Does this PR introduce any user-facing changes?
No