[GLUTEN-9163][VL] Separate compression buffer and disk write buffer configuration by marin-ma · Pull Request #9356 · apache/gluten

marin-ma · 2025-04-17T16:40:51Z

A follow-up to #9278

spark.shuffle.spill.diskWriteBufferSize sets the size of the buffer that holds the sorted rows before a spill. The spiller writes the data in this buffer to the output stream.

spark.io.compression.lz4.blockSize and spark.io.compression.zstd.bufferSize set the compression buffer size in the compressed output stream, depending on which compression codec is configured.

In Spark, the memory allocated for these two buffers is counted as overhead memory, so we use arrow::default_memory_pool to allocate them.

This PR also adds spark.gluten.sql.columnar.shuffle.sort.deserializerBufferSize: the buffer size, in bytes, that the sort-based shuffle reader uses when deserializing raw input into columnar batches.
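To make the separation described above concrete, here is a minimal sketch of resolving the two buffer sizes from distinct keys. It is not Gluten's GlutenShuffleUtils implementation: the class name, the helper methods, the plain Map standing in for SparkConf, and the sample values are all assumptions made for illustration. Only the three configuration key names come from the description.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a plain Map stands in for SparkConf, and every helper here is hypothetical.
final class ShuffleBufferConfigSketch {
    static final String DISK_WRITE_BUFFER_SIZE = "spark.shuffle.spill.diskWriteBufferSize";
    static final String LZ4_BLOCK_SIZE = "spark.io.compression.lz4.blockSize";
    static final String ZSTD_BUFFER_SIZE = "spark.io.compression.zstd.bufferSize";

    // Returns the key that sizes the compression buffer for the given codec name.
    static String compressionBufferKey(String codec) {
        switch (codec.toLowerCase(Locale.ROOT)) {
            case "lz4":
                return LZ4_BLOCK_SIZE;
            case "zstd":
                return ZSTD_BUFFER_SIZE;
            default:
                // Other codecs are outside the scope of this sketch.
                throw new IllegalArgumentException("No compression buffer key for codec: " + codec);
        }
    }

    // Looks up a raw setting, falling back to a caller-supplied value when the key is unset.
    static String lookup(Map<String, String> conf, String key, String fallback) {
        return conf.getOrDefault(key, fallback);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(DISK_WRITE_BUFFER_SIZE, "1048576"); // sample value, chosen for illustration
        conf.put(ZSTD_BUFFER_SIZE, "64k");           // sample value, chosen for illustration

        String codec = "zstd";
        // The disk write buffer and the compression buffer are read from separate keys.
        System.out.println("disk write buffer: " + lookup(conf, DISK_WRITE_BUFFER_SIZE, "unset"));
        System.out.println("compression buffer (" + codec + "): "
            + lookup(conf, compressionBufferKey(codec), "unset"));
    }
}
```

The point of the sketch is only that the spill buffer and the compression buffer no longer share one setting: changing the codec changes which compression key is consulted, while the disk write buffer key stays the same.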

github-actions · 2025-04-17T16:41:11Z

#9163

github-actions · 2025-04-17T16:41:26Z

Run Gluten Clickhouse CI on x86

marin-ma · 2025-04-18T13:14:31Z

@zhouyuan Could you help to review? Thanks!

marin-ma · 2025-04-23T17:52:39Z

@zhouyuan Could you help to review? Thanks!

zhouyuan · 2025-04-23T19:04:38Z

-          GlutenShuffleUtils.getSortEvictBufferSize(sparkConf, compressionCodec);
+          GlutenShuffleUtils.getCompressionBufferSize(sparkConf, compressionCodec);
+      diskWriteBufferSize =
+          (int) (long) sparkConf.get(package$.MODULE$.SHUFFLE_DISK_WRITE_BUFFER_SIZE());


The code is a little difficult to understand. Is it necessary to cast to long and then cast to int?

Spark's own source code converts configuration values this way. Here's an explanation: apache/spark#24187 (comment)

If we don't convert to long first, it throws an exception like this:
Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
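The behavior discussed in this thread can be reproduced without Spark. The sketch below assumes, consistent with the exception above, that the value returned to Java is statically typed as Object but is a boxed java.lang.Long at runtime. The class and method names are hypothetical, and readEntry only simulates the sparkConf.get call.

```java
// Hypothetical stand-in for reading a Long-valued Spark config entry from Java.
final class DoubleCastDemo {
    // Mimics the config lookup as seen from Java: static type Object, runtime type java.lang.Long.
    static Object readEntry(long bytes) {
        return Long.valueOf(bytes);
    }

    // Single cast: Java checks the object against Integer before unboxing,
    // so a Long value throws ClassCastException.
    static int singleCast(Object value) {
        return (int) value;
    }

    // Double cast: unbox the Long to a primitive long first, then narrow it to int.
    static int doubleCast(Object value) {
        return (int) (long) value;
    }

    public static void main(String[] args) {
        Object size = readEntry(1024L * 1024L);
        System.out.println("double cast: " + doubleCast(size));
        try {
            singleCast(size);
        } catch (ClassCastException e) {
            System.out.println("single cast failed: " + e.getMessage());
        }
    }
}
```

Note that the narrowing step silently truncates values larger than Integer.MAX_VALUE. Where that matters, Math.toIntExact((long) value) is a standard-library alternative that throws ArithmeticException instead.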

zhouyuan

+1

github-actions Bot added the CORE, VELOX, RSS, and CLICKHOUSE labels Apr 17, 2025

separate compression buffer and disk write buffer

b514a4d

marin-ma force-pushed the shuffle-compression-config branch from 22ad109 to b514a4d Compare April 17, 2025 16:41

fix

35cea58

configurable deserializer buffer size

b46dd4c

github-actions Bot added the DOCS label Apr 22, 2025

marin-ma requested a review from zhouyuan April 23, 2025 17:52

zhouyuan approved these changes Apr 23, 2025

marin-ma merged commit d077f93 into apache:main Apr 23, 2025

marin-ma mentioned this pull request Apr 23, 2025

[GLUTEN-9163][VL] Use stream de/compressor in sort-based shuffle #9278

Merged

marin-ma added a commit to marin-ma/gluten that referenced this pull request Jul 16, 2025

[GLUTEN-9163][VL] Separate compression buffer and disk write buffer c…

c325b25

…onfiguration (apache#9356)

philo-he mentioned this pull request Dec 3, 2025

[VL] Allow the use of Gzip codec for shuffle compression #11242

Merged

warrenzhu25 pushed a commit to warrenzhu25/gluten that referenced this pull request Jan 10, 2026

[GLUTEN-9163][VL] Separate compression buffer and disk write buffer c…

f91a423

…onfiguration (apache#9356)

marin-ma merged 3 commits into apache:main from marin-ma:shuffle-compression-config
