[GLUTEN-8453][VL] Follow-up to #8454 to add a ensureVeloxBatch API for limited use cases #8463

Merged
zhztheplayer merged 4 commits into apache:main from zhztheplayer:wip-8454
Jan 9, 2025

@zhztheplayer

Member

This is a follow-up change for #8454.

Usually, a Gluten query plan knows exactly which batch type it processes, with the help of Gluten's transition planner. Table cache write is an exception: vanilla Spark's cache generation code simply calls the API CachedBatchSerializer#convertColumnarBatchToCachedBatch for a child plan with supportsColumnar=true. Because we don't know which batch type the child plan outputs, we have to do to-Velox batch conversions dynamically inside the implementation of CachedBatchSerializer#convertColumnarBatchToCachedBatch.

The patch adds an ensureVeloxBatch API for dynamic to-Velox batch conversion. The API should only be used in table cache write or similar scenarios where explicit transitions cannot be added.
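
To illustrate the idea only, here is a minimal, self-contained sketch of a dynamic "ensure" conversion. It is not the code in this patch: `BatchKind`, `Batch`, and `toVeloxBatch` are hypothetical stand-ins invented for the example, and the real Gluten batch types and conversion logic are richer. The sketch assumes the caller cannot know the input batch type ahead of time, so the check happens at runtime.

```java
import java.util.List;

public class EnsureVeloxBatchSketch {
    // Hypothetical batch kinds, used only for this illustration.
    enum BatchKind { VANILLA, ARROW, VELOX }

    // Hypothetical stand-in for a columnar batch: a kind tag plus a payload.
    static final class Batch {
        private final BatchKind kind;
        private final List<Integer> rows;

        Batch(BatchKind kind, List<Integer> rows) {
            this.kind = kind;
            this.rows = rows;
        }

        BatchKind kind() { return kind; }
        List<Integer> rows() { return rows; }
    }

    // Hypothetical conversion: copies the payload into a Velox-tagged batch.
    static Batch toVeloxBatch(Batch input) {
        return new Batch(BatchKind.VELOX, List.copyOf(input.rows()));
    }

    // Returns the input unchanged if it is already a Velox batch,
    // otherwise converts it. The batch type is inspected at runtime
    // because the caller has no planner-provided knowledge of it.
    static Batch ensureVeloxBatch(Batch input) {
        if (input.kind() == BatchKind.VELOX) {
            return input;
        }
        return toVeloxBatch(input);
    }

    public static void main(String[] args) {
        // Simulates a child plan whose output batch type is unknown to the caller.
        Batch fromChild = new Batch(BatchKind.VANILLA, List.of(1, 2, 3));
        Batch velox = ensureVeloxBatch(fromChild);
        System.out.println(velox.kind() + " " + velox.rows());
    }
}
```

In a regular query plan the transition planner would insert the conversion explicitly, so a runtime check like this is only appropriate at call sites such as the cache serializer.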

The patch also adds a test case for the original issue #8453.

@github-actions github-actions Bot added the VELOX label Jan 8, 2025
@zhztheplayer zhztheplayer changed the title [VL] Follow-up to #8454 to add a ensureVeloxBatch API for limited use cases Jan 8, 2025
@github-actions

github-actions Bot commented Jan 8, 2025

@apache apache deleted a comment from github-actions Bot Jan 8, 2025
@zhztheplayer

Member Author
@zhztheplayer zhztheplayer merged commit 73ee147 into apache:main Jan 9, 2025

2 participants