Skip to content

Conversation

@deniskuzZ
Copy link
Member

@deniskuzZ deniskuzZ commented Dec 15, 2025

What changes were proposed in this pull request?

  • Use PartitionStatsHandler from iceberg and drop the patched class;
  • Upgrade roaringbit to 1.3.0;
  • Upgrade iceberg version to 1.10.0 in itests;

Why are the changes needed?

Get rid of code duplication

Does this PR introduce any user-facing change?

No

How was this patch tested?

Jenkins

@Aggarwal-Raghav
Copy link
Contributor

Aggarwal-Raghav commented Dec 16, 2025

@deniskuzZ , can we upgrade roaringbit to 1.3.0 as well?
Iceberg 1.10.0 has moved to 1.3.0. Hive is on 1.2.1, Tez is on 1.2.1 .
As we are shipping roaringbit as part of hive-exec jar and in container classpath hive-exec takes precedence, we can upgrade the roaringbit in hive independently.

I was planning to upgrade it but now we have addendum, i believe we can target it now.

@Aggarwal-Raghav
Copy link
Contributor

Also, a dilemma regarding zstd-jni. With parquet-1.16.0 iceberg has runtime optional dependency on 1.5.7... but we are shipping 1.5.6.. coming from orc (transitively). I don't know what's the way forward here But I think we are ok unless there is a breaking change b/w the versions.

Screenshot 2025-12-16 at 10 56 12 AM
@Aggarwal-Raghav
Copy link
Contributor

@deniskuzZ , i was checking #6238 and observed that there is iceberg-1.9.1 getting used in iceberg tests. Meaning tests are running with older version of iceberg for that particular module. It's better to handle in this PR itersef than #6238 to get early feedback on UT.

<iceberg.version>1.9.1</iceberg.version>

@deniskuzZ
Copy link
Member Author

deniskuzZ commented Dec 17, 2025

@deniskuzZ , i was checking #6238 and observed that there is iceberg-1.9.1 getting used in iceberg tests. Meaning tests are running with older version of iceberg for that particular module. It's better to handle in this PR itersef than #6238 to get early feedback on UT.

thanks! I missed that. fixed

// NOTE: we intentionally do not call commitTransaction(), so this property change is never published.
Transaction tx = table.newTransaction();
tx.updateProperties()
.remove(TableProperties.DEFAULT_FILE_FORMAT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In older iceberg/iceberg-handler/src/main/java/org/apache/iceberg/data/PartitionStatsHandler.java When file format was ORC, we were using AVRO based stats and now we will start using Parquet (DEFAULT_FILE_FORMAT_DEFAULT).

Question:
For iceberg table created with 1.9.1 having some pre-existing data, we have stats based on AVRO and after upgrading to iceberg 1.10.0, the new data will have the new incremental stats file based on AVRO + Parquet. Is my understanding correct?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, users would have to drop the old stats. https://issues.apache.org/jira/browse/HIVE-28170 might have helped

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post rewrite_data_files / compaction, i think it should be ok.

Copy link
Member

@ayushtkn ayushtkn Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@deniskuzZ why don't we rather stick to AVRO itself & maintain the behaviour like it is currently, we can put DEFAULT_FILE_FORMAT as AVRO here & it should behave same as it is now? is it for some perf reasons?

@Aggarwal-Raghav
Copy link
Contributor

LGTM +1

Copy link
Contributor

@difin difin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1

@deniskuzZ deniskuzZ merged commit 1271e84 into apache:master Dec 19, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

5 participants