
Conversation

@jerryshao
Contributor

What changes were proposed in this pull request?

This PR adds support for downloading job artifacts from Hadoop-compatible filesystems.
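
As a rough illustration of the mechanism (not the implementation in this PR; the class name, URI, and local path below are made up), downloading an artifact through Hadoop's `FileSystem` abstraction looks roughly like this:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ArtifactDownloadSketch {
  public static void main(String[] args) throws Exception {
    // Any Hadoop-compatible scheme works here (hdfs://, s3a://, gs://, ...),
    // as long as the matching client jars and configuration are on the classpath.
    URI artifactUri = URI.create("hdfs://namenode:8020/jobs/artifacts/my-job.jar");

    Configuration conf = new Configuration(); // picks up *-site.xml files from the classpath
    try (FileSystem fs = FileSystem.get(artifactUri, conf)) {
      // Copy the remote artifact to a local staging path.
      fs.copyToLocalFile(new Path(artifactUri), new Path("/tmp/staging/my-job.jar"));
    }
  }
}
```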

Why are the changes needed?

Fix: #8725

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Local tests.

@jerryshao requested review from mchades and yuqi1129 on October 14, 2025 07:55
@jerryshao self-assigned this on Oct 14, 2025
@jerryshao added the branch-1.0 (Automatically cherry-pick commit to branch-1.0) label on Oct 14, 2025
@jerryshao marked this pull request as draft on October 14, 2025 08:53
 // We need to add more later on when we have more catalog implementations.
-    return barrierClasses.stream().anyMatch(name::startsWith);
+    return barrierClasses.stream().anyMatch(name::startsWith)
+        || name.startsWith("org.apache.hadoop");
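
For context on where this check sits: Gravitino loads catalog code in an isolated classloader and shares only "barrier" classes with the parent. The sketch below is illustrative only (not the project's actual IsolatedClassLoader; the class name and the barrier prefix list are invented) and shows how a predicate like the one above is typically consumed by a child-first classloader:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

// Illustrative child-first classloader: "barrier" classes (now including
// org.apache.hadoop.*) are delegated to the parent so that isolated catalog
// code and the server share the same Hadoop/Gravitino types.
class IsolatedClassLoaderSketch extends URLClassLoader {
  private final List<String> barrierClasses = List.of("org.apache.gravitino.");

  IsolatedClassLoaderSketch(URL[] urls, ClassLoader parent) {
    super(urls, parent);
  }

  // Mirrors the changed predicate in this PR.
  private boolean isBarrierClass(String name) {
    return barrierClasses.stream().anyMatch(name::startsWith)
        || name.startsWith("org.apache.hadoop");
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    if (isBarrierClass(name)) {
      // Parent-first for shared types.
      return super.loadClass(name, resolve);
    }
    synchronized (getClassLoadingLock(name)) {
      Class<?> c = findLoadedClass(name);
      if (c == null) {
        try {
          // Child-first from the isolated catalog jars.
          c = findClass(name);
        } catch (ClassNotFoundException e) {
          // Fall back to the normal parent-delegating lookup.
          c = super.loadClass(name, resolve);
        }
      }
      if (resolve) {
        resolveClass(c);
      }
      return c;
    }
  }
}
```

Delegating `org.apache.hadoop.*` to the parent presumably avoids ending up with two copies of the Hadoop classes once the server itself starts using Hadoop filesystems for artifact downloads.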
Contributor

@yuqi1129 Oct 15, 2025

Why not just put "org.apache.hadoop" in `barrierClasses`? Isn't `barrierClasses` designed to handle this case?

Contributor Author

I'm just testing the code; I will polish it when everything is ready. There are still some other issues according to the CI, so I need to investigate more.

Contributor Author

The previous CI run passed, but it introduced other problems; I need to check it.

put the `hadoop-aws` and its dependencies in the `{GRAVITINO_HOME}/libs` directory. By
default, the HDFS client libraries are already included in the Gravitino server package.
2. The Gravitino server has the corresponding Hadoop configurations in its classpath
`{GRAVITINO_HOME}/conf` directory. For example, if you want to use `s3a://` scheme, you need to
Contributor

> The Gravitino server has the corresponding Hadoop configurations in its classpath
> {GRAVITINO_HOME}/conf directory

I don't understand this sentence: there are no *.xml files in the {GRAVITINO_HOME}/conf folder, so why do you say that?

Should the word "has" in "The Gravitino server has the corresponding" be "needs"?
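
For illustration only, on the `{GRAVITINO_HOME}/conf` point discussed above: Hadoop's client resolves the `s3a://` scheme using the standard `hadoop-aws` properties shown below. Whether Gravitino expects them in a `core-site.xml` on its classpath or in its own configuration is exactly the open question in this thread; the endpoint and credentials are placeholders:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class S3aConfigSketch {
  public static void main(String[] args) throws Exception {
    // new Configuration() loads core-site.xml / hdfs-site.xml found on the
    // classpath, e.g. from a conf directory added to the server's class path.
    Configuration conf = new Configuration();

    // The same properties could instead live in core-site.xml:
    conf.set("fs.s3a.endpoint", "s3.amazonaws.com");  // placeholder endpoint
    conf.set("fs.s3a.access.key", "<access-key>");    // placeholder credential
    conf.set("fs.s3a.secret.key", "<secret-key>");    // placeholder credential

    // Resolving an s3a:// URI only works if hadoop-aws and the AWS SDK jars
    // are also on the classpath alongside these settings.
    try (FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf)) {
      System.out.println(fs.getUri());
    }
  }
}
```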
