Closed
Labels: sig/api-machinery, sig/apps, sig/node, sig/scheduling, stage/stable, wg/batch
Enhancement Description
- One-line enhancement description (can be used as a release note): An API to influence retries based on exit codes and/or pod deletion reasons.
- Kubernetes Enhancement Proposal: https://git.k8s.io/enhancements/keps/sig-apps/3329-retriable-and-non-retriable-failures
- Discussion Link: RFE: ability to define special exit code to terminate existing job kubernetes#17244
- Primary contact (assignee): @alculquicondor
- Responsible SIGs: apps, api-machinery, scheduling
- Enhancement target (which target equals which milestone):
  - Alpha release target (x.y): 1.25
  - Beta release target (x.y): 1.26
  - Stable release target (x.y): 1.31
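The API is configured per Job through `spec.podFailurePolicy`: an ordered list of rules, each pairing an action (`FailJob`, `Ignore`, `Count`) with either an `onExitCodes` requirement (`containerName`, `operator` of `In`/`NotIn`, `values`) or a list of `onPodConditions` patterns such as `DisruptionTarget`. The Python sketch below is a simplified, hypothetical model of how a failed Pod is matched against those rules, assuming the semantics in the Job API reference: the first matching rule wins, containers that exited with code 0 are excluded from `onExitCodes` checks, and a failure that matches no rule is counted towards `backoffLimit`. All class and function names are illustrative; this is not the Job controller's Go implementation, and API validation and the `FailIndex` action are left out.

```python
# Hypothetical, simplified model of Job podFailurePolicy rule matching.
# Names are illustrative; this is not the Kubernetes Job controller code.
from dataclasses import dataclass, field
from typing import List, Optional, Set

FAIL_JOB, IGNORE, COUNT = "FailJob", "Ignore", "Count"


@dataclass
class ContainerStatus:
    name: str
    exit_code: int


@dataclass
class PodCondition:
    type: str
    status: str


@dataclass
class FailedPod:
    containers: List[ContainerStatus]
    conditions: List[PodCondition] = field(default_factory=list)


@dataclass
class OnExitCodes:
    operator: str                         # "In" or "NotIn"
    values: Set[int]
    container_name: Optional[str] = None  # None means "any container"


@dataclass
class Rule:
    action: str
    on_exit_codes: Optional[OnExitCodes] = None
    on_pod_conditions: List[PodCondition] = field(default_factory=list)


def exit_codes_match(req: OnExitCodes, pod: FailedPod) -> bool:
    """True if at least one relevant non-zero exit code satisfies the operator."""
    for c in pod.containers:
        if req.container_name is not None and c.name != req.container_name:
            continue
        if c.exit_code == 0:  # successful containers are excluded from the check
            continue
        in_values = c.exit_code in req.values
        if (req.operator == "In") == in_values:
            return True
    return False


def conditions_match(patterns: List[PodCondition], pod: FailedPod) -> bool:
    """True if any pattern matches one of the Pod's conditions by type and status."""
    return any(p.type == c.type and p.status == c.status
               for p in patterns for c in pod.conditions)


def evaluate(rules: List[Rule], pod: FailedPod) -> str:
    """Return the action of the first matching rule; unmatched failures are counted."""
    for rule in rules:
        if rule.on_exit_codes is not None:
            matched = exit_codes_match(rule.on_exit_codes, pod)
        else:
            matched = conditions_match(rule.on_pod_conditions, pod)
        if matched:
            return rule.action
    return COUNT  # default: the failure counts towards backoffLimit


# Example policy: fail the whole Job on exit code 42 from the "main" container,
# and ignore failures caused by disruptions (preemption, eviction, and so on).
rules = [
    Rule(FAIL_JOB, on_exit_codes=OnExitCodes("In", {42}, container_name="main")),
    Rule(IGNORE, on_pod_conditions=[PodCondition("DisruptionTarget", "True")]),
]

print(evaluate(rules, FailedPod([ContainerStatus("main", 42)])))   # FailJob
print(evaluate(rules, FailedPod([ContainerStatus("main", 137)],
                                [PodCondition("DisruptionTarget", "True")])))  # Ignore
print(evaluate(rules, FailedPod([ContainerStatus("main", 1)])))    # Count
```

Because evaluation stops at the first match, the order of rules matters: a specific exit-code rule should be listed before broader condition-based rules if it is meant to take precedence.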
Alpha
- KEP (k/enhancements) update PR(s):
  - KEP-3329 Add KEP for Retriable and non-retriable Pod failures for Jobs #3374
  - Update KEP-3329 "Retriable and non-retriable Pod failures for Jobs" #3438
  - Additional update for KEP-3329 "Retriable and non-retriable Pod failures for Jobs" #3447
  - Updates to KEP-3329 "Retriable and non-retriable Pod failures for Jobs" #3452
- Code (k/k) update PR(s):
  - Refactor gc_controller to do not use the deletePod stub kubernetes#111070
  - Refactor taint_manager to do not use getPod and getNode stubs kubernetes#111084
  - Add integration test for podgc kubernetes#111091
  - Append new pod conditions when deleting pods to indicate the reason for pod deletion kubernetes#110959
  - Support handling of pod failures with respect to the configured rules kubernetes#111113
  - Add worker to clean up stale DisruptionTarget condition kubernetes#111475
- Docs (k/website) update PR(s): Add docs for KEP-3329 Retriable and non-retriable Pod failures for Jobs website#35219
Beta
- KEP (k/enhancements) update PR(s):
  - Update KEP-3329 "Retriable and non-retriable Pod failures for Jobs" for Beta #3463
  - Update for "Retriable and non-retriable Pod failures for Jobs" #3646
  - Testgrid links to e2e tests for "KEP-3329: Retriable and non-retriable Pod failures for Jobs" #3769
  - Update for second Beta with GA criteria for "KEP-3329: Retriable and non-retriable Pod failures for Jobs" #3757
  - v1.28
  - v1.30
- Code (k/k) update PR(s):
  - Add pod disruption conditions for kubelet-initiated failures kubernetes#112360
  - Extend metrics with the new labels kubernetes#113324
  - Use SSA to add pod failure conditions kubernetes#113304
  - Enable the "Retriable and non-retriable pod failures for jobs" feature into beta kubernetes#113360
  - Add e2e test for job pod failure policy used to match pod disruption kubernetes#113812
  - Fix disruption controller permissions to allow patching pod's status kubernetes#113580
  - Fix match onExitCodes when Pod is not terminated kubernetes#113856
  - Wait for Pods to finish before considering Failed in Job kubernetes#113860
  - Add e2e test to ignore failures with 137 exit code kubernetes#113927
  - Fix clearing of rate-limiter for the queue of checks for cleaning stale pod disruption conditions kubernetes#114770
  - Adjust DisruptionTarget condition message to do not include preemptor pod metadata kubernetes#114914
  - PodGC should not add DisruptionTarget condition for pods which are in terminal phase kubernetes#115056
  - Give terminal phase correctly to all pods that will not be restarted kubernetes#115331
  - API-initiated eviction: handle deleteOptions correctly kubernetes#116554
  - Add DisruptionTarget condition when preempting for critical pod kubernetes#117586
  - Job: create replacement pods only after terminated kubernetes#117015
  - Use Patch instead of SSA for Pod Disruption condition kubernetes#121103
- Docs (k/website) update(s):
  - Promote "Retriable and non-retriable pod failures for Jobs" to Beta website#37242
  - Document for "Wait for Pods to finish before considering Failed in Job" website#38040
  - Extend documentation on PodGC focusing on PodDisruptionConditions enabled website#38042
  - Update docs for KEP3329: "Retriable and non-retriable Pod failures for jobs website#39809
  - Add information about PodReplacementPolicy in Job API website#41745
Stable
- KEP (k/enhancements) update PR(s): Graduate Job Pod Failure Policy to stable #4661
- Code (k/k) update PR(s):
  - scheduler: Test that the DisruptionTarget condition is added at preemption time kubernetes#125533
  - Graduate JobPodFailurePolicy to stable kubernetes#125442
  - Graduate PodDisruptionConditions to stable kubernetes#125461
  - Promote JobPodFailurePolicy and PodDisruptionConditions e2e tests to Conformance kubernetes#125482
  - Use omitempty for optional fields in Job Pod Failure Policy kubernetes#126046
  - Fix a scheduler preemption issue where the victim isn't properly patched, leading to preemption not functioning as expected kubernetes#126644
  - clean up codes after PodDisruptionConditions was promoted to GA kubernetes#125994
  - cleanup after JobPodFailurePolicy is promoted to GA kubernetes#126102
- Docs (k/website) update(s):