Self-hosted Actions Runners #178369
Hey 👋, this kind of delay usually points to how the controller receives and processes scale events rather than to node capacity itself. Below are a few things to check that normally reveal where the bottleneck is.

🔍 1. Inspect the controller logs

Run:

```bash
kubectl logs -n actions-runner-system deployment/arc-runner-controller
```

Look for scale-up and webhook-related entries. If these appear long before a pod gets created, the controller might be missing or delaying webhook events.

⚙️ 2. Validate your RunnerDeployment / HRA configuration

Make sure the HorizontalRunnerAutoscaler targets the intended RunnerDeployment, its minReplicas/maxReplicas leave room to scale up, and the runner labels match what your workflows request in runs-on. A sketch of what that can look like is below.
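If you're running the summerwind-based ARC (which the arc-runner-controller deployment name suggests), scale-up is driven by a HorizontalRunnerAutoscaler pointing at your RunnerDeployment. Here's a minimal sketch of a webhook-driven HRA; the names, namespace, limits, and duration are placeholders, not your actual configuration:

```yaml
# Hypothetical example: adjust names, namespace, and limits to your setup.
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runners-autoscaler
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    kind: RunnerDeployment
    name: example-runners        # must match your RunnerDeployment exactly
  minReplicas: 1                 # a small idle pool avoids cold-start waits
  maxReplicas: 10
  scaleUpTriggers:
    - githubEvent:
        workflowJob: {}          # scale up on workflow_job webhook events
      duration: "30m"            # keep the extra capacity around this long
```

A scaleTargetRef that doesn't match the RunnerDeployment name, or a maxReplicas ceiling that's already been reached, can look exactly like a single job stuck waiting for a runner while everything else succeeds.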
🧩 3. Check pending pods directly

```bash
kubectl get pods -A | grep runner
kubectl describe pod <pod-name>
```

If pods are stuck in Pending, the Events section of kubectl describe usually tells you why (scheduling constraints, node readiness, image pulls); the snippet below pulls the relevant events in one pass.
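If you'd rather not describe pods one at a time, recent cluster events filtered for runner pods give the same picture. A small sketch, assuming your runner pod names contain "runner" (adjust the grep pattern if they don't):

```bash
# List recent events across all namespaces, newest last,
# and keep only the lines that mention runner pods.
kubectl get events --all-namespaces --sort-by=.lastTimestamp \
  | grep -i runner \
  | tail -n 40
```

FailedScheduling and image-pull events show up here, and the timestamps make it easy to tell whether the gap is before pod creation (controller side) or after it (scheduler/node side).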
Sometimes cluster autoscaling is fast, but the kube-scheduler still waits for nodes to register as Ready before binding pods.

🔧 4. Turn on debug logging temporarily

Add to the controller deployment:

```yaml
env:
  - name: LOG_LEVEL
    value: debug
```

Re-apply the deployment so the controller restarts with verbose logging.

🚀 5. Mitigations that usually help

- Keep a tiny idle pool of runners (minReplicas of at least 1) so there is always a warm runner for the occasional straggler job; see the sketch after this list.
- Leave debug logging on until the delay reproduces, so the next occurrence is captured in the controller logs.
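If the HRA already exists and you just want to try the idle-pool mitigation without editing manifests, a one-off patch is enough. This is a sketch; the resource name and namespace are the placeholders from the earlier example, not real objects in your cluster:

```bash
# Raise the autoscaler's floor to 1 so one warm runner is always available.
# Resource name and namespace are hypothetical; substitute your own.
kubectl patch horizontalrunnerautoscaler example-runners-autoscaler \
  -n actions-runner-system \
  --type merge \
  -p '{"spec":{"minReplicas":1}}'
```

You can revert it once you've found the root cause if you don't want an always-on runner.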
In most environments, enabling debug logs and maintaining a tiny idle pool fixes the 30-minute delay completely.
🕒 Discussion Activity Reminder 🕒

This Discussion has been labeled as dormant by an automated system for having no activity in the last 60 days. Please consider one of the following actions:

1️⃣ Close as Out of Date: If the topic is no longer relevant, close the Discussion as out of date.

2️⃣ Provide More Information: Share additional details or context, or let the community know if you've found a solution on your own.

3️⃣ Mark a Reply as Answer: If your question has been answered by a reply, mark the most helpful reply as the solution.

Note: This dormant notification will only apply to Discussions with the dormant label.

Thank you for helping bring this Discussion to a resolution! 💬
Why are you starting this discussion?
Question
What GitHub Actions topic or product is this about?
ARC (Actions Runner Controller)
Discussion Details
We self-host arc-runner-controller in our Kubernetes cluster, and there's something I can't reproduce but see once in a while: a PR triggers 5 workflows, 4 succeed, and the last one sits waiting for a runner.
Normally that would be fine, but it takes 30+ minutes for the job to be picked up, and we have no problem scaling nodes. Is there any way to get more insight into why this happens and where the holdup is?