Commit e6a773b

Document how Prow jobs are used in etcd
- Introduction to Prow
- How Prow is used for etcd testing
- Navigating performance dashboards (Grafana)
- Prow job categories
- Interpreting metrics

Signed-off-by: ronaldngounou <ronald.ngounou@yahoo.com>
1 parent 8a4955b commit e6a773b

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
# Prow Jobs in etcd

## 1. Introduction to Prow

[Prow](https://docs.prow.k8s.io/docs/) is a Kubernetes-based CI/CD system. Jobs can be triggered by various types of events and report their status to many different services. Prow provides GitHub automation through policy enforcement and chat-ops via `/command` interactions on pull requests (e.g., `/test`, `/approve`, `/retest`), enabling contributors to trigger jobs and manage workflows directly from GitHub comments.

When a user comments `/ok-to-test` or `/retest` on a pull request, GitHub sends a webhook to Prow's Kubernetes cluster. See [Life of a Prow Job](https://docs.prow.k8s.io/docs/life-of-a-prow-job/) to further understand the lifecycle of a Prow job.

The status of all etcd Prow jobs is available on the [Prow dashboard](https://prow.k8s.io/?repo=etcd-io%2Fetcd).

## 2. How Prow is used for etcd Testing

etcd's CI is managed by [kubernetes/test-infra](https://github.com/kubernetes/test-infra), which runs Prow.

When a pull request is submitted or a `/command` is issued, Prow runs the corresponding tests. You can view all supported Prow [commands](https://prow.k8s.io/command-help).

### Job Types

The job [configuration](https://github.com/kubernetes/test-infra/tree/master/config/jobs/etcd) for etcd lives in the kubernetes/test-infra repository.

There are 3 different job types:

- Presubmits run against code in PRs
- Postsubmits run after merging code
- Periodics run on a periodic basis

Please see the [ProwJob](https://docs.prow.k8s.io/docs/jobs/) docs for more info.

As an example, [etcd-presubmits.yaml](https://github.com/kubernetes/test-infra/blob/master/config/jobs/etcd/etcd-presubmits.yaml) defines the presubmit jobs of etcd. `pull-etcd-e2e-amd64` is one of the [presubmits](https://github.com/kubernetes/test-infra/blob/b21a1d3a72d5715ea7c9234cade21751847cfbe5/config/jobs/etcd/etcd-presubmits.yaml#L193).

Refer to [the test-infra Job Types documentation](https://github.com/kubernetes/test-infra/tree/master/config/jobs#job-types) to learn more about them.

32+
### How to manually run a given job on Prow
33+
34+
These tests can be triggered when you leave a comment, like `/ok-to-test` (only triggered by an etcd-io member) or `/retest`, in PR [example](https://github.com/etcd-io/etcd/pull/20733#issuecomment-3341443205). You can find all supported [commands](https://prow.k8s.io/command-help).
35+
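To make the chat-ops idea concrete, the sketch below extracts `/command` lines from the text of a PR comment. It is a simplified illustration, not Prow's actual parsing logic: it assumes a command is a line that begins with `/`, followed by a lowercase name and optional arguments. The function name `parse_commands` is made up for this example.

```python
import re

# Assumption for this sketch: a command starts at the beginning of a line,
# has a lowercase name (hyphens allowed), and may be followed by arguments.
COMMAND_RE = re.compile(r"^/([a-z][a-z-]*)[ \t]*(.*)$", re.MULTILINE)


def parse_commands(comment: str) -> list[tuple[str, str]]:
    """Return (name, arguments) pairs for each command line in a comment."""
    return [(name, args.strip()) for name, args in COMMAND_RE.findall(comment)]


if __name__ == "__main__":
    body = "Thanks for the fix!\n/ok-to-test\n/test pull-etcd-e2e-amd64\n"
    for name, args in parse_commands(body):
        print(name, args)
```

Lines that do not start with `/`, such as the greeting in the sample comment, are ignored.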
## 3. Navigating Performance Dashboard (Grafana)

Test-infra's Prow exposes Grafana dashboards that provide visibility into build resource usage (CPU, memory, number of running builds, etc.) for the Prow build cluster’s Kubernetes jobs. Each dashboard can be scoped by organization, repository, build identifier and time range.

- [GKE Dashboards](https://monitoring-gke.prow.k8s.io/d/96Q8oOOZk/builds?orgId=1&refresh=30s&var-org=etcd-io&var-repo=etcd&var-build=All&from=now-7d&to=now)
- [EKS Dashboards](https://monitoring-eks.prow.k8s.io/d/96Q8oOOZk/builds?orgId=1&refresh=30s&var-org=etcd-io&var-repo=etcd&var-build=All&from=now-7d&to=now)

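The dashboard links encode their scope in query parameters (`var-org`, `var-repo`, `var-build`, `from`, `to`). As a minimal sketch, the helper below rebuilds the GKE link for a different scope. The function name and its defaults are assumptions made for illustration; the base URL and parameter names are copied from the GKE link in this section.

```python
from urllib.parse import urlencode

# Base URL of the GKE "builds" dashboard, copied from the GKE link in this section.
GKE_BUILDS_DASHBOARD = "https://monitoring-gke.prow.k8s.io/d/96Q8oOOZk/builds"


def builds_dashboard_url(org, repo, build="All", time_from="now-7d",
                         time_to="now", base=GKE_BUILDS_DASHBOARD):
    """Build a dashboard URL scoped to one org/repo/build and time range."""
    params = {
        "orgId": 1,
        "refresh": "30s",
        "var-org": org,
        "var-repo": repo,
        "var-build": build,
        "from": time_from,
        "to": time_to,
    }
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(builds_dashboard_url("etcd-io", "etcd"))
```

Passing a specific build identifier as `build`, or a shorter range such as `time_from="now-24h"`, narrows the view in the same way as the filters in the Grafana UI.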
### Panel: “Running / Pending Builds”

Shows the number of builds in the Running and Pending states over time.
Use it to track build backlog or concurrency. For example, if the “Pending” line rises, builds may be waiting for resources.
The level of the “Running” line, whether it fluctuates or stays steady, indicates how many builds typically run in parallel.

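The "rising Pending line" signal can be expressed as a simple check over sampled values. The sketch below assumes you have already exported the pending-build counts as a list of integers, oldest first; the function name and the default of three consecutive increases are arbitrary choices for illustration.

```python
def pending_is_rising(pending_counts: list[int], steps: int = 3) -> bool:
    """Return True if the last `steps` transitions were all increases."""
    if len(pending_counts) < steps + 1:
        return False  # not enough samples to judge a trend
    recent = pending_counts[-(steps + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))
```

A sequence such as `[0, 0, 1, 3, 6]` is reported as rising, while `[2, 5, 4, 6, 7]` is not, because one of the last three transitions is a decrease.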
### Panel: “Memory Usage per Build”

Shows memory usage over time for each build ID (each build is listed in the legend at the bottom).
The y-axis shows memory use (e.g., in MiB or GiB).
Use this to spot builds with unusually high memory usage: a spike indicates that one build consumed far more memory than the others.

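One way to turn "unusually high" into a rule is to compare each build's peak memory against the median across builds. The sketch below assumes peak values (in MiB) have been read off the panel into a dictionary keyed by build ID. The build IDs, the function name, and the factor of 4 are all hypothetical.

```python
from statistics import median


def unusual_builds(peak_mib_by_build: dict[str, float], factor: float = 4.0) -> list[str]:
    """Return build IDs whose peak memory exceeds `factor` times the median."""
    if not peak_mib_by_build:
        return []
    typical = median(peak_mib_by_build.values())
    return sorted(
        build for build, peak in peak_mib_by_build.items()
        if peak > factor * typical
    )


if __name__ == "__main__":
    # Hypothetical peak values in MiB, keyed by made-up build IDs.
    peaks = {"build-a": 1024, "build-b": 1100, "build-c": 980, "build-d": 8192}
    print(unusual_builds(peaks))
```

The median is used instead of the mean so that a single very large build does not raise the baseline it is being compared against.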
### Panel: “CPU Usage per Build”

Similar to the memory panel, but shows CPU usage per build over time. Spikes in CPU usage may indicate heavy compute jobs, inefficiencies, or a need for resource tuning.

### Panel: “Resources”

- Memory panel
  - Green line (“used”): how much memory this build’s pod was using at each point in time.
  - Orange/yellow line (“requested”): how much memory was requested for that pod (i.e., Kubernetes `requests.memory`).
  - Red line (“limit”): the memory limit for that pod (i.e., Kubernetes `limits.memory`).
  - Y-axis: memory (MiB or GiB) over the build runtime.
  - X-axis: time of day/date.

  If the green “used” line approaches or hits the red “limit”, the build came close to its memory cap and risked being OOM-killed. If “used” is much lower than “requested”, memory is over-allocated and the job’s request could be tuned downward.

- CPU panel
  - Same structure as the memory panel: green = actual usage, orange/yellow = requested CPU, red = CPU limit (if set).
  - Y-axis: usually the number of CPU cores or a fraction thereof (e.g., 1.0 = one core).

  Spikes in the green line show bursts of CPU usage (e.g., build or compile phases), while idle periods show low usage.
  If CPU usage consistently saturates the limit, the job may be throttled or delayed. If usage is consistently far below the request, lowering the request may reduce cost.

The “Resources” panel is useful for a few reasons:

1. Tuning resources: By drilling into each build run, you can determine realistic memory and CPU requests and limits for that job type. This helps avoid both wasted capacity and builds that fail because they hit resource limits.

2. Spotting anomalies: If one build suddenly uses 8 GiB while this job normally uses 1 GiB, it may indicate a regression or misconfiguration.

3. Capacity planning: Seeing typical and peak usage helps cluster operators plan node sizes, scheduling, concurrency of builds, etc.

4. Debugging performance issues: A build with unexpectedly high CPU or memory usage might be stuck, looping, or consuming resources inefficiently.

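The used/requested/limit comparison described for the “Resources” panel can be written down as a small rule of thumb. The sketch below is an illustration only: the function name and the two thresholds (90% of the limit, 50% of the request) are assumptions, not values recommended by etcd or test-infra. All three inputs must use the same unit (for example GiB).

```python
def assess_memory(peak_used: float, requested: float, limit: float,
                  near_limit: float = 0.9, low_use: float = 0.5) -> str:
    """Classify a build from its peak usage, request, and limit (same unit)."""
    if peak_used >= near_limit * limit:
        return "near-limit"       # close to the cap, at risk of OOM
    if peak_used < low_use * requested:
        return "over-requested"   # request could probably be tuned downward
    return "ok"
```

For example, a build that peaked at 3.8 GiB with a 4 GiB limit is classified as `near-limit`, while one that peaked at 0.5 GiB against a 2 GiB request is classified as `over-requested`. The same comparison applies to CPU if cores are used as the unit.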
## 3.1 Prow job categories (robustness, integration, static checks)

- Static check:
  - Description: Fast, deterministic checks (build, unit tests, linters, go vet/staticcheck, formatting, license/header checks, generated-code verification) that catch style, correctness and packaging problems early.
  - When to run: Every PR as presubmits; quick feedback loop before running expensive tests.
  - Example job patterns: pull-etcd-verify, pull-etcd-lint, pull-etcd-unit

- Tests:
  - Robustness:
    - Description: Long-running, fault-injection and chaos-style end-to-end tests that validate etcd correctness and availability under failures (node crashes, network partitions, resource exhaustion, upgrades).
    - When to run: Periodics for continuous coverage; run for PRs that touch consensus, storage, recovery, or upgrade paths.
    - Example job patterns: pull-etcd-robustness, periodic-robustness

  - Integration:
    - Description: Functional end-to-end and cross-component tests that exercise real client/server interactions, snapshots/restore, upgrades and compatibility across OS/arch.
    - When to run: Presubmits for PRs that change APIs, client behavior, or integration points; periodics for broad platform coverage.
    - Example job patterns: pull-etcd-e2e-amd64, pull-etcd-integration

## 4. Interpreting Metrics

Some Prow components expose Prometheus metrics that can be used for monitoring and alerting. Examples include the number of PRs in each Tide pool and a histogram of the number of PRs in each merge. These and other metrics are listed in the [Prow metrics documentation](https://github.com/kubernetes-sigs/prow/blob/main/site/content/en/docs/metrics/_index.md).
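
Prometheus metrics are exposed in a plain-text format with one sample per line. As a minimal sketch of how such output can be read, the parser below turns that text into a dictionary. It assumes samples carry no trailing timestamp, and the sample text is illustrative: the metric name `pooledprs` and its labels are assumptions for this example, not verified output from a Prow component.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Parse Prometheus text-format samples into {series: value}.

    Assumes each sample line is "<series> <value>" with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Illustrative input only; metric name and labels are assumptions.
SAMPLE = """\
# TYPE pooledprs gauge
pooledprs{org="etcd-io",repo="etcd",branch="main"} 3
"""

if __name__ == "__main__":
    for series, value in parse_samples(SAMPLE).items():
        print(series, value)
```

For real monitoring, a Prometheus server or an official client library would normally do this parsing; the sketch only shows how the exposed text is structured.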
