# Regressions to Competitive Benchmarks

Speed is one of Chrome’s core strengths. One of the ways we measure
Chrome’s speed is with benchmarks, specifically the competitive
benchmarks (Speedometer, MotionMark, JetStream). Improving performance
while ensuring that our current level does not regress is challenging.
It’s all too easy for performance regressions to creep in. Just as with
test failures, the longer a regression remains in the code base, the
harder it is to fix.

To ensure Chrome’s performance on benchmarks does not regress, we have
the following policy: If a regression is detected, we will file a bug
with relevant data (including links to pinpoint runs). If the regression
has not been resolved after one business day, the patch is reverted.

If you know your change impacts performance, you can reach out to the
appropriate group to discuss the issue before landing (see the table
below for contacts). This can help prevent your change from being
reverted. Instructions for using pinpoint are at the end of this document.

This policy applies to the competitive benchmarks: JetStream,
MotionMark, and Speedometer. For Speedometer specifically, the policy applies
to both the current (as of Nov 2023) stable version 2 and the work-in-progress
version 3. We expect Speedometer 3 to be fully released in early 2024. At this
time, this policy applies to bots running macOS on Apple Silicon. Each of
these benchmarks consists of a number of subtests. There are thresholds for
both the overall test and each subtest.

| Benchmark   | Owner                  | Overall Threshold | Subtest Threshold |
|-------------|------------------------|-------------------|-------------------|
| JetStream   | v8-performance-sheriff | 0.3%              | 1%                |
| MotionMark  | chrome-gpu             | 1%                | 2%                |
| Speedometer | v8-performance-sheriff | 0.3%              | 1%                |

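As a rough illustration of how the thresholds in the table might be applied, the sketch below compares a percentage drop against the per-benchmark limits. It is not the tooling that sheriffs actually use. The function names are hypothetical, and it assumes that a higher score is better and that a regression counts only when it strictly exceeds the threshold; the policy text does not state either point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    overall_pct: float  # limit for the overall benchmark score
    subtest_pct: float  # limit for an individual subtest


# Values copied from the threshold table in this document.
THRESHOLDS = {
    "JetStream": Thresholds(overall_pct=0.3, subtest_pct=1.0),
    "MotionMark": Thresholds(overall_pct=1.0, subtest_pct=2.0),
    "Speedometer": Thresholds(overall_pct=0.3, subtest_pct=1.0),
}


def regression_pct(baseline: float, candidate: float) -> float:
    """Percent drop from baseline to candidate (assumes higher is better)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - candidate) / baseline * 100.0


def exceeds_threshold(benchmark: str, pct: float, subtest: bool = False) -> bool:
    """True if a drop of `pct` percent is above the limit for `benchmark`.

    Assumption: the comparison is strict (a drop equal to the limit passes).
    """
    limits = THRESHOLDS[benchmark]
    limit = limits.subtest_pct if subtest else limits.overall_pct
    return pct > limit


if __name__ == "__main__":
    drop = regression_pct(baseline=200.0, candidate=199.0)  # a 0.5% drop
    print(exceeds_threshold("Speedometer", drop))  # over the 0.3% overall limit
    print(exceeds_threshold("MotionMark", drop))   # under the 1% overall limit
```

Under these assumptions, the same 0.5% drop would be flagged for Speedometer but not for MotionMark, which is why the per-benchmark limits matter when reading a pinpoint result.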
Pinpoint will be used to locate and validate the regression. The
number of runs is determined by statistical analysis and may
change from time to time (currently around 128 for Speedometer).

Bugs filed will generally have the following text:

***note
This patch has been identified as causing a statistically significant
regression to the competitive benchmark <NAME HERE>. The pinpoint run
<LINK HERE> gives high confidence this patch is responsible for the
regression. Please treat this as you would a unit test failure and
resolve the issue promptly. If you do not resolve the issue in 24
hours the patch will be reverted. For help, please reach out to the
appropriate group and/or owner.
The recommended course of action is:
1. Revert patch.
2. If unsure why your patch caused a regression, reach out to owners.
3. Update patch.
4. Use pinpoint to verify no regressions.
5. Reland.
Each patch is unique, so while this is the recommended course of action, it won't cover every case.
More information on this policy can be found [here](https://chromium.googlesource.com/chromium/src/+/main/docs/benchmark_performance_regressions.md).
***

### Using pinpoint

To run a pinpoint job, you can use either the command line tool
(`depot_tools/pinpoint`) or the [pinpoint web UI](https://pinpoint-dot-chromeperf.appspot.com/).
I recommend the web UI, as it's better supported. To use the web UI, click the plus button in the
bottom right. For the bot, use mac-m1_mini_2020-perf or mac-m1_mini_2020-perf-pgo. The PGO
builder is closer to what we ship, but it uses a slightly dated PGO profile, which means
the results may not exactly match what you see once the profile is rebuilt with your change. The
following table suggests what to enter for the benchmark and story fields:

| Benchmark   | Benchmark Field             | Story                     |
|-------------|-----------------------------|---------------------------|
| JetStream   | Jetstream2                  | Jetstream2                |
| Speedometer | speedometer2                | Speedometer2              |
|             | speedometer3                | Speedometer3              |
| MotionMark  | rendering.desktop.notracing | motionmark_ramp_composite |

The only other field you should need to fill in is the "Exp patch" field. Put the URL of your
patch there, and click "Start".
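
For convenience, the values to enter can be collected in a small lookup. The sketch below only assembles the strings from the table and text above; it does not contact pinpoint. The function name is hypothetical, the dictionary keys other than "Exp patch" are informal labels rather than confirmed UI field names, and the patch URL in the usage example is a placeholder.

```python
# Bot names as given in the text above.
BOTS = {
    "default": "mac-m1_mini_2020-perf",
    "pgo": "mac-m1_mini_2020-perf-pgo",
}

# (benchmark field, story) pairs copied verbatim from the table above.
FIELDS = {
    "JetStream": ("Jetstream2", "Jetstream2"),
    "Speedometer 2": ("speedometer2", "Speedometer2"),
    "Speedometer 3": ("speedometer3", "Speedometer3"),
    "MotionMark": ("rendering.desktop.notracing", "motionmark_ramp_composite"),
}


def web_ui_fields(benchmark: str, patch_url: str, pgo: bool = False) -> dict[str, str]:
    """Return the values to type into the web UI for one job (hypothetical helper)."""
    benchmark_field, story = FIELDS[benchmark]
    return {
        "Bot": BOTS["pgo" if pgo else "default"],
        "Benchmark": benchmark_field,
        "Story": story,
        "Exp patch": patch_url,
    }


if __name__ == "__main__":
    # Placeholder URL; substitute the URL of your own patch.
    for label, value in web_ui_fields("Speedometer 3", "https://example.com/your-patch").items():
        print(f"{label}: {value}")
```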