I've noticed that our net package's DNS tests are often flaky on the builders.
For example:
https://build.golang.org/log/ce5a87135d1a5ed4f17bd998ace2e0060b9ad597
https://build.golang.org/log/b3e762fc83d463acba21987ff558c8018b33c7cb
https://build.golang.org/log/250fc567590d125f1c8fd27740105eb7288ab16c
--- FAIL: TestLookupDotsWithRemoteSource (5.05s)
lookup_test.go:566: LookupSRV(xmpp-server, tcp, google.com): lookup _xmpp-server._tcp.google.com on 8.8.8.8:53: no such host (mode=go)
--- FAIL: TestLookupDotsWithRemoteSource (5.46s)
lookup_test.go:540: LookupMX(google.com): lookup google.com on 8.8.8.8:53: no such host (mode=cgo)
FAIL
FAIL net 7.838s
--- FAIL: TestLookupGmailNS (5.01s)
lookup_test.go:142: lookup gmail.com. on 8.8.8.8:53: dial udp 8.8.8.8:53: i/o timeout
FAIL
FAIL net 7.336s
etc.
Notice that they all fail after just over 5 seconds, which is our default DNS timeout.
Did a UDP request get lost?
Did a UDP response get lost?
Does NAT make some builders worse?
Should we make the builders retry all DNS tests N times?
But this is also flaky, though to a much lesser degree, on my desktop over wired Ethernet: across 500 runs, I still see occasional failures.
Maybe our net package's DNS code should automatically resend the UDP request once half the timeout has elapsed (i.e., after 2.5 seconds by default)?