
net: retry DNS lookups before failure? #16865

@bradfitz

I've noticed that our net package's DNS tests are often flaky on the builders.

For example:

https://build.golang.org/log/ce5a87135d1a5ed4f17bd998ace2e0060b9ad597
https://build.golang.org/log/b3e762fc83d463acba21987ff558c8018b33c7cb
https://build.golang.org/log/250fc567590d125f1c8fd27740105eb7288ab16c

--- FAIL: TestLookupDotsWithRemoteSource (5.05s)
    lookup_test.go:566: LookupSRV(xmpp-server, tcp, google.com): lookup _xmpp-server._tcp.google.com on 8.8.8.8:53: no such host (mode=go)

--- FAIL: TestLookupDotsWithRemoteSource (5.46s)
    lookup_test.go:540: LookupMX(google.com): lookup google.com on 8.8.8.8:53: no such host (mode=cgo)
FAIL
FAIL    net 7.838s

--- FAIL: TestLookupGmailNS (5.01s)
    lookup_test.go:142: lookup gmail.com. on 8.8.8.8:53: dial udp 8.8.8.8:53: i/o timeout
FAIL
FAIL    net 7.336s

etc.

Notice that they all fail after just over 5 seconds (our default DNS timeout).

Did a UDP request get lost?

Did a UDP response get lost?

Does NAT make some builders worse?

Should we make the builders retry all DNS tests N times?

But this is also flaky, though to a much lesser degree, on my desktop on wired Ethernet. Over 500 runs, I still see occasional failures.

Maybe our net package's DNS code should automatically resend the UDP request after half the timeout (i.e., after 2.5 seconds by default)?

/cc @mdempsky @josharian @minux @ianlancetaylor @mikioh
