Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DNS caching causes host records to be returned in sorted order #7577

Closed
mattbaker opened this issue Aug 20, 2023 · 10 comments · Fixed by #7578
Closed

DNS caching causes host records to be returned in sorted order #7577

mattbaker opened this issue Aug 20, 2023 · 10 comments · Fixed by #7578
Labels
bug Issue is reported as a bug in progress priority:medium team:PS Assigned to OTP team PS
Milestone

Comments

@mattbaker
Copy link

mattbaker commented Aug 20, 2023

Describe the bug
When DNS caching is enabled, and a host has multiple records associated with it, the DNS cache returns those records in sorted order.

@starbelly helped me in this Erlang Forum post and showed how the issue stems from the usage of maps for de-duplication. When a host has <= 32 records it qualifies as a "small map" and the current implementation in Erlang will sort the entries by key. It explains why we were seeing calls to inet:getaddrs/2 returning IPs in sorted order.

To Reproduce
This gist is a fairly tight reproduction of the issue that you can run from the command line if you have docker installed. It demonstrates the issue by calling inet:getaddrs multiple times with and without caching enabled to highlight the difference.

Nothing about this bug is docker specific, but docker just so happens to offer a simple round-robin DNS system when you launch multiple containers with the same alias so it was very convenient for this bug report!

Expected behavior
We would expect the order of the records in the cache to match the order returned by the DNS server. We would not expect the order to be modified.

Affected versions
I believe this was introduced in #4633, which means this would affect OTP 24.0 and later based on references in the OTP 24 README.

Additional context
The current behavior of the cache breaks the round-robin DNS load balancing strategy in which a DNS server randomizes the order of records it returns for a given host lookup in order to distribute load evenly.

Because the cache returns the records in sorted order, an Erlang application with DNS caching enabled will always select the "lowest" record by sort order which sends all the load to a single, consistent IP.

My impression is round-robin DNS, while not without flaws, is a common enough practice that the current behavior of the DNS cache is probably not desirable.

I do wonder if the idea of passing a custom inet_tcp module mentioned in this comment could be a good short-term fix for anyone dealing with a similar concern.

@mattbaker mattbaker added the bug Issue is reported as a bug label Aug 20, 2023
@jimdigriz
Copy link
Contributor

jimdigriz commented Aug 20, 2023

Not sure a Wikipedia reference is great, but RFC1794, section 4 does have something to say about it.

Unfortunately you can still take two perfectly valid views:

  • DNS makes no guarantees about ordering so if you want a random shuffle, it needs to be done client side
  • clients have been observed not to shuffle, so shuffle server side to get the intended effect; this assumes all resolves pass records as is straight through to the client

Personally, I think the client application should be shuffling or alternatively the service reworked to use SRV records which are arguably what should be used. If DNS as a standard says nothing about ordering, I do not think the resolvers should be expected to follow a contract back to the application.

@starbelly
Copy link
Contributor

There's a lot of ways to look at architecture involving such services, but the fact is that they exist. The docker example is the most trivial, it's a a mere shuffle, in this case the client can get away with doing the shuffle themselves. Yet, there are services (aws route53 as an example) that return records in an order defined by policy (load, node health, weights, etc.), for good or for bad. It's subjective and to me devoid of the issue : There's a behavioral difference that was unintentionally (seemingly) introduced and I would file that under bug 😄

@mattbaker
Copy link
Author

It's subjective and to me devoid of the issue : There's a behavioral difference that was unintentionally (seemingly) introduced and I would file that under bug

This was my thinking as well. I provided the notes on round-robin under "additional context" for anyone curious about why we became aware of it, but I don't think it's core to the discussion. Maybe I should have left it out :)

There's a behavioral difference that was unintentionally (seemingly) introduced and I would file that under bug

Additionally I think it could be argued, though it's certainly subjective, that modifying a value in storage is surprising behavior for a caching mechanism.

Curious to hear what others think though!

@attah
Copy link
Contributor

attah commented Aug 20, 2023

Worth noting is that a "large" map will not help you maintain original order - just not produce a sorted order.
And with OTP26 you also no longer get the keys sorted, even for small maps, unless you ask for it.

@starbelly
Copy link
Contributor

Worth noting is that a "large" map will not help you maintain original order - just not produce a sorted order.
And with OTP26 you also no longer get the keys sorted, even for small maps, unless you ask for it.

Definitely worth noting yes. The call out about the small maps I believe was to point out that there was no explicit sorting done, just an artifact of de-duping with maps and the case that the number of entries remains with the domain of a small map.

As for sorted keys, that's still a thing in OTP 26, but might be changing in OTP 27 iirc.

@starbelly
Copy link
Contributor

FWIW, I went back and looked at the code again. I think everything could remain the same, so long as an index entry in the tuple could be added when records are initially inserted. I think that may be the only way in fact barring other complicated ways of solving the problem.

[{dns_rr,"rebar3.s3.amazonaws.com",cname,in,-576460747,3600,
         "s3-1-w.amazonaws.com",-576460747,"rebar3.s3.amazonaws.com",
         false, 0},
 {dns_rr,"s3-w.us-east-1.amazonaws.com",a,in,-576460747,3,
         {52,217,195,65},
         -576460747,"s3-w.us-east-1.amazonaws.com",false, 1},
  {dns_rr,"s3-w.us-east-1.amazonaws.com",a,in,-576460747,3,
         {52,217,87,116},
         -576460747,"s3-w.us-east-1.amazonaws.com",false, 2},

Though this may be problematic as far as users upgrading. That said, an index value existed in each tuple, inet_db could simply do a sort on that value, returning an original ordered record result set.

@IngelaAndin IngelaAndin added the team:PS Assigned to OTP team PS label Aug 21, 2023
@RaimoNiskanen
Copy link
Contributor

I will have a look at the deduplication code; it might be possible to preserve the ETS table order during deduplication, which should be the RR insert order since it is a 'bag' table, which might solve this issue.

@RaimoNiskanen
Copy link
Contributor

Please, if possible, evaluate PR #7578 to see if it solves the problem, and doesn't create new...

@juli-vert
Copy link

Not sure a Wikipedia reference is great, but RFC1794, section 4 does have something to say about it.

Unfortunately you can still take two perfectly valid views:

  • DNS makes no guarantees about ordering so if you want a random shuffle, it needs to be done client side
  • clients have been observed not to shuffle, so shuffle server side to get the intended effect; this assumes all resolves pass records as is straight through to the client

Personally, I think the client application should be shuffling or alternatively the service reworked to use SRV records which are arguably what should be used. If DNS as a standard says nothing about ordering, I do not think the resolvers should be expected to follow a contract back to the application.

Sorry to meddle here, but I disagree. Those RFC are pretty old and the modern DNS design is completely different that it was in the late 90s early 2000s.

Using SRV RR has no sense here, it's about multiple choices and those may be related to geolocation (for instance). Having the client reordering the responses is a kind of idempotent process, but it's not what it's expected in DNS RR. It's worth to say that DNS RR can be done both in server and client side, but client side shouldn't involve reordering, just purely round robin to the given order. I understand the logic behind reordering to guarantee the same response always, but it's something totally obsolete with the current "smart" DNS implemented all around (and yep, most of them not following 100% the standard).

Regards

RaimoNiskanen added a commit that referenced this issue Aug 28, 2023
…nto maint

* raimo/kernel/inet_dns-stable-cache/GH-7577/OTP-18731:
  Rewrite RR cache access time update to preserve table insert order
@RaimoNiskanen RaimoNiskanen added this to the OTP-26.1 milestone Aug 28, 2023
@RaimoNiskanen
Copy link
Contributor

A fix that retains the order of the cached RR records has been merged in PR #7578 to be released in OTP-26.1.

Thank you for reporting this issue!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue is reported as a bug in progress priority:medium team:PS Assigned to OTP team PS
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants