A short status report for the start of June: Cloudflare’s DNS service was unreachable on May 31, 2018, due to a software glitch. Here are the details I know so far.
Last night we had a violent thunderstorm here. If you could no longer reach websites, though, there is another explanation, at least if you use Cloudflare as your DNS server.
Cloudflare’s DNS service 1.1.1.1
Cloudflare offers a DNS service under the IP address 1.1.1.1 (see my blog post Cloudflare launches DNS Service with IP 1.1.1.1). The arguments for the offer included speed and, above all, privacy. Cloudflare promised to delete the data within 24 hours to protect privacy.
(Cloudflare DNS address)
A few days ago I reported on a possible hijacking of Cloudflare’s DNS service in my blog post CloudFlare DNS service 1.1.1.1 hacked from China … ? Now users relying on Cloudflare’s DNS service were unable to connect to internet servers.
Self-inflicted problem with Cloudflare’s DNS service
On June 1, 2018, I read that the Cloudflare DNS service had apparently suffered failures. MS Power User cites problems with IPv4 connections:
Investigating – Cloudflare is currently seeing query timeouts for our IPv4 addresses for our public DNS resolver (1.1.1.1 and 1.0.0.1). IPv6 addresses are currently not affected (2606:4700:4700::1111 and 2606:4700:4700::1001). We are investigating further and will provide updates as they become available.
Most systems are now shown as operational again. For the moment, the old messages can still be found under Past Incidents.
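If you want to check from your own machine whether a resolver answers over IPv4 or IPv6, a small script can send a DNS query and wait for a reply. The following is only a rough sketch: the function names `build_query` and `resolver_answers`, the test hostname `example.com` and the two-second timeout are my own assumptions, and it merely checks that some reply with the matching query ID comes back.

```python
import socket
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet for an A record (RFC 1035 layout)."""
    # Header: ID, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE = 1 (A record), QCLASS = 1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def resolver_answers(server: str, hostname: str = "example.com",
                     timeout: float = 2.0) -> bool:
    """Return True if the resolver at `server` replies to a UDP query in time."""
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    query = build_query(hostname)
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(512)
    except OSError:
        # Covers timeouts and unreachable networks (e.g. no IPv6 connectivity)
        return False
    # A reply carrying our query ID means the resolver responded
    return reply[:2] == query[:2]


if __name__ == "__main__":
    for server in ("1.1.1.1", "1.0.0.1",
                   "2606:4700:4700::1111", "2606:4700:4700::1001"):
        status = "answers" if resolver_answers(server) else "no reply"
        print(f"{server}: {status}")
```

During the outage described here, such a check would presumably have reported no reply for the two IPv4 addresses while the IPv6 addresses kept answering. Note that a missing reply for the IPv6 addresses can also simply mean that your own connection has no IPv6 connectivity.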
A German blog reader pointed out in a comment that Cloudflare had published a blog post about the 17-minute outage.
On May 31, 2018 we had a 17 minute outage on our 1.1.1.1 resolver service; this was our doing and not the result of an attack.
Things went wrong
Today, in an effort to reclaim some technical debt, we deployed new code that introduced Gatebot to Provision API.
What we did not account for, and what Provision API didn’t know about, was that 1.1.1.0/24 and 1.0.0.0/24 are special IP ranges. Frankly speaking, almost every IP range is “special” for one reason or another, since our IP configuration is rather complex. But our recursive DNS resolver ranges are even more special: they are relatively new, and we’re using them in a very unique way. Our hardcoded list of Cloudflare addresses contained a manual exception specifically for these ranges.
As you might be able to guess by now, we didn’t implement this manual exception while we were doing the integration work. Remember, the whole idea of the fix was to remove the hardcoded gotchas!
The effect was that, after pushing the new code release, our systems interpreted the resolver traffic as an attack. The automatic systems deployed DNS mitigations for our DNS resolver IP ranges for 17 minutes, between 17:58 and 18:13 May 31st UTC. This caused 1.1.1.1 DNS resolver to be globally inaccessible.
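To illustrate the kind of manual exception Cloudflare describes, here is a minimal sketch of an exception list for the resolver ranges 1.1.1.0/24 and 1.0.0.0/24. This is not Cloudflare’s code: how Gatebot and the Provision API actually work is not public in this detail, and the names `PROTECTED_RANGES`, `is_protected` and `should_mitigate` are hypothetical.

```python
import ipaddress

# Hypothetical exception list: ranges that automatic mitigation must never touch.
PROTECTED_RANGES = [
    ipaddress.ip_network("1.1.1.0/24"),
    ipaddress.ip_network("1.0.0.0/24"),
]


def is_protected(address: str) -> bool:
    """Return True if the address falls inside one of the protected ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PROTECTED_RANGES)


def should_mitigate(target: str, looks_like_attack: bool) -> bool:
    """Deploy a mitigation only for suspicious traffic to unprotected targets."""
    return looks_like_attack and not is_protected(target)


if __name__ == "__main__":
    # Heavy traffic to the resolver looks like an attack, but the exception applies.
    print(should_mitigate("1.1.1.1", looks_like_attack=True))
    # The same traffic pattern on an address outside the list would be mitigated.
    print(should_mitigate("8.8.8.8", looks_like_attack=True))
```

If a check like `is_protected` is dropped during a refactoring, heavy but legitimate traffic to the resolver addresses is treated like any other suspected attack, which matches the effect described in the quote above.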
Cloudflare lists its ‘lessons learned’ in the post and apologizes to all customers affected by this outage.