Cloudflare 1.1.1.1 DNS service had a 17-minute outage

A brief status report for the start of June: Cloudflare's DNS service was unreachable on May 31, 2018, due to a software glitch. Here are the details I know so far.



We had a violent thunderstorm here last night, but if you could no longer reach any websites, there is another explanation, at least if you use Cloudflare as your DNS server.

Cloudflare's DNS service 1.1.1.1

Cloudflare offers a DNS service under the IP address 1.1.1.1 (see my blog post Cloudflare launches DNS Service with IP 1.1.1.1). The selling points were speed and, above all, privacy: Cloudflare promised to delete the data within 24 hours.

(Image: Cloudflare DNS address 1.1.1.1)

A few days ago I reported on a possible hijacking of Cloudflare's DNS service in my blog post CloudFlare DNS service 1.1.1.1 hacked from China … ? Now users relying on Cloudflare's DNS service have been unable to connect to internet servers.

Self-inflicted problem with Cloudflare's DNS service

On June 1, 2018, I read that Cloudflare's DNS service had apparently suffered failures. MS Power User cites a status message about problems with IPv4 connections:



Investigating – Cloudflare is currently seeing query timeouts for our IPv4 addresses for our public DNS resolver (1.1.1.1 and 1.0.0.1). IPv6 addresses are currently not affected (2606:4700:4700::1111 and 2606:4700:4700::1001). We are investigating further and will provide updates as they become available.
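To check for yourself whether the IPv4 and IPv6 resolver addresses named in the status message answer queries, a probe along the following lines could be used. This is only a minimal sketch: the function names `build_query` and `probe`, the test name `example.com`, and the two-second timeout are my own assumptions, not anything Cloudflare published.

```python
import socket
import struct

# Resolver addresses taken from the status message quoted above.
RESOLVERS = [
    "1.1.1.1",
    "1.0.0.1",
    "2606:4700:4700::1111",
    "2606:4700:4700::1001",
]


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of name."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # Query type A (1), class IN (1).
    return header + qname + struct.pack("!HH", 1, 1)


def probe(resolver, name="example.com", timeout=2.0):
    """Return True if the resolver answers a UDP query before the timeout."""
    family = socket.AF_INET6 if ":" in resolver else socket.AF_INET
    query = build_query(name)
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (resolver, 53))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable networks (e.g. no IPv6).
            return False
    # A matching transaction ID indicates a reply to our query.
    return reply[:2] == query[:2]
```

Calling `probe(address)` for each entry in `RESOLVERS` would show which addresses respond. During the outage described in the status message, the IPv4 addresses would presumably have timed out while the IPv6 ones still answered.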

At the time of writing, the status page shows most systems as operational. For now, the old messages can still be found under Past Incidents.

(Image: Cloudflare problems)

A German blog reader pointed out in a comment that Cloudflare had published a blog post about a 17-minute outage:

On May 31, 2018 we had a 17 minute outage on our 1.1.1.1 resolver service; this was our doing and not the result of an attack.

Things went wrong

Today, in an effort to reclaim some technical debt, we deployed new code that introduced Gatebot to Provision API.

What we did not account for, and what Provision API didn't know about, was that 1.1.1.0/24 and 1.0.0.0/24 are special IP ranges. Frankly speaking, almost every IP range is "special" for one reason or another, since our IP configuration is rather complex. But our recursive DNS resolver ranges are even more special: they are relatively new, and we're using them in a very unique way. Our hardcoded list of Cloudflare addresses contained a manual exception specifically for these ranges.

As you might be able to guess by now, we didn't implement this manual exception while we were doing the integration work. Remember, the whole idea of the fix was to remove the hardcoded gotchas!

Impact

The effect was that, after pushing the new code release, our systems interpreted the resolver traffic as an attack. The automatic systems deployed DNS mitigations for our DNS resolver IP ranges for 17 minutes, between 17:58 and 18:13 May 31st UTC. This caused 1.1.1.1 DNS resolver to be globally inaccessible.
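The role of the missing exception described in the quote can be illustrated with a small sketch. It is purely hypothetical: the names `RESOLVER_RANGES`, `is_resolver_address`, `should_mitigate` and the `honor_exceptions` switch are my own inventions for illustration and say nothing about how Gatebot or the Provision API are really implemented. Only the two IP ranges come from Cloudflare's post.

```python
import ipaddress

# The two "special" ranges named in Cloudflare's post.
RESOLVER_RANGES = [
    ipaddress.ip_network("1.1.1.0/24"),
    ipaddress.ip_network("1.0.0.0/24"),
]


def is_resolver_address(addr):
    """Return True if addr lies in one of the resolver ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RESOLVER_RANGES)


def should_mitigate(dst_addr, looks_like_attack, honor_exceptions=True):
    """Decide whether traffic to dst_addr gets a DNS mitigation.

    honor_exceptions=True models the old hardcoded list with its manual
    exception; False models an integration that lacks the exception.
    """
    if honor_exceptions and is_resolver_address(dst_addr):
        # Heavy DNS traffic to the public resolver is expected, not an attack.
        return False
    return looks_like_attack
```

With the exception in place, `should_mitigate("1.1.1.1", True)` returns `False`. Without it (`honor_exceptions=False`), the same traffic is treated as an attack and mitigated, which mirrors the failure described above.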

Cloudflare also lists its 'lessons learned' in the post and apologizes to all customers affected by this outage.

