Amazon AWS cloud outage causes chaos (2021/12/08)

On December 8, 2021, a major outage hit Amazon AWS services in the US. The Amazon cloud service was disrupted for about 8 hours, and everyone who relied on it was left stranded. Orders could not be placed, and Alexa, Ring and Disney Plus were down as well. This was probably a serious problem for some users.



The problems with Amazon Web Services began at 9:37 a.m. (Pacific time), when servers for the U.S. East Coast were slow to deliver content or reported errors. The cause was identified as a degradation of multiple network devices in the US-EAST-1 region. Amazon posted the following status updates about the outage:

[RESOLVED] API Error Rates in US-EAST-1

[9:37 AM PST] We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates. We have identified the root cause and are actively working towards recovery.
[10:12 AM PST] We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates. We have identified root cause of the issue causing service API and console issues in the US-EAST-1 Region, and are starting to see some signs of recovery. We do not have an ETA for full recovery at this time.
[11:26 AM PST] We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates. Services impacted include: EC2, Connect, DynamoDB, Glue, Athena, Timestream, and Chime and other AWS Services in US-EAST-1. The root cause of this issue is an impairment of several network devices in the US-EAST-1 Region. We are pursuing multiple mitigation paths in parallel, and have seen some signs of recovery, but we do not have an ETA for full recovery at this time. Root logins for consoles in all AWS regions are affected by this issue, however customers can login to consoles other than US-EAST-1 by using an IAM role for authentication.
[12:34 PM PST] We continue to experience increased API error rates for multiple AWS Services in the US-EAST-1 Region. The root cause of this issue is an impairment of several network devices. We continue to work toward mitigation, and are actively working on a number of different mitigation and resolution actions. While we have observed some early signs of recovery, we do not have an ETA for full recovery. For customers experiencing issues signing-in to the AWS Management Console in US-EAST-1, we recommend retrying using a separate Management Console endpoint (such as https://us-west-2.console.aws.amazon.com/). Additionally, if you are attempting to login using root login credentials you may be unable to do so, even via console endpoints not in US-EAST-1. If you are impacted by this, we recommend using IAM Users or Roles for authentication. We will continue to provide updates here as we have more information to share.
[2:04 PM PST] We have executed a mitigation which is showing significant recovery in the US-EAST-1 Region. We are continuing to closely monitor the health of the network devices and we expect to continue to make progress towards full recovery. We still do not have an ETA for full recovery at this time.
[2:43 PM PST] We have mitigated the underlying issue that caused some network devices in the US-EAST-1 Region to be impaired. We are seeing improvement in availability across most AWS services. All services are now independently working through service-by-service recovery. We continue to work toward full recovery for all impacted AWS Services and API operations. In order to expedite overall recovery, we have temporarily disabled Event Deliveries for Amazon EventBridge in the US-EAST-1 Region. These events will still be received & accepted, and queued for later delivery.
[3:03 PM PST] Many services have already recovered, however we are working towards full recovery across services. Services like SSO, Connect, API Gateway, ECS/Fargate, and EventBridge are still experiencing impact. Engineers are actively working on resolving impact to these services.
[4:35 PM PST] With the network device issues resolved, we are now working towards recovery of any impaired services. We will provide additional updates for impaired services within the appropriate entry in the Service Health Dashboard.
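
The advice in the 12:34 PM update, to retry sign-in through a Management Console endpoint in another region, can be sketched roughly as follows. This is only an illustration: the URL pattern is taken from the us-west-2 example quoted above, while `console_url`, `first_reachable`, and the `probe` callback are hypothetical names introduced here, not part of any AWS API or SDK.

```python
from typing import Callable, Iterable, Optional


def console_url(region: str) -> str:
    """Build a region-specific console URL in the form quoted in the status update."""
    return f"https://{region}.console.aws.amazon.com/"


def first_reachable(regions: Iterable[str], probe: Callable[[str], bool]) -> Optional[str]:
    """Return the console URL of the first region whose probe succeeds, else None.

    `probe` is a hypothetical caller-supplied health check (for example an
    HTTP request with a short timeout); it is an assumption, not an AWS API.
    """
    for region in regions:
        url = console_url(region)
        try:
            if probe(url):
                return url
        except OSError:
            # Treat network-level errors as "unreachable" and try the next region.
            continue
    return None


# Stand-in probe that pretends only US-EAST-1 is down.
def fake_probe(url: str) -> bool:
    return "us-east-1" not in url


chosen = first_reachable(["us-east-1", "us-west-2"], fake_probe)
print(chosen)
```

As the same update notes, a fallback like this would not have helped with root logins, which were affected in all regions. Signing in with IAM users or roles was the recommended workaround.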

Vice magazine writes that websites and online services hosted by Amazon (including Motherboard) experienced outages and technical difficulties worldwide because of the Amazon Web Services outage. Amazon's own employees were probably hit harder. Hundreds of Amazon warehouse workers and delivery drivers reported that the company's delivery infrastructure had ground to a halt and was generally in chaos. The reason is that the Flex app, used for critical delivery operations, and the Dolphin app, used for time tracking and other operations, were down all morning on Dec. 8. One Amazon employee joked that he was currently earning more than Jeff Bezos because his pay kept coming while Amazon was posting losses:

I am making more money than Jeff bezos in this moment

Right now everything is down world wide and I am still getting paid double over time.

Amazon is on stand still so I sure this second alone he is losing $$$$$

The Verge reported outages at Disney Plus and Netflix streaming, as well as in games like PUBG, League of Legends, and Valorant. The editors also noted some issues accessing Amazon.com and other Amazon products such as the AI assistant Alexa, Kindle ebooks, Amazon Music, and security cameras from Ring or Wyze. DownDetector's list of services with simultaneous outages includes almost all the familiar names: Tinder, Roku, Coinbase, both Cash App and Venmo, and the list goes on. The incident shows that cloud users are extremely exposed to such outages: when the cloud is down, millions of people are immediately affected.


