Cisco Webex has been down (July 28, 2022)

German blog reader Gerald just emailed me (thanks for that) to let me know that the Cisco Webex video conferencing service was down. I took a quick look: the disruption started at about 19:00 German time and lasted until about 22:00. At the time of writing (22:41), the Webex Room Systems service is still listed with a "Major Impact", and Cisco also still lists a disruption affecting the Webex Cloud Registered Device service. So the disruption seems to be slowly subsiding.




Gerald sent me the following screenshot of the Webex status page a couple of hours ago.

Cisco Webex status

The following graphic from allestoerungen.de shows the start and end of the disruption on July 28, 2022 here in Germany (Central European Summer Time, CEST, UTC+2).

Webex down

Webex support confirmed the problems on Twitter about two hours ago:

We're aware of issues affecting several Webex services and our engineering team is hard at work. We apologize for the inconvenience. Regular updates will be shared here as soon as we have them. We appreciate your patience.

In addition, users may experience difficulties logging into Webex Control Hub and managing their Webex Cloud device. All hands are on deck to restore services and we apologize for the inconvenience this may cause.

 

Update to service (1/2) Engineering is investigating an issue which is causing connectivity issues with the Webex App. This may include logging in, sending messages and files, presence issues, and starting or joining meetings in the Webex App.

The engineers seem to have got the problem under control fairly quickly, because 44 minutes ago they posted:



Engineering is continuing to take remediation steps to restore services. We appreciate everyone's patience while we are all hands on deck to address the incident.

For users who are still unable to log into the Webex App, we recommend that you close and re-launch the Webex App. We will continue to provide updates as they become available. Once again, we appreciate your patience today.

The latest message, posted a few minutes ago, indicates that the service is largely working again.

The majority of services have been restored and are operational. Some users may still experience intermittent issues accessing their Webex board and devices.

Here is the overview from the Cisco Webex status page (which can be accessed here); it now shows only residual faults.

Webex status

Also interesting in this context is the following tweet, which reports an AWS problem on the US East Coast.

AWS outage

Addendum: Cisco has released the following report about the incident.

Webex Services Incident

Incident Number: INC0047261
Incident Duration: July 28, 2022, 16:55 – 19:55 UTC

Incident Details
At 16:55 UTC on July 28th, a service provider hosting some of the Webex services experienced a significant outage. Services for messaging, devices, authentication, and analytics were hosted in the affected service provider data center, which caused multiple microservices to fail. Users were unable to authenticate to Webex, register or connect to meetings consistently from Webex devices, use messaging, or log into the Control Hub administration console. The Webex status page was also hosted in the same service provider data center, which caused delayed access to the status page.

Root Cause
The Webex engineering team attempted to redirect services outside of the affected environment; however, the redirects were not successful due to the outage affecting multiple redundant service zones. Due to a core component of the service architecture becoming unavailable, the redirects to alternative zones were ineffective. Engineering was unable to successfully stop the services in the hosted datacenter due to connectivity failures. This caused services to remain unstable as device and software clients continued to send connection retries. The combination of incomplete service termination, the unhealthy data center, and the multiple client retries caused the connections to overload the available service capacity, and the capacity of the edge and microservices in the redundant environment was unable to process the traffic.

Corrective Actions
Engineering worked with the service provider to restore the core service architecture and scaled up the edge and service capacity within the environment which allowed the clients to successfully connect, and the additional traffic generated by the retries to stop. This caused services to stabilize, which allowed engineering to take additional remediation steps, including restarting unhealthy instances and additional load balancing, leading to full service recovery. Engineering completed remaining service clean-up and load balancing, and services were fully recovered at 19:55 UTC.
The service team has identified areas of improvement for our incident response time, including changes to the service metrics for faster root cause identification, improved runbooks to speed up scale-up and deployment of additional micro-services, and revisiting client retry values which will improve service restoration should a similar incident occur in the future. The messaging architecture team is also engaged to identify service architecture improvements which will allow Webex services to better withstand a multi-zone failure.
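The report names client retries as a major factor in the overload and mentions "revisiting client retry values", but it does not say which values or which retry strategy Webex clients use. One widely used way to keep a fleet of clients from hammering a recovering service in lockstep is capped exponential backoff with random ("full") jitter. The following is only a minimal Python sketch of that general technique, assuming a hypothetical client operation that raises `ConnectionError` on failure; the function names and parameters (`backoff_delays`, `call_with_retries`, `base`, `cap`, `attempts`) are made up for illustration and are not Cisco's client logic.

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, cap: float, attempts: int,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return one 'full jitter' delay per retry (hypothetical helper).

    Retry n waits a random time in [0, min(cap, base * 2**n)], so clients
    that failed at the same moment spread their retries out over time
    instead of reconnecting all at once.
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]


def call_with_retries(op: Callable[[], T], base: float = 0.5, cap: float = 60.0,
                      attempts: int = 6,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call op(); on ConnectionError, wait a jittered delay and try again.

    Makes at most attempts + 1 calls; the last failure is re-raised.
    """
    for delay in backoff_delays(base, cap, attempts):
        try:
            return op()
        except ConnectionError:
            sleep(delay)
    return op()  # final attempt: any ConnectionError propagates to the caller


# Usage: a simulated service that is unreachable for the first three calls.
calls = {"n": 0}


def flaky_connect() -> str:
    calls["n"] += 1
    if calls["n"] < 4:
        raise ConnectionError("service unavailable")
    return "connected"


print(call_with_retries(flaky_connect, sleep=lambda s: None))  # connected
```

The `cap` bounds how long a single client waits, while the jitter decorrelates clients; a real client would typically also give up or alert after the last attempt rather than retrying forever.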

Timeline
16:55: Webex alerts indicate multiple service failures; incident process began
17:15: Engineering began service redeploys
17:33: Services fully stopped in the affected DC
17:45: Service provider reported partial recovery
18:00: Metrics indicate high traffic volume due to client retries
18:30: Multiple scale-up efforts completed
19:10: Capacity increase across both pools completed; 80% increase in traffic observed. Service redeploys began
19:45: Vendor confirms services fully restored. Additional capacity deploys completed in all three zones
19:55: Final redeploys completed; services restored.

