{"id":271010,"date":"2022-07-28T22:47:43","date_gmt":"2022-07-28T20:47:43","guid":{"rendered":"https:\/\/www.borncity.com\/blog\/?p=271010"},"modified":"2023-06-22T15:30:09","modified_gmt":"2023-06-22T13:30:09","slug":"cisco-webex-down-28-7-2022","status":"publish","type":"post","link":"https:\/\/borncity.com\/blog\/2022\/07\/28\/cisco-webex-down-28-7-2022\/","title":{"rendered":"Cisco Webex down (28.7.2022)"},"content":{"rendered":"<p><img decoding=\"async\" style=\"float: left; margin: 0px 10px 0px 0px; display: inline;\" title=\"Stop - Pixabay\" src=\"https:\/\/borncity.com\/blog\/wp-content\/uploads\/2021\/06\/Stop01.jpg\" alt=\"Stop - Pixabay\" align=\"left\" \/>[<a href=\"https:\/\/borncity.com\/win\/2022\/07\/28\/cisco-webex-down-28-7-2022\/\" target=\"_blank\" rel=\"noopener\">English<\/a>]Blog reader Gerald just informed me by e-mail (thanks for that) that the video conferencing service Cisco Webex is experiencing an outage. I took a quick look: the outage began at around 19:00 German time and lasted until about 22:00. The Webex Room Systems service is currently (22:41) still listed with a \"Major Impact\", and Cisco still lists a disruption for Webex Cloud Registered Device as well. So the outage appears to be slowly subsiding. <strong>Addendum:<\/strong> Cisco's explanation added.<\/p>\n<p><!--more--><\/p>\n<p>Gerald sent along the following screenshot of the Webex status from a few hours ago. 
Quite a lot was going on there.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/i.imgur.com\/KvUqie1.png\" alt=\"Cisco Webex status\" \/><\/p>\n<p>The following outage chart, available on <a href=\"https:\/\/web.archive.org\/web\/20211006144247\/https:\/\/xn--allestrungen-9ib.de\/stoerung\/webex\/\" target=\"_blank\" rel=\"noopener\">allestoerungen.de<\/a>, shows the beginning and end of the outage on July 28, 2022.<\/p>\n<p><a href=\"https:\/\/web.archive.org\/web\/20211006144247\/https:\/\/xn--allestrungen-9ib.de\/stoerung\/webex\/\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" title=\"Webex down\" src=\"https:\/\/i.imgur.com\/mujVnAJ.png\" alt=\"Webex down\" \/><\/a><\/p>\n<p>On Twitter, Webex support <a href=\"https:\/\/twitter.com\/Webex\/status\/1552717092444467200\" target=\"_blank\" rel=\"noopener\">confirmed<\/a> problems with the service two hours ago. The tweets read:<\/p>\n<blockquote><p>We're aware of issues affecting several Webex services and our engineering team is hard at work. We apologize for the inconvenience. Regular updates will be shared here as soon we as have them. We appreciate your patience.<\/p>\n<p>In addition, users may experience difficulties logging into Webex Control Hub and managing their Webex Cloud device. All hands are on deck to restore services and we apologize for the inconvenience this may cause.<\/p>\n<p>&nbsp;<\/p>\n<p>Update to service (1\/2) Engineering is investigating an issue which is causing connectivity issues with the Webex App. This may include logging in, sending messages and files, presence issues, and starting or joining meetings in the Webex App.<\/p><\/blockquote>\n<p>But the engineers seem to have gotten the problem under control quickly, because 44 minutes ago the message was:<\/p>\n<blockquote><p>Engineering is continuing to take remediation steps to restore services. 
We appreciate everyone's patience while we are all hands on deck to address the incident.<\/p>\n<p>For users who are still unable to log into the Webex App, we recommend that you close and re-launch the Webex App. We will continue to provide updates as they become available. Once again, we appreciate your patience today.<\/p><\/blockquote>\n<p>And the latest message, from a few minutes ago, confirms that the service is largely working again.<\/p>\n<blockquote><p>The majority of services have been restored and are operational. Some users may still experience intermittent issues accessing their Webex board and devices.<\/p><\/blockquote>\n<p>Here is the current overview (22:38) of the Cisco Webex status page, which is <a href=\"https:\/\/status.webex.com\/service\/status?lang=en_US\" target=\"_blank\" rel=\"noopener\">available here<\/a> and now shows only residual disruptions.<\/p>\n<p><a href=\"https:\/\/status.webex.com\/service\/status?lang=en_US\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" title=\"Webex status\" src=\"https:\/\/i.imgur.com\/lssTwS4.png\" alt=\"Webex status\" \/><\/a><\/p>\n<p>Interesting in this context is the following <a href=\"https:\/\/twitter.com\/KUbhurleyKC\/status\/1552742463743434754\" target=\"_blank\" rel=\"noopener\">tweet<\/a>, which also reports an AWS problem on the US East Coast.<\/p>\n<p><a href=\"https:\/\/twitter.com\/KUbhurleyKC\/status\/1552742463743434754\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" title=\"AWS outage\" src=\"https:\/\/i.imgur.com\/twk43Hx.png\" alt=\"AWS outage\" \/><\/a><\/p>\n<h2>Cause of the outage<\/h2>\n<p><strong>Addendum:<\/strong> Blog reader Michael T. sent me Cisco's explanation of the reasons for the outage by e-mail. Apparently there was a problem in a data center that affected the services. The redirect to other zones then failed, however. 
I'll simply append it here:<\/p>\n<blockquote><p>Webex Services Incident<\/p>\n<p>Incident Number: INC0047261<br \/>\nIncident Duration: July 28, 2022, 16:55 \u2013 19:55 UTC<\/p>\n<p>Incident Details<br \/>\nAt 16:55 UTC on July 28th, a service provider hosting some of the Webex services experienced a significant outage. Services for messaging,<br \/>\ndevices, authentication, and analytics were hosted in the affected service provider data center, which caused multiple microservices to<br \/>\nfail. Users were unable to authenticate to Webex, register or connect to meetings consistently from Webex devices, use messaging, or log into the Control Hub administration console. The Webex status page was also hosted in the same service provider data center, which caused delayed access to the status page.<\/p>\n<p>Root Cause<br \/>\nThe Webex engineering team attempted to redirect services outside of the affected environment; however, the redirects were not<br \/>\nsuccessful due to the outage affecting multiple redundant service zones. Due to a core component of the service architecture becoming<br \/>\nunavailable, the redirects to alternative zones were ineffective. Engineering was unable to successfully stop the services in the hosted<br \/>\ndatacenter due to connectivity failures. This caused services to remain unstable as device and software clients continued to send<br \/>\nconnection retries. 
The combination of incomplete service termination, the unhealthy data center, and the multiple client retries caused the connections to overload the available service capacity, and the capacity of the edge and microservices in the redundant environment was unable to process the traffic.<\/p>\n<p>Corrective Actions<br \/>\nEngineering worked with the service provider to restore the core service architecture and scaled up the edge and service capacity within<br \/>\nthe environment which allowed the clients to successfully connect, and the additional traffic generated by the retries to stop. This caused<br \/>\nservices to stabilize, which allowed engineering to take additional remediation steps, including restarting unhealthy instances and<br \/>\nadditional load balancing, leading to full service recovery. Engineering completed remaining service clean-up and load balancing, and services were fully recovered at 19:55 UTC.<br \/>\nThe service team has identified areas of improvement for our incident response time, including changes to the service metrics for faster root cause identification, improved runbooks to speed up scale-up and deployment of additional micro-services, and revisiting client retry values which will improve service restoration should a similar incident occur in the future. The messaging architecture team is also engaged to identify service architecture improvements which will allow Webex services to better withstand a multi-zone failure.<\/p>\n<p>Timeline<br \/>\n16:55: Webex alerts indicate multiple service failures; incident process began<br \/>\n17:15: Engineering began service redeploys<br \/>\n17:33: Services fully stopped in the affected DC<br \/>\n17:45: Service provider reported partial recovery<br \/>\n18:00: Metrics indicate high traffic volume due to client retries<br \/>\n18:30: Multiple scale-up efforts completed<br \/>\n19:10: Capacity increase across both pools completed; 80% increase in traffic observed. 
Service redeploys began<br \/>\n19:45: Vendor confirms services fully restored. Additional capacity deploys completed in all three zones<br \/>\n19:55: Final redeploys completed; services restored.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>[English]Blog reader Gerald just informed me by e-mail (thanks for that) that the video conferencing service Cisco Webex is experiencing an outage. I took a quick look: the outage began at around 19:00 German time and lasted until about 22:00. The Webex &hellip; <a href=\"https:\/\/borncity.com\/blog\/2022\/07\/28\/cisco-webex-down-28-7-2022\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7862],"tags":[987],"class_list":["post-271010","post","type-post","status-publish","format-standard","hentry","category-stoerung","tag-storung"],"_links":{"self":[{"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/posts\/271010","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/comments?post=271010"}],"version-history":[{"count":0,"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/posts\/271010\/revisions"}],"wp:attachment":[{"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/media?parent=271010"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/categories?post=271010"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/borncity.com\/blog\/wp-json\/wp\/v2\/tags?post=271010"}],"curies":[{"name":"wp
","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}