Yesterday's Microsoft 365 suite-wide outages led to continual faults for Outlook on the web on…
Early Detection of Microsoft 365 and Teams Outage
Is there an Office 365 outage today? Is Microsoft Teams down? Are MS365 services down?
If that’s what you are thinking, then keep reading…
Exoprise CloudReady provides early detection of mission-critical mail outages. On March 15, 2021, Microsoft had a worldwide service outage that impacted services such as Teams AV, Yammer, OneDrive, and Azure Active Directory. As a result, end users reported being unable to log in to any of these services and received timeout messages. Services were down!
But check this out.
Exoprise detected the issue earlier, at 3 pm EDT (19:00 UTC), 40 minutes before Microsoft reported it. We were also able to relay the news to our customer base immediately.
Users may be unable to access multiple Microsoft 365 services
The following Microsoft Service Communication Message was received at Mon, 15 Mar 2021 19:40:05 +0000
Title: Users may be unable to access multiple Microsoft 365 services
WorkloadDisplayName: Microsoft 365 suite
StartTime: Mon, 15 Mar 2021 19:34:22 +0000
ImpactDescription: Users may be unable to access multiple Microsoft 365 services.
LastUpdatedTime: Mon, 15 Mar 2021 19:40:05 +0000
Mon, 15 Mar 2021 19:39:14 +0000
Title: Users may be unable to access multiple Microsoft 365 services
User Impact: Users may be unable to access multiple Microsoft 365 services.
More info: Initial reports indicate that primary impact is to Microsoft Teams; however, other services including Exchange Online and Yammer are also impacted.
Current status: We’re investigating a potential issue and checking for impact to your organization. We’ll provide an update within 30 minutes.
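To put a number on the head start, here is a minimal sketch that computes the lead time from the timestamps in the message above. It assumes the 3 pm detection time is US Eastern daylight time (UTC-4), which is what places it 40 minutes ahead of the 19:40 UTC message; the function and constant names are ours, for illustration only.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Timestamps copied from the Service Communication Message above.
MESSAGE_RECEIVED = parsedate_to_datetime("Mon, 15 Mar 2021 19:40:05 +0000")
MICROSOFT_START_TIME = parsedate_to_datetime("Mon, 15 Mar 2021 19:34:22 +0000")

# Assumption: 3 pm Eastern on March 15, 2021 is daylight time (UTC-4).
EASTERN_DAYLIGHT = timezone(timedelta(hours=-4))
EXOPRISE_DETECTED = datetime(2021, 3, 15, 15, 0, tzinfo=EASTERN_DAYLIGHT)


def lead_time(detected: datetime, reported: datetime) -> timedelta:
    """Return how far ahead of the vendor report the detection happened."""
    return reported - detected


if __name__ == "__main__":
    print("Ahead of message receipt:", lead_time(EXOPRISE_DETECTED, MESSAGE_RECEIVED))
    print("Ahead of Microsoft StartTime:", lead_time(EXOPRISE_DETECTED, MICROSOFT_START_TIME))
```

Under that assumption, the detection is 40 minutes 5 seconds ahead of the message receipt time and 34 minutes 22 seconds ahead of the StartTime Microsoft recorded.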
Exoprise Microsoft 365 Dashboard and Notice for Faster Troubleshooting
Here is an example of how Exoprise proactively captures outages and provides complete coverage. You don’t need to visit Twitter to get outage status: tweets integrated in real time give customers updates, outage news, and the latest developments from Microsoft. Our integrated Office 365 dashboard shows whether the outage is global or local. The Office 365 outage map shows where services are affected around the globe, here in parts of India and North-East America.
Monitoring Microsoft Teams Audio Video Outage
In the pink-shaded section below, you can see that Microsoft Teams is unavailable or unresponsive. In other words, this is the onset of the outage.
Latest Updates on Office 365 Outage
Title: Users may be unable to access multiple Microsoft 365 services
User Impact: Users may be unable to access multiple Microsoft 365 services.
More info: Any service that leverages Azure Active Directory (AAD) may be affected. This includes but is not limited to Microsoft Teams, Forms, Exchange Online, Intune and Yammer. Admins may also be unable to access the Service Health Dashboard.
Current status: We’ve identified the underlying cause of the problem and deployed an update to resolve the issue. The update has finished its deployment to all impacted regions. Microsoft 365 services continue the process of recovery and are showing decreasing error rates in telemetry. We’ll continue to monitor service health as availability is restored.
Scope of impact: This issue could affect any user.
Next update by: Monday, March 15, 2021, 7:00 PM (11:00 PM UTC)
Preliminary Root Cause Analysis of the Microsoft 365 Outage MO244568
Microsoft recently updated the root cause for this outage, and it has to do with:
- Ongoing, enhanced security protection with Azure AD
- Rotation of security keys
Stronger security is an excellent goal to pursue but, obviously, getting there can be a challenge. Read on for more insight into the cause; more detail can be found here: https://status.azure.com/en-us/status/history/
Preliminary RCA – Authentication errors across multiple Microsoft services (Tracking ID LN01-P8Z)
Summary of Impact: Starting approximately 19:00 UTC on March 15, 2021 customers may have encountered errors performing authentication operations for any Microsoft and third-party applications that depend on Azure Active Directory (Azure AD) for authentication.
The Azure Portal, Microsoft Teams, Exchange, Azure Key Vault, SharePoint, and other applications have recovered. Other applications are in the process of recovering and impacted customers will continue to receive updates regarding these.
Preliminary Root Cause: The preliminary analysis of this incident shows that an error occurred in the rotation of keys used to support Azure AD’s use of OpenID, and other, Identity standard protocols for cryptographic signing operations. As part of standard security hygiene, an automated system, on a time-based schedule, removes keys that are no longer in use.
Over the last few weeks, a particular key was marked as “retain” for longer than normal to support a complex cross-cloud migration. This exposed a bug where the automation incorrectly ignored that “retain” state, leading it to remove that particular key.
Metadata about the signing keys is published by Azure AD to a global location in line with Internet Identity standard protocols. Once the public metadata was changed at 19:00 UTC, applications using these protocols with Azure AD began to pick up the new metadata and stopped trusting tokens/assertions signed with the key that was removed. At that point, end-users were no longer able to access those applications.
Next Steps: We understand how incredibly impactful and unacceptable this is and apologize deeply. We are continuously taking steps to improve the Microsoft Azure Platform and our processes to help ensure such incidents do not occur in the future.
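The failure mode described in the RCA can be illustrated with a toy model. This is only a sketch under our own assumptions, not Microsoft's implementation: the `SigningKey` class, the `cleanup` and `is_trusted` functions, and the key names are all hypothetical, and no real cryptography is involved. It shows how a cleanup job that ignores a "retain" flag removes a key from the published metadata, after which applications stop trusting tokens signed with it.

```python
from dataclasses import dataclass


@dataclass
class SigningKey:
    kid: str              # key identifier published in the metadata (hypothetical)
    in_use: bool          # the automation's view: is the key still in use?
    retain: bool = False  # operator flag: keep the key published regardless


def cleanup(keys, honor_retain=True):
    """Return the keys that stay published after a scheduled cleanup."""
    kept = []
    for key in keys:
        # A key survives if it is in use, or if it is retained and the
        # automation actually checks the retain flag.
        if key.in_use or (honor_retain and key.retain):
            kept.append(key)
    return kept


def is_trusted(token_kid, published_keys):
    """An application trusts a token only if its key id is in the metadata."""
    return token_kid in {key.kid for key in published_keys}


if __name__ == "__main__":
    keys = [
        SigningKey("key-a", in_use=True),
        SigningKey("key-b", in_use=False, retain=True),  # kept for a migration
    ]
    buggy = cleanup(keys, honor_retain=False)   # the bug: retain is ignored
    fixed = cleanup(keys, honor_retain=True)
    print("Buggy cleanup trusts key-b tokens:", is_trusted("key-b", buggy))
    print("Fixed cleanup trusts key-b tokens:", is_trusted("key-b", fixed))
```

With the retain flag ignored, tokens signed with `key-b` are rejected as soon as applications pick up the new metadata, which mirrors the sequence of events in the RCA.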
Microsoft 365 Twitter Status Account Feed
The issue was related to a recent change in the authentication system.
Office 365 Outage History
The best part of working with us?
We have successfully detected several outages in the past:
- Exchange Online mail delivery outage on 3rd Feb 2021
- Office 365 outage on 27th Jan 2021
- Microsoft 365 Teams Outage on 19th Aug 2020
Check out our Office 365 Outage Detection Page to learn more.
Watch How to Monitor Microsoft 365 Services and Get Outage Notifications
Watch CloudReady in action and see how it can help your team proactively monitor the Teams service.
You can see how we monitor other applications as well (Outlook, Exchange, SharePoint, OneDrive, Azure AD, etc.).
When there is an Office 365 Outlook or Exchange outage, businesses suffer from downtime. We make it easy for you to detect these outages and get service updates.
Notifying and communicating with your customers and employees early is critical when a Microsoft 365 outage strikes. Check out our blog post on How to Check Office 365 Service Health in CloudReady.
Microsoft 365 Outage Report, Service Continuity, and Availability Metrics
Our sensors capture real-time service health status, availability, usage, and uptime statistics.
Simply head over to the Availability tab and visualize real-time outage status for Microsoft Teams or any app.
Use these report metrics to recover service-level agreement (SLA) credits from Microsoft or your service provider.
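As a rough sketch of how availability metrics translate into an SLA claim, the example below computes a monthly uptime percentage from downtime minutes and looks up a credit tier. The tier thresholds and the 120 minutes of downtime are assumptions for illustration only; check your own agreement for the actual terms. The function names are ours, not part of CloudReady.

```python
# Assumed credit tiers (uptime threshold %, credit %), checked in ascending
# order. Illustrative values only; your agreement defines the real ones.
CREDIT_TIERS = [(95.0, 100), (99.0, 50), (99.9, 25)]


def availability_percent(total_minutes: int, downtime_minutes: int) -> float:
    """Return uptime as a percentage of the measurement window."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be within the window")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def service_credit_percent(availability: float) -> int:
    """Return the credit for the lowest tier the availability falls under."""
    for threshold, credit in CREDIT_TIERS:
        if availability < threshold:
            return credit
    return 0


if __name__ == "__main__":
    minutes_in_march = 31 * 24 * 60  # 44,640 minutes
    uptime = availability_percent(minutes_in_march, downtime_minutes=120)  # hypothetical downtime
    print(f"Availability: {uptime:.3f}%  Credit: {service_credit_percent(uptime)}%")
```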
Office 365 Outages and ServiceNow Integration
Do you have ServiceNow or any other ITSM in your organization?
Exoprise CloudReady makes it easy to view all application alerts, notifications, and outages in a single system. How?
That is done via webhooks and an API connection.
Read our blog on managing Office 365 outages in ServiceNow for setup and configuration details.
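For a feel of what such an integration involves, here is a minimal sketch that maps an alert payload to a ServiceNow incident and builds the HTTP request for the ServiceNow Table API (`/api/now/table/incident`). The alert fields (`service`, `summary`, `severity`) and the instance name are hypothetical; the actual CloudReady webhook payload may look different. Authentication is left out, and the request is built but not sent.

```python
import json
import urllib.request


def to_servicenow_incident(alert: dict) -> dict:
    """Map a hypothetical alert payload to ServiceNow incident fields."""
    critical = alert.get("severity") == "critical"
    return {
        "short_description": f"{alert['service']} outage detected: {alert['summary']}",
        "description": json.dumps(alert, indent=2, sort_keys=True),
        "urgency": "1" if critical else "2",
        "impact": "1" if critical else "2",
    }


def build_request(instance: str, incident: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the ServiceNow Table API."""
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    return urllib.request.Request(
        url,
        data=json.dumps(incident).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical alert, as a webhook receiver might see it.
    alert = {
        "service": "Microsoft Teams",
        "summary": "sign-in timeouts",
        "severity": "critical",
    }
    request = build_request("example", to_servicenow_incident(alert))
    print(request.get_method(), request.full_url)
```

In a real deployment you would add credentials and send the request with `urllib.request.urlopen` or your HTTP client of choice.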
Free 15 Day Trial for Monitoring Microsoft 365 Outages and Service Health
You need uptime, not downtime. Unplanned M365 outage incidents, service disruptions, and network connectivity issues put a dent in business productivity.
If you had Exoprise CloudReady earlier today, you’d have known about the outage 40 minutes before Microsoft reported it. Not only that, you could have communicated it to your users who might be waiting on that business-critical email. You need detailed metrics to get a better handle on an outage so you can troubleshoot quickly and also recover Microsoft service credits.
Other vendors simply blog about the outage from Microsoft’s portal and service health messages. They fail to show how they actually captured the error and outage.
Ensure business continuity and a good end-user digital experience by monitoring Microsoft 365 real-time service health with the Exoprise monitoring solution. Only Exoprise shows how it captures the errors in advance of Microsoft reporting the problem.