Does the Azure SLA always apply?
Microsoft Azure has a financially backed Service Level Agreement (SLA): Microsoft returns part of your monthly spend if it does not meet what it promises (for example, a certain uptime percentage).
This is good, but there are certain circumstances when the SLA does not apply.
This blog post will explore when the SLA does not apply and then offer up a simplified version of the exclusions.
According to Microsoft’s documentation, the SLA and any applicable Service Levels do not apply to any performance or availability issues under the following circumstances:
- Due to factors outside our reasonable control (for example, natural disaster, war, acts of terrorism, riots, government action, or a network or device failure external to our data centers, including at your site or between your site and our data center).
- That result from the use of services, hardware, or software not provided by us, including, but not limited to, issues resulting from inadequate bandwidth or related to third-party software or services.
- That results from failures in a single Microsoft Datacenter location, when your network connectivity is explicitly dependent on that location in a non-geo-resilient manner.
- Caused by your use of a Service after we advised you to modify your use of the Service, if you did not modify your use as advised.
- During or with respect to preview, pre-release, beta or trial versions of a Service, feature or software (as determined by us) or to purchases made using Microsoft subscription credits.
- That result from your unauthorized action or lack of action when required, or from your employees, agents, contractors, or vendors, or anyone gaining access to our network by means of your passwords or equipment, or otherwise resulting from your failure to follow appropriate security practices.
- That result from your failure to adhere to any required configurations, use supported platforms, follow any policies for acceptable use, or your use of the Service in a manner inconsistent with the features and functionality of the Service (for example, attempts to perform operations that are not supported) or inconsistent with our published guidance.
- That result from faulty input, instructions, or arguments (for example, requests to access files that do not exist).
- That result from your attempts to perform operations that exceed prescribed quotas or that resulted from our throttling of suspected abusive behavior.
- Due to your use of Service features that are outside of associated Support Windows.
- For licenses reserved, but not paid for, at the time of the Incident.
- Your initiated operations such as restart, stop, start, failover, scale compute, and scale storage that incur downtime are excluded from the uptime calculation.
- Monthly maintenance window that incurs a downtime to patch your server and infrastructure is excluded from the uptime calculation.
To simplify, the SLA does not cover:
- Problems caused by natural disasters, war, terrorism, riots, or government action, or by network or device failures outside Microsoft’s datacenters (including at your own site).
- Problems with hardware or software not provided by Azure.
- Problems when all your network traffic is routed through a single Microsoft datacenter.
- Problems where you are running a configuration that is not recommended (for example, an old version of .NET or a discontinued service).
- Problems with preview services (meaning, services that are not GA or Generally Available).
- Problems caused by someone from your organization or someone who is connected to your organization (in other words, shooting yourself in the foot or making a mistake). This also includes problems that stem from poor security practices (for example, disabling MFA and giving everyone “Password123” as their password).
- Problems caused by using a service in a way it was not designed to be used, or in a way that is inconsistent with published guidance.
- Problems caused by user error, such as faulty input (for example, requesting a file that does not exist).
- Problems caused by hitting quotas/thresholds. For example, if you have a P10 Premium SSD (which supports 100 MB/sec throughput) and constantly run it at 100 MB/sec, you will likely experience problems because you are pushing the service to its limit.
- Problems associated with use of unsupported services, for example running a custom/unsupported operating system.
- Problems associated with services that you have not paid for (if you don’t pay, you can’t get anything back!).
- Problems where the customer caused the downtime, through operations such as stop, start, resize, etc.
- Problems that occur when Azure needs to patch. You could experience downtime when your services are getting patched and Microsoft is not going to compensate you for that downtime. If this potential downtime is concerning, you can architect the solution in such a way that the application does not get impacted by patching.
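To make the last two points concrete, here is a minimal Python sketch of how excluded downtime changes an uptime figure. It is an illustration only, not Microsoft's actual calculation: the cause labels, the `Outage` type, the `uptime_percentage` function, and the sample numbers are all made up for this example, and each Azure service defines its own uptime formula in its SLA.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class Outage:
    minutes: int
    cause: str  # hypothetical label, not an Azure field


# Hypothetical labels for downtime that this sketch treats as excluded,
# mirroring the "customer-initiated operations" and "maintenance window"
# exclusions described above.
EXCLUDED_CAUSES = {"customer_initiated", "planned_maintenance"}


def uptime_percentage(total_minutes, outages, apply_exclusions=True):
    """Return uptime as a percentage of total_minutes.

    With apply_exclusions=True, outages whose cause is in EXCLUDED_CAUSES
    are not counted as downtime.
    """
    counted = sum(
        o.minutes
        for o in outages
        if not (apply_exclusions and o.cause in EXCLUDED_CAUSES)
    )
    return (total_minutes - counted) / total_minutes * 100


# Made-up sample data for a 30-day month.
month_minutes = 30 * MINUTES_PER_DAY
outages = [
    Outage(30, "platform_failure"),
    Outage(45, "planned_maintenance"),
    Outage(20, "customer_initiated"),
]

experienced = uptime_percentage(month_minutes, outages, apply_exclusions=False)
sla_view = uptime_percentage(month_minutes, outages)
print(f"Uptime you experienced:        {experienced:.2f}%")
print(f"Uptime after SLA exclusions:   {sla_view:.2f}%")
```

In this made-up month you were down for 95 minutes, but only 30 of those minutes count against the SLA, so the figure used for any credit claim is higher than what you actually experienced.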