When organizations adopt SaaS, their IT teams often struggle to figure out how to monitor and manage their cloud apps so that users can reliably access and use them. It’s not uncommon for IT to be divided into separate groups of specialists, each focused on a specific part of the infrastructure. Network admins manage the networking infrastructure and services like Active Directory. Server or datacenter admins manage the server room infrastructure and the server virtualization environment. There may also be another team that manages the master set of system images.
Then there are the application admins. They manage the configuration, deployment, monitoring and maintenance of one or more apps. They often get their physical or virtual servers from the datacenter admins and rely on the network admins for any network-related issues. This division lets them focus on the specific management tasks and interfaces for the app(s) they support.
Larger organizations may also have a Network Operations Center (NOC) and/or a monitoring team dedicated to continuous infrastructure monitoring with its own tools and dashboards, as well as a help desk that handles Level 1 user support and triage. This specialization helps teams manage the growing complexity of IT, but it also has downsides. Two problems in particular are exacerbated when apps are consumed as SaaS from public cloud vendors.
Problem #1: Reactive v. Proactive
The first problem is that this model tends to be reactive. Nothing happens until a) a user calls or emails the help desk, or b) something logs an error that is picked up via the NT Event Log, SNMP, etc. This isn’t ideal, but IT teams can often get away with it when everything is local and they can reset or reboot the problem system reasonably quickly.
However, when you move to the cloud, not everything is local. In fact, most of the infrastructure you rely on is outside your network. There are no error logs and no SNMP messages, and even if your ISP or application provider does send you alerts, they won’t arrive right away.
It’s more likely you’ll be flooded with user complaints first, then have to scramble to “hunt” for the problem while your users fill your inbox (unless it’s an email outage, in which case they may visit you in person! 😉).
Problem #2: Isolated v. Comprehensive
The second problem is that these groups within IT can become the proverbial “blind men describing the elephant.” The application, datacenter, and network ops teams all focus on their silos and nobody has a handle on the big picture. The help desk escalates to the application admin, who escalates to datacenter and/or network ops teams if he/she doesn’t see any errors coming out of the app server itself.
If you’re lucky, one of these teams will find the infrastructure component causing the problem. If not, you may find yourself bogged down in ghost chasing and finger pointing. The infrastructure teams don’t know much about the apps and the app teams don’t know much about the infrastructure, and both may assume it’s not their problem.
Again, this problem gets worse with cloud since both the infrastructure and the application tier span multiple networks and providers. SaaS delivery architecture is complex, and a siloed approach all but guarantees you’ll be caught off-guard by your next service outage or performance degradation.
You Need to Monitor Service Health
As an application admin, you need to take a different approach with your SaaS apps, recognizing that they are integrated services, relying on a variety of networks, supporting services, and providers. This means proactively testing the service as a whole and monitoring a variety of metrics that can give you warning of problems before they become catastrophic.
Service health monitoring starts with identifying the key application dependencies inside and outside your network. In addition to your application and internet service providers and your local infrastructure, do you rely on a third-party single sign-on (SSO) solution like Okta? Which DNS entries are critical to your app? Is there a “home datacenter” for your application tenant? It’s a good idea to diagram this service delivery chain to help you identify potential points of failure.
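Once the delivery chain is diagrammed, it can also be written down as data, which makes it easy to ask “if this component fails, what breaks?” The sketch below is a minimal illustration, assuming the chain is stored as a simple mapping from each component to the components it depends on directly. Every component name here (“email-app”, “sso”, “isp-link” and so on) is hypothetical, not taken from any real environment.

```python
from collections import deque

# Hypothetical delivery chain for two SaaS apps. Each key lists the
# components it depends on directly; all names are illustrative only.
DELIVERY_CHAIN: dict[str, list[str]] = {
    "email-app": ["sso", "email-dns", "tenant-datacenter"],
    "crm-app": ["sso", "crm-dns"],
    "sso": ["sso-dns", "isp-link"],
    "email-dns": ["isp-link"],
    "crm-dns": ["isp-link"],
    "sso-dns": ["isp-link"],
    "tenant-datacenter": [],
    "isp-link": [],
}


def depends_on(chain: dict[str, list[str]], root: str) -> set[str]:
    """Every component `root` relies on, directly or indirectly (BFS)."""
    seen: set[str] = set()
    queue = deque(chain.get(root, []))
    while queue:
        component = queue.popleft()
        if component not in seen:
            seen.add(component)
            queue.extend(chain.get(component, []))
    return seen


def impacted_by(chain: dict[str, list[str]], failed: str) -> set[str]:
    """Every component that would be affected if `failed` went down."""
    return {name for name in chain if failed in depends_on(chain, name)}
```

For example, `impacted_by(DELIVERY_CHAIN, "sso")` returns the set `{"email-app", "crm-app"}`, showing that one SSO outage takes down both apps. Sorting components by the size of their impact set is one rough way to decide which dependencies deserve direct monitoring first.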
Next, think about user access and experience. Where are your users accessing these apps from? That’s where you need to monitor from. What are the most critical functions of these apps? That’s what you need to monitor. What are the acceptable response times for these functions? This last question can be tough to answer at first, but take a SWAG (an educated guess) and adjust as you go. This is a key metric for the next step.
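One common way to turn a per-function response-time goal into a single number is the Apdex score, an open industry standard that the article itself does not prescribe; it is swapped in here purely as an illustration. Under Apdex, a sample is “satisfied” if it is at or under the target T, “tolerating” if it is over T but at most 4T, and “frustrated” otherwise, and the score is (satisfied + tolerating / 2) / total. The goal values below are hypothetical starting guesses of the kind the paragraph above suggests.

```python
# Hypothetical response-time goals (seconds) per critical function.
# These are starting guesses to be tuned as real measurements come in.
GOALS = {
    "login": 2.0,
    "open-inbox": 3.0,
    "send-message": 1.5,
}


def apdex(samples: list[float], target: float) -> float:
    """Apdex score (0.0 to 1.0) for response times against target T."""
    if not samples:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for s in samples if s <= target)
    tolerating = sum(1 for s in samples if target < s <= 4 * target)
    # Frustrated samples (> 4T) contribute nothing to the score.
    return (satisfied + tolerating / 2) / len(samples)
```

With a 2-second login goal, `apdex([1.0, 2.5, 9.0, 1.8], GOALS["login"])` gives `0.625`: two satisfied samples, one tolerating, one frustrated. Because the thresholds are just numbers in a table, adjusting a goal as you learn what “acceptable” means is a one-line change.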
Since it’s not possible to directly monitor all of these delivery components, we believe you need to test your SaaS apps continuously, using the same user interfaces and APIs your users do. Doing so ensures that you are exercising the entire service and can track its health. This is where those performance goals come into play. While some application problems result in a sudden and complete outage, in most cases you’ll see application performance degrade to unusable levels well before the service goes completely offline.
By comparing current performance against expected and historical trends, you can detect when applications are starting to get sick. Ideally, you also want to compare performance at your locations with that of other customers so you can see how your environment stacks up (hint: Exoprise CloudReady® enables you to do this).
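A continuous test of this kind is often called a synthetic transaction. The sketch below shows the basic shape under simplifying assumptions: a transaction is any zero-argument callable, it is timed and classified as “ok”, “slow” (over its response-time goal) or “down” (it raised an error). The `http_get` helper is a hypothetical stand-in that only fetches one page with the standard library; a real test of a SaaS user interface (logging in, opening a mailbox, and so on) would involve considerably more steps.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    name: str
    status: str                  # "ok", "slow", or "down"
    elapsed: float               # seconds until success or failure
    error: Optional[str] = None  # exception text when status is "down"


def run_probe(name: str, transaction: Callable[[], None], goal: float,
              clock: Callable[[], float] = time.perf_counter) -> ProbeResult:
    """Run one synthetic transaction and classify it against its goal."""
    start = clock()
    try:
        transaction()
    except Exception as exc:  # any failure counts as the service being down
        return ProbeResult(name, "down", clock() - start, repr(exc))
    elapsed = clock() - start
    return ProbeResult(name, "ok" if elapsed <= goal else "slow", elapsed)


def http_get(url: str, timeout: float = 10.0) -> Callable[[], None]:
    """Hypothetical transaction: fetch one page and read the full body.

    urlopen raises on HTTP errors (4xx/5xx) and on timeouts, which
    run_probe then reports as "down".
    """
    def transaction() -> None:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    return transaction
```

A scheduler at each user location might call something like `run_probe("portal-home", http_get("https://example.com/"), goal=2.0)` every few minutes; the URL and name there are placeholders. The injectable `clock` parameter exists so the classification logic can be tested without real network delays.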
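A very simple way to compare current performance against its own history is a statistical baseline: flag a measurement when it sits well above the historical mean. The sketch below assumes a mean-plus-k-standard-deviations rule with k = 3 and at least 10 past samples; both numbers are illustrative choices, not a recommendation from the article, and real tools typically use more robust baselines (for example per time of day).

```python
import statistics


def is_degraded(history: list[float], current: float, k: float = 3.0,
                min_samples: int = 10) -> bool:
    """Flag `current` if it exceeds the historical mean by k std deviations.

    `history` holds past response times (seconds) for the same transaction
    from the same location. With too little history there is no reliable
    baseline, so the function returns False rather than guessing.
    """
    if len(history) < min_samples:
        return False
    mean = statistics.fmean(history)
    spread = statistics.stdev(history)  # sample standard deviation
    return current > mean + k * spread
```

With a history of five 1.0 s and five 1.2 s samples, the threshold works out to roughly 1.42 s, so a 1.3 s measurement is not flagged while a 2.0 s one is. Note that a perfectly flat history gives a standard deviation of zero, so any increase at all would be flagged; a minimum spread is one possible refinement.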
Finally, you should also test the application dependencies you identified in the first step. Monitoring server-related services, checking DNS entries, monitoring end-to-end network path health, and exercising your ADFS/SSO service can give you earlier warning of issues that will affect your users. You may not be the network admin, but with SaaS you are better off monitoring some network services directly so you minimize the time it takes to resolve the problems that do occur.
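Checking a critical DNS entry can be as simple as resolving it and confirming the answer looks right. The sketch below uses only the standard library’s `socket.getaddrinfo`; the idea of an “expected” address set is an assumption for illustration (useful when an entry should point at a known endpoint), and any hostnames you would check, such as a hypothetical `login.example.com`, are placeholders.

```python
import socket
from typing import Optional


def resolve(hostname: str) -> set[str]:
    """Return the set of IP addresses `hostname` currently resolves to."""
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def check_dns(hostname: str,
              expected: Optional[set[str]] = None) -> tuple[bool, str]:
    """Check that `hostname` resolves, optionally to a known address.

    Returns (healthy, detail). When `expected` is given, the entry is
    healthy only if at least one resolved address appears in it.
    """
    try:
        addresses = resolve(hostname)
    except socket.gaierror as exc:
        return False, f"lookup failed: {exc}"
    if expected is not None and not (addresses & expected):
        return False, f"unexpected addresses: {sorted(addresses)}"
    return True, f"resolved to {sorted(addresses)}"
```

Run from each user location, a check like this can distinguish “the SaaS app is slow” from “our users can’t even find the SaaS app”, which points the investigation at DNS rather than at the provider.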
All of this can be daunting if you don’t have the right tools. We think SaaS is clearly the better software consumption model for most standard business applications, but it’s definitely no free lunch for IT. As far as your users are concerned, you still own the application, whether it runs on-premises or in the cloud. Your job remains the same; only the tools and techniques change. The good news is that with CloudReady you now have a solution that enables you to monitor service health for all of your SaaS apps easily and effectively.