The up.time IT Systems Management Blog

Posts Tagged ‘Business Service Monitoring’

Travel Woes and Root Cause Analysis

Tuesday, February 5th, 2013

So who likes to travel?  If we were to play Family Feud and name something that people like to do, I’m certain “travel” would be one of the top answers.  But how many of us like waiting to board a plane? How many of us like delays and spending unnecessary hours at the airport?  Right.  I didn’t think so.  So when I read in the news that a computer outage at Toronto Pearson International Airport had led to significant delays, I could imagine how frustrating it must have been.  Being a Torontonian, I know our airport does not hold the title of busiest airport in the world.  Nonetheless, it was ranked (albeit 38th) in 2011.  Part of the news article said “technicians are not sure what caused the problem”, which is a scary thought.  Unfortunately, this is not an isolated incident that only happens at Nav Canada, the company behind traffic control at the airport.

Root cause analysis is one of the holy grails in IT management.  If you are a system administrator who can pinpoint exactly why an outage happened, not only will you look like a superstar, but your users and customers will love you for it.  How can you achieve that with up.time?  First of all, you need a unified dashboard so you can see things as they happen.  Just as important is getting alerts to the right person at the right time.  But once you get the alert, what’s next? You need to be able to monitor complex business services.

There are two key points to consider:

  1. First, you must have coverage for all the underlying components that make up your business services.  Whether it is the OS, applications, or the network and its devices, you must have visibility into everything in your infrastructure.
  2. Second, you need to be able to tie all the different components into your business services so that you can see the overall health of your services and exactly which components are down.

The latter is vital if you want to perform root cause analysis. Having a tool (like up.time) that facilitates root cause analysis will make you the superstar (that you are), save you time troubleshooting issues in your environment, and help you get to the root cause of any outage with ease!  If you haven’t tried out up.time in your environment, you need to download it and take it for a spin!
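To make the second point concrete, here is a minimal sketch of what tying components into a business service can look like. It is not up.time's API or data model; the `Component` class, the `probable_root_causes` function and the example component names are all hypothetical. The idea is simply that a component which is down while everything it depends on is up is a good place to start looking.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical model of one monitored element of a business service."""
    name: str
    up: bool
    depends_on: List["Component"] = field(default_factory=list)


def probable_root_causes(service: Component) -> List[str]:
    """Return names of down components whose own dependencies are all up."""
    causes: List[str] = []

    def visit(component: Component) -> None:
        # Walk the dependencies first so the deepest failures are found first.
        for dep in component.depends_on:
            visit(dep)
        dependencies_ok = all(dep.up for dep in component.depends_on)
        if not component.up and dependencies_ok and component.name not in causes:
            causes.append(component.name)

    visit(service)
    return causes


# Illustrative service tree (names are assumptions, not real configuration).
database = Component("vCenter Database", up=False)
vcenter = Component("vCenter Server", up=False, depends_on=[database])
switch = Component("Core Switch", up=True)
service = Component("Virtual Infrastructure", up=False,
                    depends_on=[vcenter, switch])

print(probable_root_causes(service))
```

In this example the service and the server both report as down, but only the database is flagged, because it is the one failing component with nothing failing beneath it.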

The SLA Dashboard – The Key to Measuring Business Services

Wednesday, August 22nd, 2012

Last time we discussed how you can instantly identify the impact of your IT infrastructure on business services.  Taking that one step further, let’s talk about how you can measure your business services.  There are many metrics you can use to determine the performance of your business services.  Let’s say you want to measure the throughput of your email system.  up.time has Service Monitors for that.  Or say you want to determine the number of requests your web server gets; there’s a Service Monitor for that too.  These are important metrics, but they are not necessarily essential.

The one vital metric is how often the business services are up.  And as you already know, a business service is not a single process on a single computer.  Whether the service is available or not depends on a number of components.  How do you tie all the components together and have a single number representing the health of your business service?  One way to gauge this is by using the Service Level Agreement (SLA) dashboard in up.time.

SLA Dashboard

Why should you use the SLA dashboard?  If you have SLA requirements, it’s a no-brainer to use this dashboard because it provides a real-time status of your SLA. It also lets you know if you are on thin ice, telling you exactly how many more minutes you have until your SLA will be breached.  But what if you don’t have any SLA requirements? You should still set up SLAs for each of your business services because up.time will be able to quantify how well you are managing your IT infrastructure.  up.time can determine the percentage of time a service is available over the compliance period.  If you are a system administrator, your performance is tied directly to whether services are accessible to the users.  That bonus you wanted?  You will have a much stronger argument for why you deserve it if you can show a concrete KPI of how well you are doing.  Or if you are a CIO, having a magical number summarizing what you need to know will enable you to work more efficiently.
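If you are curious about the arithmetic behind those two numbers, here is a rough sketch. It is an assumption about how such figures are typically derived, not a description of how up.time calculates them; the function names and the 30-day, 99.9% example are made up for illustration.

```python
def availability_pct(period_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the compliance period during which the service was up."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def minutes_until_breach(period_minutes: float, target_pct: float,
                         downtime_minutes: float) -> float:
    """Downtime budget left before the availability target is missed.

    A negative result means the SLA has already been breached.
    """
    allowed_downtime = period_minutes * (100.0 - target_pct) / 100.0
    return allowed_downtime - downtime_minutes


# Hypothetical example: 30-day compliance period, 99.9% target,
# 30 minutes of downtime so far.
period = 30 * 24 * 60  # 43,200 minutes
print(round(availability_pct(period, 30), 3))
print(round(minutes_until_breach(period, 99.9, 30), 1))
```

With these numbers the service can be down for 43.2 minutes in the period, so after 30 minutes of downtime there are roughly 13 minutes of thin ice left.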

Whether you have formal, informal or non-existent SLAs on your business services, the up.time SLA dashboard is a must-have for getting a complete view of your IT infrastructure!  Download up.time and take it for a spin!

-Patrick

2010: The Year of Cloud-Enabled Convergence

Tuesday, January 5th, 2010

This is my thesis for today’s post: Geek toys are important for the future of digital convergence.

2010 will be a year of unprecedented leaps in the availability of geek toys. As you are all aware, CES is happening and a few well-timed launches are expected.  The general themes are extremely clear, thanks to a few “leaks” to the press this year.  Consumers are expecting a huge explosion of devices in tablet form, as well as a dollop of mobile computing devices based on the Android mobile application ecosystem.  In essence, we are all expecting 2010 to be full of ultra-powerful, low-power, beautifully designed tablet-like devices that look like they came off the set of the latest Star Trek episode. All of these juicy playthings will be delivering waves of toy-induced geek euphoria among the masses for months to come. Will I be partaking in this geek fest? Absolutely, I’ll be one of the early adopters rocking a Nexus One, but that’s not really the point of my post.

From a consumer standpoint, the entire internet and our entire digital lives are converging into devices like the Nexus One and the Apple tablet. That’s amazing when you consider that these devices are essentially a “piece of glass” with a wireless interface, a processor, some kind of solid state memory and a camera. This has been enabled by huge leaps in battery technology and low-power computing, but more importantly by the richness of the “cloud”, or essentially what the internet has to offer us on these new types of devices.

The contrast is that, from an IT systems management perspective, the stack used to deliver business services, and ultimately the content and services, to these endpoints gets exponentially more complex and layered with every iteration in the design of the devices. The iterations are also getting faster, as the race to conquer this wild west arena heats up among all the usual suspects.

So, this is going to be great for consumers. We are going to see an explosion of different operating system variants, hardware paradigms, and new ways of consuming media. The question becomes: how many IT decision makers are already wondering what the impact will be of people wanting to rock an “iSlate” at work? What will be the impact of having to provide more and more business services over the wire to mobile platforms like Android, Apple’s mobile tablet OS, and Chrome on the Google Tablet (and the list will go on and on for 2010)? What will be the business impact of having to monitor all the new infrastructure or SaaS-based services needed to manage these devices from a corporate policy perspective? How about even the basics of trying to monitor the explosion of different kinds of endpoints themselves as they penetrate the enterprise? We all remember that the iPhone was initially a consumer-only device that later penetrated the enterprise with impunity. Most of my posts end with the same question: are you ready?

Monitoring Applications and Business Services from an iPhone or Mobile Device

Wednesday, May 20th, 2009

For the past couple of weeks I have been working with a large customer who needs immediate visibility into critical Business Services they are providing for multiple departments. The common complaint is that when people start saying “Application XYZ is down, or XYZ is really slow today” they don’t know where to start looking. This is actually a very common requirement for most of the prospects/customers I work with on a daily basis. This is why Application monitoring from a mobile device is extremely useful.

Luckily, up.time 5 has a great solution that maps to this exact problem set. Application Status allows you to see the overall health of specific business services within your enterprise. From e-mail infrastructure and web site availability to virtual infrastructure status and more, you can rapidly see why a Business Service is having issues, drill down into deep forensics and improve MTTR within your organization.

You can access the Application Status real-time dashboard from your PC, Blackberry® and iPhone® in either the standard or detailed view (standard view shown below).

 

Detailed reporting is also available showing the uptime/downtime for your specific applications/business services.

Application Availability Report

Application and Business Services Dashboards shown on the iPhone

 

This customer was actually able to try out this feature yesterday when users were complaining that they were not able to connect to their VMware vCenter via the Virtual Infrastructure Client. When they went to the applications view they immediately saw that the vCenter Database was offline. Going forward, they have set up alerting and self-healing profiles that automatically notify the appropriate people and restart the SQL service.

If you would like to check out the Application/Business Service Monitoring solution with up.time 5, please click on the link below:

Application Monitoring and Business Service Monitoring with up.time 5