The up.time IT Systems Management Blog

Posts Tagged ‘ITSM’

What Everybody Ought to Know about Capacity Planning

Thursday, January 17th, 2013

Capacity Planning has been a “hot” topic of discussion in the IT community for a while now. We’ve covered this topic at length on this blog and in whitepapers and webinars (see the related resources below). In our most recent blog post about IT capacity demands during peak seasonal periods, we established that well-managed capacity is a gold mine; mismanaged capacity is the Titanic.

 

Is Capacity Planning Important to my datacenter?

Capacity Planning is essential in any IT data center to ensure the performance and availability of IT services and applications. Companies cannot succeed without the IT infrastructure they depend on, so it is critical to balance capacity needs while keeping costs in line.

Successful Capacity Management requires a unified view of your IT environment. Far too many companies spend too much on additional capacity when they already have spare capacity available internally. This happens because they lack visibility into the critical capacity information needed to make the right decisions. It is a particular problem in environments with mixed vendors, multiple platforms and multiple data centers.

 

How do you get complete capacity visibility and reporting to implement effective capacity planning?

Effective capacity management starts with these basics:

  • Get accurate and granular capacity insight across all platforms and infrastructure (right down to the bare metal), so you can make high-level decisions and become more proactive
  • Find under-utilized capacity and re-allocate it to where it’s needed
  • Slow down spending on new equipment until existing servers are operating at over 60% utilization and VMs are at a minimum of 90%
  • Get visibility and reporting that can monitor, measure and report on global capacity
  • Collect deep historical data in easy-to-view reports and charts to trend data and proactively forecast future capacity needs
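
To make the second and third basics concrete, here is a minimal sketch of how you might flag systems that sit below those utilization targets. The 60% and 90% figures come from the list above; the record shape (`name`, `kind`, `avgUtilizationPct`) and the function name are hypothetical, and this is not how up.time itself calculates anything.

```javascript
// Targets from the list above: servers "over 60%", VMs "at a minimum of 90%".
const meetsTarget = new Map([
  ["server", (pct) => pct > 60],
  ["vm", (pct) => pct >= 90],
]);

// systems: [{ name, kind: "server" | "vm", avgUtilizationPct }] (hypothetical shape)
// Returns the names of systems below their target, least utilized first.
function findUnderUtilized(systems) {
  return systems
    .filter((s) => meetsTarget.has(s.kind) && !meetsTarget.get(s.kind)(s.avgUtilizationPct))
    .sort((a, b) => a.avgUtilizationPct - b.avgUtilizationPct)
    .map((s) => s.name);
}

const sample = [
  { name: "db01", kind: "server", avgUtilizationPct: 72 },
  { name: "web03", kind: "server", avgUtilizationPct: 35 },
  { name: "vm-build", kind: "vm", avgUtilizationPct: 55 },
  { name: "vm-app", kind: "vm", avgUtilizationPct: 93 },
];
console.log(findUnderUtilized(sample)); // logs [ 'web03', 'vm-build' ]
```

Anything this returns is a candidate for re-allocation before you buy new hardware.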

 

 

If you are looking for a capacity management and reporting solution, the right tool should answer these critical questions:

  1. How much total capacity do we have?
  2. How much capacity are we currently using?
  3. When and where are we going to run out of capacity next?
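
The third question is essentially a trend calculation. As a rough sketch, assuming you have daily usage samples and a known total, a simple least-squares line can estimate how many days remain. The sample shape (`day`, `usedGb`) and the function name are hypothetical, and a straight-line trend is an assumption of this example, not a description of any particular tool's forecasting.

```javascript
// samples: [{ day: number, usedGb: number }] (hypothetical shape)
// Returns the estimated days until usage reaches totalGb, or null if usage is not growing.
function forecastDaysUntilFull(samples, totalGb) {
  const n = samples.length;
  if (n < 2) return null; // not enough history to fit a trend
  const meanX = samples.reduce((sum, s) => sum + s.day, 0) / n;
  const meanY = samples.reduce((sum, s) => sum + s.usedGb, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const s of samples) {
    sxy += (s.day - meanX) * (s.usedGb - meanY);
    sxx += (s.day - meanX) ** 2;
  }
  if (sxx === 0) return null; // all samples taken on the same day
  const slope = sxy / sxx; // growth in GB per day
  if (slope <= 0) return null; // flat or shrinking usage never fills up
  const intercept = meanY - slope * meanX;
  const lastDay = Math.max(...samples.map((s) => s.day));
  const fullDay = (totalGb - intercept) / slope;
  return Math.max(0, fullDay - lastDay);
}

const history = [
  { day: 0, usedGb: 100 },
  { day: 10, usedGb: 200 },
  { day: 20, usedGb: 300 },
];
console.log(forecastDaysUntilFull(history, 1000)); // logs 70
```

Real workloads are rarely this linear, so treat the result as a prompt to look closer rather than a deadline.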

 

Ultimately, the right ITSM tool will need to empower IT to optimize performance, reliability and efficiency of the IT services it delivers to the business, and its clients, at the best cost. For IT departments with both physical and virtual infrastructures, capacity management will mean the difference between becoming the organization’s gold mine or its Titanic. Which do you want to be?

_____

Additional Resources:

Related Blog Posts

Capacity Planning: Do you Know your Virtualized Environment?

Key to Capacity Planning is Knowledge

The Do’s and Dont’s of Capacity Planning

Manage Capacity and Avoid Downtime During the Holiday Season

Related Whitepapers

The 6 Capacity Planning Essentials

Related Webinar

3 Simple Steps for Total Control of IT Capacity

4 Tips on Choosing a Unified IT Monitoring Solution that Fits Your Budget

Tuesday, January 8th, 2013

Before considering an IT Systems Management (ITSM) solution, it’s essential to understand what your team ‘needs’ rather than ‘wants’, which will trim down excess requirements and ultimately price. Furthermore, make sure the solution is a good fit for your budget, meets your needs, solves your problems and can scale to grow with your business.

Here are some tips and tricks you should keep in mind when evaluating an ITSM suite:

Licensing Costs

  • Per-element pricing (pay for the servers/devices you want to monitor) provides the most value. It’s easy to manage, scalable and cost-effective.
  • Look for licensing models that include most, or all, of what you need in one price. For example, complete or unified solutions (without modules, application charges, management packs or upgrade reports) will provide all the functionality you need without the extra cost.

Deployment Costs

  • If you are buying a solution that requires deployment services, budget an additional 100% of the license cost for deployment.
  • The longer a solution takes to deploy, the more complex it is to use. Find an ITSM suite that you can deploy quickly, with your own IT staff. This avoids the large costs associated with services, margin for error, cost overruns, stressed-out staff, etc.

Maintenance Costs

  • Look for a maintenance model that fits your needs and lets you escalate to higher tiers of support without incurring extra costs.
  • Product upgrades should be free and annual fees should never be more than 20% of your license cost.
  • You shouldn’t need a full-time employee to maintain your ITSM suite – it should make your team more productive, not require additional staff.
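
The rules of thumb above are easy to turn into a quick budget check. This is only a sketch: the 100% deployment figure and the 20% maintenance ceiling come from the tips above, while the function name, the input fields and the idea of summing them into a first-year total are assumptions made for illustration.

```javascript
const DEPLOYMENT_PCT_OF_LICENSE = 100; // "add 100% of the license cost for deployment"
const MAINTENANCE_CAP_PCT = 20; // annual fees "never more than 20%" of license cost

// All inputs are hypothetical quote figures in the same currency.
function estimateFirstYearCost({ licenseCost, needsDeploymentServices, annualMaintenanceFee }) {
  const deploymentCost = needsDeploymentServices
    ? (licenseCost * DEPLOYMENT_PCT_OF_LICENSE) / 100
    : 0;
  const maintenanceCap = (licenseCost * MAINTENANCE_CAP_PCT) / 100;
  return {
    deploymentCost,
    maintenanceCap,
    maintenanceWithinGuideline: annualMaintenanceFee <= maintenanceCap,
    firstYearTotal: licenseCost + deploymentCost + annualMaintenanceFee,
  };
}

console.log(
  estimateFirstYearCost({
    licenseCost: 50000,
    needsDeploymentServices: true,
    annualMaintenanceFee: 12000,
  })
);
```

In this made-up quote, the services-heavy deployment doubles the license spend and the maintenance fee lands above the 20% guideline, which is exactly the kind of thing to catch before signing.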

Value to the Team

  • The solution should provide value to the entire IT team across silos (VMware, Windows, UNIX, Application, and Network teams).
  • Create high-performing IT teams by getting everyone on the same page and working together with an ITSM dashboard that can be used as a heads-up display across multiple silos. This greatly reduces the costs associated with buying and maintaining multiple tool stacks.
  • Find a Unified Monitoring Tool that can satisfy the core needs of each silo and the cost savings will be substantial.

Lastly, consider an easy-to-use solution, as it will greatly assist with day-to-day use. This will encompass quick deployment, simple configuration, dashboard to root-cause analysis in seconds, auto-discovery of systems, proactive alerting and 3-click intelligent reporting. Enabling IT with an intuitive ITSM suite that will get the job done will make IT more proactive in meeting SLAs, as well as helping solve immediate problems quickly and easily.

How to Avoid a “Sea of Red” – False Alerting in your IT Infrastructure

Thursday, October 25th, 2012

I like calling a flood of alerts a sea of red. Hearing that term immediately brings to mind an image of someone drowning and crying out for help. Either that, or someone getting all grumpy when the alerts come.

When evaluating a monitoring solution, don’t just look at the metrics you are able to collect, because getting metrics is the easy part. Anyone can write scripts or programs to get the numbers they want. Getting useful alerts is much more difficult, and much more valuable when you want to get a handle on your IT infrastructure. So what are useful alerts? Let’s look at a car alarm as an example. Have you ever heard an alarm going off when a bass-bumping car drives by? That’s one false alarm I don’t need. Similarly, if your monitoring solution is sending you false alerts, what’s the point of having a monitoring solution? When you get enough of these false alerts, you will start filtering them and stop reading them altogether. Sadly, I have seen system administrators who do that on a regular basis. They manage hundreds of servers, and the monitoring solution sends out hundreds of false-positive alerts per day.

So how can you avoid a sea of red?

There are a number of features in up.time that minimize alert noise in your environment. Let’s focus on two in this post:

  1. Flexible Alert Settings
    The genius who designed my condo put a smoke detector on the ceiling inches away from my bathroom door. So after I take a nice, long hot shower, if I forget to close the door, the steam trips the smoke detector every single time. If the smoke detector would just give it 1-2 minutes of leeway for the steam to dissipate, the firemen wouldn’t be knocking at my door when I have a towel wrapped around my waist. Similarly, how often do your servers briefly spike in CPU or memory usage and trip the alert thresholds? Those are probably alerts that you can do without. So what does up.time do differently? up.time allows you to configure rechecks for any given monitor. For example, if CPU usage passes the threshold, up.time doesn’t have to send out the alert right away. Depending on the configuration, it can recheck a few times to verify that CPU usage is still high. If, and only if, the usage remains high, an alert is sent out. How many times to recheck, and how often, is entirely up to you.

  2. Topological Dependencies
    If you can’t make a call on your cell phone, what is the first thing you would check? Of course you would check whether you have reception. Whether you can make a call or not depends on it. Similarly, in a networked environment, devices and servers depend on a number of things in order to work cohesively. The simplest example is a server depending on a switch for network connectivity. In a lot of monitoring solutions, a switch going down results in a sea of red because the monitoring solution can’t reach any of the servers behind it. This is especially true in siloed solutions where you have one piece monitoring the network, another monitoring the servers, and yet another monitoring the applications. With up.time’s unified systems monitoring, you can set up the dependencies so it won’t swamp you with alerts when outages happen. Instead of receiving all those server alerts, you will just get an alert on the switch, so you can focus on fixing what’s important.
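
To show the logic behind both features, here is a small sketch. It is not up.time’s implementation or configuration syntax; the function names, the readings array and the parent map are all hypothetical, and the only ideas taken from the post are “recheck before alerting” and “alert on the failed dependency, not on everything behind it”.

```javascript
// Rechecks: alert only if the first reading and every recheck exceed the threshold.
// readings[0] is the initial check; later entries are rechecks in order.
function shouldAlert(readings, threshold, recheckCount) {
  const needed = recheckCount + 1;
  if (readings.length < needed) return false; // still rechecking
  return readings.slice(0, needed).every((value) => value > threshold);
}

// Dependencies: keep only outages whose parent chain contains no other outage.
// parents is a Map of element name -> name of the element it depends on.
function rootCauseAlerts(downElements, parents) {
  const down = new Set(downElements);
  return downElements.filter((name) => {
    const seen = new Set([name]); // guards against accidental cycles
    let parent = parents.get(name);
    while (parent !== undefined && !seen.has(parent)) {
      if (down.has(parent)) return false; // an upstream outage explains this one
      seen.add(parent);
      parent = parents.get(parent);
    }
    return true;
  });
}

const parents = new Map([
  ["web01", "switch-a"],
  ["db01", "switch-a"],
]);
console.log(shouldAlert([95, 40, 35], 90, 2)); // a brief spike: logs false
console.log(rootCauseAlerts(["switch-a", "web01", "db01"], parents)); // logs [ 'switch-a' ]
```

A short CPU spike never reaches your inbox, and a dead switch produces one alert instead of one per server.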

If you are one of the unfortunate sysadmins buried by alerts, it’s a choice whether you want to continue to work like that. You can choose to weed through hundreds of alerts each day or you can kiss the sea of red goodbye! Work smarter! Try up.time and see the difference!

The Top 4 Reasons to Use a Unified IT Management Tool

Friday, September 7th, 2012

Yes, it looks ridiculous trying to hold onto all those limes. No one would do that at a supermarket when he/she can easily carry them with a bag or a basket. However, some of us might not realize that we are practically doing the same thing in other aspects of our lives. For example, according to a survey, the average US household has customer relationships with six different financial organizations. Whatever the reasons are, there are probably more benefits to consolidating and managing one’s finances with just one institution. Similarly, a lot of people are using multiple tools to manage and monitor their datacenter.

Here are the top 4 reasons why you should consider consolidating and having a single pane of glass:

 

  1. Gentle Learning Curve
    One of the primary reasons people have so many tools to monitor their datacenter is that they rely on point tools to perform specific tasks in their environment. For example, they might have tools to monitor just their network. Or they might have tools just to monitor their databases. Each time a new technology is integrated, there is a new tool the administrators ultimately have to learn. This is a very inefficient way to work. A better alternative is a consolidated, extensible framework that gives admins coverage for any technology that might go into the environment in the future. They won’t have to learn a new interface, which saves the company time and money.

  2. Eliminate Duplication of Configuration
    Although point tools are made to solve specific problems, there are always areas where some overlap will occur. For instance, you might have one tool to monitor your virtual environment as a whole, and another to monitor your OSs. In each tool, you’ll probably want to configure some sort of topology that makes sense to you. Not only do you have to configure the same hierarchy once for each tool, but if anything changes in your environment, you’ll have to make sure both tools are synchronized. And since each tool most likely has its own database, you might find yourself having to back up data for each of your point tools. Another example is configuring alerts. Each tool most likely has its own alerting engine and alerting configurations. Managing who gets the alerts, and when, will be a headache as you jump through all the different consoles. All these issues stem from having different tools for monitoring different aspects of a datacenter. There are options available to minimize the effort. Why not utilize them?

  3. Easily Correlate Events
    When we were kids, we only had a view of our immediate surroundings, namely people and places. But as we grow up, we understand the world is a lot bigger than just our home, neighborhood and school. Similarly, point tools give you specific views into certain aspects of your IT infrastructure. However, if you only look through the lenses of these tools, you will miss out on things that are happening in other parts of your datacenter. To illustrate, if you have a tool to monitor your webserver, it might do a really good job of telling you when your website is slow or down. It might even be able to identify why your webserver is slow, as long as the cause is within your webserver. The keyword there is within. What happens if the webserver is not at fault? What if the webserver is unresponsive because the operating system has an unrelated process taking up most of the resources? Or what if your webserver is virtualized and other virtual machines are contending for resources at the same time? Without a complete view of your environment, it will be very difficult to correlate these events.

  4. Reduce Analysis Time
    The whole point of monitoring your IT infrastructure is to know about problems when they happen and to resolve issues as soon as possible. If you have multiple tools sending you alerts at the same time, you really have to be on the ball and be able to quickly decipher which console you should jump onto. Not only that, you might need to go through all the different consoles to figure out the root cause of the issue. Can you imagine a surgeon walking into the emergency room and being handed a huge bag of tools to dig through just to find the right one to use? I wouldn’t want to be the patient on that operating table! Similarly, if you are fighting fires in your production environment, do you really want to fiddle with a number of consoles and SSH/Remote Desktop windows?

 

If you currently use a myriad of monitoring tools in your IT infrastructure, there is no better time to look for better options. However, just be aware of how some vendors might discount the benefits of having a single pane of glass. Also, there are vendors that offer a suite of products that seemingly do everything, but these products are siloed and don’t integrate together to give you the unified view, which is critical to the success of a datacenter. If you have not been following the series of up.time’s complete IT dashboard, definitely take a look and see how up.time can help you better manage and monitor your datacenter!

–Patrick

Introducing the New Dashboarding API – Sneak Peek Pt. 1

Wednesday, August 29th, 2012

The next version of up.time will make it dramatically easier to share up.time information with other applications in your datacenter. Why is this important? IT Systems Monitoring is a small piece of the IT puzzle. To gain the most value from performance and availability information, you’ll need to be able to share information between systems and team members. Getting the right information to the right people or systems, at the right time, and in the right format is essential for building an effective monitoring platform. up.time has always been a great tool for sharing information, but it has not been easy to extract that information in a customized way to fit perfectly with your existing applications.

Enter the up.time API.

Over the next few up.time releases, we will be providing a complete RESTful API that will allow you to pull up.time data into other tools like dashboards, corporate portals, mobile apps, or any existing application. We are also planning to allow you to control the configuration of up.time from other applications. For example, you will be able to add and remove systems, adjust maintenance, and acknowledge service outages in an automated way, directly from other tools.

Take a look at a quick example below. Say I wanted a web page to display a chart showing the distribution of elements I’m monitoring by type.

OS Summary via Google Chart

up.time Data Displayed in a Google Bar Chart

Let’s start by listing out the elements that we’ve added to up.time using the GET command below:

GET https://win-dleith/api/v1/elements

This returns a JSON structure with basic information about each element:

[
   {
      "description": "Default self-monitoring host",
      "groupId": 1,
      "hostname": "localhost",
      "id": 1,
      "isMonitored": true,
      "monitors": [],
      "name": "win-dleith",
      "tags": [],
      "type": "Server",
      "typeName": "Server",
      "typeOs": "Windows 7/Server 2008 R2",
      "typeSubtype": "Windows",
      "typeSubtypeName": "Microsoft Windows"
   },
   { 
      "description": null,
      "groupId": 1,
      "hostname": "10.1.52.1",
      "id": 2,
      "isMonitored": true,
      "monitors": [], // hidden for ease
      "name": "10.1.52.1",
      "tags": [],
      "type": "NetworkDevice",
      "typeName": "Network Device",
      "typeOs": "Ethernet Routing Switch",
      "typeSubtype": "Switch",
      "typeSubtypeName": "Switch"
   }
]

In my case, I only have 2 elements added. If I wanted to reference a specific element, I would use the format GET https://win-dleith/api/v1/elements/1 for element ID 1.

Now let’s take this one step further. Here is a simple PHP page that will access the list of elements and summarize them by type. The result looks like this:

OS Summary

up.time Data Displayed in a PHP page

 

Here is the PHP source to produce this page, using PHP’s cURL extension (libcurl) to fetch the data:

<html>
<head><title>OS Summary</title></head>
<body>
<?php
// Set some defaults for access control; change these to your up.time specifics.
$apiHostname = "win-dleith.rd.local";
$apiPort     = "9997";
$apiUsername = "admin";
$apiPassword = "admin";
// Specify which API endpoint we would like.
$apiRequest  = "/api/v1/elements";

// Initialize our curl session.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://" . $apiHostname . ":" . $apiPort . $apiRequest);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
curl_setopt($ch, CURLOPT_USERPWD, $apiUsername . ":" . $apiPassword);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Fetch our list of elements.
$res = curl_exec($ch);
if ($res === false)
{
   die("Error Fetching Data => " . curl_error($ch));
}
curl_close($ch);

// Decode the JSON response and stop if it is not the expected array.
$j = json_decode($res);
if (!is_array($j))
{
   die("Error Decoding Data => unexpected response from the API");
}

// Map each hostname to its subtype, then count how often each subtype occurs.
$list = array();
foreach ($j as $k)
{
   $list[$k->hostname] = $k->typeSubtype;
}
$summary = array_count_values($list);
?>
<h1>Operating System Summary</h1>
<table>
<tr><th>OS Type</th><th>Count</th></tr>
<?php
// Print one table row per subtype.
foreach ($summary as $k => $v)
{
   print "\n<tr><td>" . htmlspecialchars($k) . "</td><td>" . $v . "</td></tr>";
}
?>
</table></body></html>

This is just one quick example of harvesting up.time data and displaying it in a customized way. Using up.time’s API, you can manage and manipulate your data output any way you want. Earlier in this post I displayed the same info using a Google Charts bar chart. Here is the JavaScript source that produced that example, based on data that had already been fetched:

<html><head>
<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript">
google.load("visualization", "1", {packages:["corechart"]});
google.setOnLoadCallback(drawChart);
function drawChart()
{
   var data = google.visualization.arrayToDataTable(
      [['OS Type', 'Count'],
      ['Windows', 122],
      ['Switch', 1],
      ['Linux', 72],
      ['Solaris', 1],
      ['VcenterServer', 1],
      ['Netware', 2],
      ['VcenterHostSystem', 7]]
   );
   var options = {title: 'OS Summary', hAxis: {title: 'OS Type', titleTextStyle: {color: 'red'}}};
   var chart = new google.visualization.ColumnChart(document.getElementById('chart_div'));
   chart.draw(data, options);
}
</script>
</head>
<body style="font-family: Arial;border: 0 none;">
 <div id="chart_div" style="width: 900px; height: 500px;"></div>
 </body>
</html>
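
The rows in that chart were typed in by hand. As a rough sketch, they could instead be built from the parsed `/api/v1/elements` response shown earlier. The helper name `toChartRows` is hypothetical; it only relies on the `typeSubtype` field that appears in the sample output, and unlike the PHP page it counts every element rather than de-duplicating by hostname first.

```javascript
// elements: the parsed JSON array returned by GET /api/v1/elements
// Returns rows in the shape arrayToDataTable expects: a header row, then [label, count] pairs.
function toChartRows(elements) {
  const counts = new Map();
  for (const el of elements) {
    counts.set(el.typeSubtype, (counts.get(el.typeSubtype) || 0) + 1);
  }
  // Spreading a Map yields its [key, value] entries in insertion order.
  return [["OS Type", "Count"], ...counts];
}

const elements = [
  { name: "win-dleith", typeSubtype: "Windows" },
  { name: "10.1.52.1", typeSubtype: "Switch" },
];
console.log(JSON.stringify(toChartRows(elements)));
// logs [["OS Type","Count"],["Windows",1],["Switch",1]]
```

The result can be passed straight to `google.visualization.arrayToDataTable` in place of the literal array.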

The API provides similar functions for listing Service Monitors and Element Groups, which I’ve listed below.

Monitors

GET https://win-dleith/api/v1/monitors
[
   {
      "description": "mysql running on port 3308",
      "elementId": 1,
      "id": 5,
      "isHidden": false,
      "isHostCheck": false,
      "isMonitored": true,
      "name": "Default up.time data store",
      "type": "MySQL (Basic Checks)"
   },
   {
      "description": "apache running on port 9999",
      "elementId": 1,
      "id": 6,
      "isHidden": false,
      "isHostCheck": false,
      "isMonitored": true,
      "name": "Default up.time web server",
      "type": "HTTP (Web Services)"
   },
   {
      "description": "monitors the available space on the local file systems",
      "elementId": 1,
      "id": 7,
      "isHidden": false,
      "isHostCheck": false,
      "isMonitored": true,
      "name": "Default File System Capacity",
      "type": "File System Capacity"
   },
   ...
]

Groups

GET https://win-dleith/api/v1/groups
[
   {
      "description": "",
      "elements": [], // hidden for ease
      "groupId": null,
      "id": 1,
      "monitors": [], // hidden for ease
      "name": "My Infrastructure"
   },
   {
      "description": "",
      "elements": [],
      "groupId": 1,
      "id": 2,
      "monitors": [],
      "name": "Discovered Virtual Machines"
   },
   {
      "description": "",
      "elements": [],
      "groupId": 1,
      "id": 3,
      "monitors": [],
      "name": "Discovered Hosts"
   }
]
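
If you want to display those groups as a hierarchy, a sketch like the following could nest them. It assumes that `groupId` holds the `id` of the parent group and that `null` marks a top-level group. That reading fits the sample output above but is an assumption, not something confirmed here, and the function name is hypothetical.

```javascript
// groups: the parsed JSON array returned by GET /api/v1/groups
// Assumption: groupId is the parent group's id, and null means "no parent".
function buildGroupTree(groups) {
  const nodes = new Map(
    groups.map((g) => [g.id, { id: g.id, name: g.name, children: [] }])
  );
  const roots = [];
  for (const g of groups) {
    const node = nodes.get(g.id);
    const parent = g.groupId === null ? undefined : nodes.get(g.groupId);
    if (parent) {
      parent.children.push(node);
    } else {
      roots.push(node); // top-level group, or parent missing from the response
    }
  }
  return roots;
}

const groups = [
  { id: 1, groupId: null, name: "My Infrastructure" },
  { id: 2, groupId: 1, name: "Discovered Virtual Machines" },
  { id: 3, groupId: 1, name: "Discovered Hosts" },
];
const tree = buildGroupTree(groups);
console.log(tree[0].name, tree[0].children.map((c) => c.name));
```

With the three groups from the sample response, this gives one root (“My Infrastructure”) with the two discovered groups nested beneath it.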

Without a doubt, this API integration into up.time is a very exciting enhancement and will provide much more flexibility for our users. Look out for this new feature in the coming months, and if you haven’t downloaded up.time yet, try it out for free today!

– Dave

From Dashboard to Deep Dive Diagnostics

Thursday, July 5th, 2012

“So something *WAS* wrong with the network but was that the root cause?”

In my last post, we looked at how you can use the brand new Network Dashboard in up.time 7 to look for network issues. Now that you have confirmed the network was to blame for the poor performance, is the case closed? How do you know if there aren’t other issues? Are you content with guessing and hoping the network was the sole reason for the degradation of performance? That is almost like your doctor saying he/she can give you a full body checkup just by looking at your pinky! Today’s IT infrastructure is not composed of only the network. So how can you dig deeper?

Resource Scan

Resource Scan Dashboard

 

up.time enables users to get a complete view of their IT environment without navigating through a myriad of tools, or what we call “tool soup”. Jumping between different consoles is not only time consuming, it also makes the lives of system administrators more difficult. Recovering from outages takes longer, which ultimately affects your business. See more about tool soup here. How does up.time help? First, if you already have up.time’s intelligent alerting configured, the notifications you get should point you in the right direction. But let’s say you wanted to do some impromptu analysis. How do you do that?

 

Visibility Over Time

If it’s a performance-related issue, I would recommend looking at the Resource Scan Dashboard – see image above. It gives you a general sense of how your entire environment is doing. At the same time, with a single click, you can drill down to the server(s) to get deeper information. Want to see why there was such high memory usage in the middle of the night? up.time gives you visibility by letting you go back in time to identify the resource hog. Again, up.time’s single pane of glass allows you to gain a complete view of the infrastructure. If you haven’t already, download up.time to gain 20/20 vision into your complete IT environment so you won’t miss a beat!

 

– Patrick

 

Is your Capacity Planning Evolving to Meet Business Demand?

Friday, May 11th, 2012

 

As an IT systems management vendor, we get fired up about new technologies including the latest buzz around virtualized capacity, automation and cloud. We respond by building slick tools, dashboards and reports to help solve capacity problems. I believe that’s what (we) systems management providers are supposed to be doing, helping you solve problems. <shamelessplug> Reducing the complexity of capacity planning and management is something we do really well around here at uptime software! </shamelessplug>

Capacity Management is all about evolving IT Operations.

But what about the capacity planning function itself? Does it not need to evolve along with these new deployment technologies? Do current capacity planning functions contribute value to the business by helping them scale to meet demand?

Virtualization, automation and cloud technologies give IT execs more options than ever before in how services will be delivered to the business, but do their current capacity planning processes reflect this same evolution in technologies? For most the answer is still likely “no”.  Most IT organizations still seem to perform capacity planning at the individual component level (server, network, SANs) which does not represent the true capacity requirements of their global facilities and infrastructure resources. The good news here is that you CAN evolve and turn this situation around.

Planning and managing IT capacity at a macro level is critical to delivering cost-efficient and reliable business services in the time frame the business expects. The good news is that today’s virtualization and automation technologies allow flexibility and new cost alternatives, so IT execs can choose from a myriad of platforms to run applications and services on. The bad news is that these new virtual and cloud-based resources are certainly not free, and without new capacity planning processes, the benefits of easy procurement and instant provisioning can quickly turn into over-allocation and cost overrun nightmares.

  • So the message is clear: IT executives need new and more effective capacity planning processes in order to really take advantage of new technologies by optimizing the placement of applications according to criteria such as service level and cost. In addition, capacity planning software and tools can help teams be more effective.

One tactic you might consider as a start is to elevate your capacity planning team. Get it out of the “back room” of IT operations and make it a strategic function. Yes, remove it completely from IT operations and centralize it as a corporate IT function that reports directly to the CIO. This will send an important message to your organization, and capacity management will begin to evolve and operate independently of technology support groups such as network, server and storage.

But Rome wasn’t built in a day….