The up.time IT Systems Management Blog

Archive for the ‘IT operations’ Category

Stop the IT “Blame Game” and Get a Single Source of Truth!

Friday, February 22nd, 2013

My landlord just kicked me out.  Let me rephrase that.  My landlord just politely asked me to leave his property before my lease is up.  Thanks to him, my wife and I have been packing our stuff in preparation for our move.  In all honesty, my wife has been doing most of the packing.  I asked her to pack up all her belongings and kitchenware and I would do the rest.  Somehow, that message got lost and she packed my things too.  When I needed my shoes and couldn’t find them, I was upset.  I reiterated that I had asked her to pack only her stuff, but she said I never told her that.  Before I knew it, the blame game was in full effect.  It was her word versus mine.

These kinds of things happen in IT infrastructure management too.  When you have more than one tool to monitor your environment and more than one data source for capacity planning, how do you know which one to trust?  IT teams usually justify running a variety of monitoring tools on the grounds that no single tool in their current set provides all the visibility they need.  For example, some tools are strictly for network monitoring.  Others might go really deep in Windows monitoring but be light on everything else.  What’s worse, the metrics from different tools often overlap, so which one should you go with?  Different tools gather metrics in different ways and at different time intervals.  One tool might catch a spike while another may not.  It is a full-time job just to consolidate data and close information gaps to make sense of it all.

Here is where up.time is different.  up.time provides unified monitoring for all the silos within an IT infrastructure so you can have a true ‘single pane of glass’.  You don’t have to duct tape point tools together to make a homemade Swiss Army knife.  up.time IS the Swiss Army knife and provides a unified and comprehensive view.  It makes capacity planning a breeze because it provides a single data source, so you don’t have to try to make sense of all the differing metrics!  You can eliminate the blame game (and headaches) in IT when you don’t have multiple tools telling you different things.  You don’t have to go to war with the network team arguing whose data is right when you have a standard tool providing a single view of the truth.  up.time is the solution that enables you to be the IT superstar.  Download up.time and give it a spin today!

Are Sysadmins Corporate Superheroes?

Tuesday, February 19th, 2013

Ever wonder who the superheroes are that make your company tick?

Are they the geniuses that live in your R&D labs designing and building the cool products you sell? Is it the creative marketing team that communicates with the entire planet 24/7 and lines up all kinds of potential buyers to your front door?  Or is it your hardworking sales and customer support teams that look after all your clients and grow your revenues in leaps and bounds year after year?

No doubt all these people are crucial to the success of your growing enterprise and when working creatively and in unison, will eventually produce profits that will make your shareholders drool. This is very true.

Another thing that’s true is that all these teams will fall apart fast and furiously without a dedicated and well-oiled IT operation.

So… please allow me to introduce your unsung Corporate Superhero: the sysadmin.

So what does your sysadmin do all day that is so important?

Well, pretty much everything that ultimately allows you to do anything at the office on any given day. Day after day, sysadmins work frantically to install, upgrade and configure your information systems. Many sysadmins feel underappreciated. Unsung heroes that they are, they often work crazy long hours (weekdays and weekends) to ensure everything works right, just so the rest of us can actually get work done.

 

What does a sysadmin do exactly?

The short answer is, nearly everything when it comes to the technology behind your personal and business life. To demonstrate the scope of responsibilities attributed to sysadmins everywhere, here’s just a short list:

  • Server Installations
  • Network Configuration
  • Operating System Installs & Upgrades
  • Server Temperature Control
  • Firewall Set Up and Maintenance
  • Email Configuration and User Set Up
  • Data Backup and Restoration
  • The list goes on and on…

Sysadmins help users out when they need hardware and software support. Sysadmins monitor networks for problems. They plan and map out systems to discover more effective ways to manage your precious computing resources. They build infrastructures that are more resilient to today’s ever-changing, agile world, and in so doing, sysadmins reduce company costs, advance innovation and contribute to the overall growth of any given organization.

Pretty amazing, isn’t it? What’s even more amazing is how many of these unsung IT heroes I come across every week who still do not have enough of the basic software tools to do their jobs properly. This is what makes my job at uptime software so satisfying: building software that makes sysadmins’ lives easier so they can get what they need done faster and go home a bit earlier every night.

So please do not forget your IT heroes or take them for granted. Stop by your IT department on your way out tonight and just say THANKS!

How Does Your Monitoring Solution Deal with Maintenance?

Friday, November 23rd, 2012

Performing maintenance is never fun.  It’s time consuming, and sometimes issues arise during maintenance that make you want to pull your hair out. Regardless, to maximize uptime, we all have to do it from time to time.  Whether we are talking about IT infrastructure or automobiles, it’s just something that has to be done.  In the context of IT infrastructure, whether you are fixing a hardware issue or just performing regular maintenance like patching or upgrading, it’s a necessary evil.  When you have a monitoring solution in place, there are a few things you need to consider before performing maintenance on your servers and devices.

Alert Noise

If you know you have to bring your car(s) into the shop for repair, you clear your calendar and don’t make any plans because you won’t be able to go anywhere.  Similarly, if you have to bring your servers down for maintenance, you know they won’t be working as they normally would.  You know your monitoring solution will show a “sea of red” and send out tons of alerts.  Life does not have to suck.  You can schedule maintenance in up.time so that alerts will not be triggered during the maintenance window.  If you have to bring systems down for an emergency, you can also do it in an ad-hoc manner.

Accuracy of SLA

If you are measuring Service Level Agreements (SLAs) in your environment, you most likely won’t want to count maintenance against your SLA.  Most maintenance is scheduled; therefore, the frequency of maintenance is something that you and your customer have agreed on.  If scheduled maintenance counts against your SLA, it will skew the actual expected availability.  Consequently, up.time lets you choose whether scheduled maintenance counts against your SLA.  Notice I said scheduled maintenance and not ad-hoc maintenance.  Ad-hoc maintenance is not planned, which means customers will experience an unplanned outage, and it should be counted against the SLA.  Whether it’s scheduled or ad-hoc maintenance, you can count on up.time to accurately measure your SLA.
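To make the difference concrete, here is a minimal Python sketch of the arithmetic. It is not up.time’s actual calculation; the function names, the sample dates and the rule used here (scheduled windows are removed from the measured period and downtime inside them is ignored) are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def overlap(a_start, a_end, b_start, b_end):
    """Return how long two time intervals overlap (zero if they do not)."""
    return max(min(a_end, b_end) - max(a_start, b_start), timedelta(0))


def availability(period, outages, maintenance, exclude_maintenance=True):
    """Percent availability over a compliance period.

    period is a (start, end) tuple; outages and maintenance are lists of
    (start, end) tuples, assumed not to overlap within their own list.
    """
    start, end = period
    measured = end - start
    downtime = sum((overlap(s, e, start, end) for s, e in outages), timedelta(0))
    if exclude_maintenance:
        # Scheduled windows are removed from the measured period, and any
        # downtime that falls inside them is not counted.
        measured -= sum(
            (overlap(s, e, start, end) for s, e in maintenance), timedelta(0)
        )
        downtime -= sum(
            (
                overlap(max(o_s, start), min(o_e, end), m_s, m_e)
                for o_s, o_e in outages
                for m_s, m_e in maintenance
            ),
            timedelta(0),
        )
    return 100.0 * (1 - downtime / measured)


# Hypothetical 30-day compliance period with one 4-hour scheduled window
# (servers down for the whole window) and one 1-hour ad-hoc outage.
period = (datetime(2012, 11, 1), datetime(2012, 12, 1))
maintenance = [(datetime(2012, 11, 10, 1), datetime(2012, 11, 10, 5))]
outages = [
    (datetime(2012, 11, 10, 1), datetime(2012, 11, 10, 5)),
    (datetime(2012, 11, 20, 14), datetime(2012, 11, 20, 15)),
]

excluded = availability(period, outages, maintenance, exclude_maintenance=True)
counted = availability(period, outages, maintenance, exclude_maintenance=False)
print(f"maintenance excluded: {excluded:.2f}%")  # 99.86%
print(f"maintenance counted:  {counted:.2f}%")   # 99.31%
```

In this made-up month the ad-hoc hour costs you either way, but the scheduled four hours only hurt the number when maintenance is counted against the SLA.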

Most people don’t think about how maintenance could affect monitoring and SLA reporting in their IT environment.  Make sure you don’t get left out in the cold. Take control over how you manage your IT infrastructure.  Download up.time’s 30-day free trial and see the difference!

Introducing the New Dashboarding API – Sneak Peek Pt. 2

Tuesday, October 23rd, 2012

It’s time for part 2 of my sneak peek at up.time 7.1’s new Dashboarding API. Please take a moment to review part 1 for some basics on the API including how to list elements, groups, and monitors.

The Status End Point

The second major piece of the API is the ability to list the current status of elements, groups and monitors. Adding the ID of your target followed by /status produces a simple listing of the status of all related monitors. Here’s an example:

GET https://win-dleith:9997/api/v1/elements/14/status

This request produces the following example JSON output, with details on both the element status and the status of each service monitor associated with this element.

{
   "id":14,
   "isMonitored":true,
   "lastCheckTime":"2012-10-22T15:16:44",
   "lastTransitionTime":"2012-10-22T12:14:56",
   "message":"",
   "monitorStatus":[
      {
         "acknowledgedComment":null,
         "elementId":14,
         "id":250,
         "isAcknowledged":false,
         "isHidden":true,
         "isHostCheck":false,
         "isMonitored":true,
         "lastCheckTime":"2012-10-22T15:17:30",
         "lastTransitionTime":"2012-10-22T12:17:31",
         "message":"All metrics collected successfully",
         "name":"Platform Performance Gatherer",
         "status":"OK"
      },
      ....
   ],
   "name":"vmh-rd3.rd.local",
   "powerState":"On",
   "status":"OK",
   "topologyParentStatus":[
      {
         "id":2,
         "isMonitored":true,
         "lastCheckTime":"2012-10-22T15:16:54",
         "lastTransitionTime":"2012-10-22T12:14:48",
         "message":"",
         "name":"rd-vc2",
         "powerState":null,
         "status":"OK"
      }
   ]
}

Notice that this status information includes the times of any recent checks, the current power state of virtual elements, and information about this element’s topological parent so that you can piece together topology views across your enterprise.
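As a rough illustration of consuming this end point from a script, here is a minimal Python sketch using only the standard library. The helper names are my own for this example, authentication and certificate handling are left out, and only the elements path appears in the example request above, so the paths for groups and monitors are an assumption. The sample document is a trimmed copy of the output above with one extra monitor whose name and "CRIT" status are made up so there is something to report.

```python
import json
import urllib.request


def status_url(base_url, kind, target_id):
    """Build a status URL such as <base_url>/elements/14/status."""
    return f"{base_url.rstrip('/')}/{kind}/{target_id}/status"


def fetch_status(base_url, kind, target_id):
    """GET a status document and decode it from JSON.

    Authentication and TLS settings are omitted; add whatever your
    monitoring station requires.
    """
    with urllib.request.urlopen(status_url(base_url, kind, target_id)) as resp:
        return json.load(resp)


def failing_monitors(status):
    """Return (name, status, message) for every monitor that is not OK."""
    return [
        (m["name"], m["status"], m["message"])
        for m in status.get("monitorStatus", [])
        if m["status"] != "OK"
    ]


# Trimmed copy of the example output, plus one hypothetical failing monitor.
sample = {
    "id": 14,
    "name": "vmh-rd3.rd.local",
    "status": "OK",
    "monitorStatus": [
        {
            "id": 250,
            "name": "Platform Performance Gatherer",
            "status": "OK",
            "message": "All metrics collected successfully",
        },
        {
            "id": 999,  # hypothetical monitor, not in the original output
            "name": "Example Disk Check",
            "status": "CRIT",  # assumed status value, for illustration only
            "message": "Disk almost full",
        },
    ],
}

for name, state, message in failing_monitors(sample):
    print(f"{sample['name']}: {name} is {state} ({message})")
```

Against a live monitoring station you would replace `sample` with something like `fetch_status("https://win-dleith:9997/api/v1", "elements", 14)`.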

Now let’s talk about some of the fun stuff. By piecing together inventory information as well as availability information, we can start to craft some very exciting interactive views of your environment. Here are some examples that are being released along with up.time 7.1 for you to use as a starting point for your dashboarding needs. You will be able to find these examples on The Grid or our new GitHub page.

Pin+Image – Written by Joel

A world map example indicating the status of key applications around the country. Each ‘pin’ highlights both the status of the application and any member service monitors. Hovering over a pin brings up more details on the element and allows you to drill down with a simple click. The background image and the location of any status indicators are completely customizable. Build your view to suit whatever your NOC needs.

Incident Console – Written by Patrick

A highly interactive view for operations teams or administrators, combining data from several up.time monitoring stations around the country. This console can be extended to link into help desk or ticketing systems, or even configured with one-click console/rdc access to help you quickly triage any ongoing problems.

Dynamic Topology View – Written By Alex

Drill down through your topology to easily see the root cause of any outages in your environment. With this heads-up view you don’t have to navigate to different pages to understand how the key components of your environment relate to each other. Upstream and downstream components are instantly clear based on your defined up.time topological dependencies.
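To give a feel for how a topology view like this can be built from the status end point, here is a minimal Python sketch. It is not the code behind the Dynamic Topology View. It assumes you have already collected status documents keyed by element ID, that each entry in topologyParentStatus carries the parent’s id and status as in the example output earlier in this post, and that any status other than "OK" means trouble (the "CRIT" value and the element named web-01 are made up).

```python
def root_cause(element_id, statuses):
    """Walk up failing topological parents and return the top-most one.

    statuses maps element ID -> status document. Returns None when the
    element is unknown or currently OK.
    """
    doc = statuses.get(element_id)
    if doc is None or doc["status"] == "OK":
        return None
    culprit = element_id
    seen = {element_id}  # guards against accidental loops in the data
    while True:
        parents = statuses.get(culprit, {}).get("topologyParentStatus", [])
        failing = [
            p["id"] for p in parents
            if p["status"] != "OK" and p["id"] not in seen
        ]
        if not failing:
            return culprit
        culprit = failing[0]
        seen.add(culprit)


# Hypothetical data: element 14 depends on element 2, which depends on 1.
statuses = {
    14: {"name": "web-01", "status": "CRIT",
         "topologyParentStatus": [{"id": 2, "status": "CRIT"}]},
    2: {"name": "rd-vc2", "status": "CRIT",
        "topologyParentStatus": [{"id": 1, "status": "OK"}]},
    1: {"name": "core-switch", "status": "OK", "topologyParentStatus": []},
}

cause = root_cause(14, statuses)
print(statuses[cause]["name"])  # rd-vc2
```

The walk stops at the first healthy parent, so the element it returns is the highest point in the dependency chain that is actually failing.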

We’re really looking forward to the up.time 7.1 launch and I will be speaking live at our sneak peek webinar today at 1pm ET. More information on the release will be in your inbox next week.

Reduce Energy Waste and Sprawl in your IT Infrastructure with Capacity Planning

Thursday, October 18th, 2012

Recently, there was an article talking about data centers wasting vast amounts of energy. Although it spoke primarily about the super-data-centers from some of the big players and service providers (e.g., Facebook, Google, Amazon), the same can be said about almost any data center.

Running any number of servers efficiently not only requires a lot of thought and preparation beforehand, but it also requires continual attention to how they’re being used. This is especially true if virtualization is in the mix since it gives IT more power and control over the resources they have available. Ordering a new server used to take days or even months; with virtualization, provisioning one now takes only a few seconds. This double-edged sword of power also makes it that much easier to shoot ourselves in the foot by wasting resources on servers that are no longer used and by over-provisioning the resources we do have. These issues boil down to two main problems: Capacity Management and Sprawl.

If we’re just waiting for email alerts telling us that our disks are getting full, we can get into trouble. This is fine if we want to stay at the reactive stage of monitoring our environment, but if we want to get more proactive, we’ll need to start analyzing data and reports to determine which systems will need more space in the future. Capacity Management helps us detect potential issues before they become a problem. It can also help us track down odd resource behaviors on systems that should be fairly stable or differentiate between regular spikes of utilization and an actual outage. If your monitoring solution doesn’t have the reporting tools necessary for this, is it really doing enough for you?
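As a small illustration of what analyzing the data can mean in practice, here is a minimal Python sketch that fits a straight line to disk usage samples and estimates how many days remain before the disk fills up. It is a generic least-squares trend, not how up.time’s reports are produced, and the sample numbers are invented.

```python
def days_until_full(samples, capacity_gb):
    """Estimate days from the last sample until the disk is full.

    samples is a list of (day, used_gb) pairs. Returns None when usage
    is flat or shrinking, since a linear trend then never reaches capacity.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity_gb - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(full_day - last_day, 0.0)


# Invented weekly samples: the disk grows by 2 GB a day on a 500 GB volume.
weekly_usage = [(0, 400), (7, 414), (14, 428), (21, 442)]
remaining = days_until_full(weekly_usage, capacity_gb=500)
print(f"about {remaining:.0f} days until full")  # about 29 days until full
```

A threshold alert would only fire near the end of those 29 days. The trend tells you about the problem while there is still time to plan for it.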

Sprawl happens when we have servers (physical or virtual) that are no longer being used but are still taking up space and resources, essentially costing your business money. This can be very difficult to track since a virtualized environment can change from one second to the next, and you don’t have time to wait for your monitoring solution to catch up. Every enterprise that’s interested in saving money should already have an all-encompassing monitoring solution that provides visibility across the entire application stack, including the virtualization layer, and includes important reporting features out of the box. Yes, up.time has all of these things, and more, so if you haven’t had a chance to check it out yet, have a look at our free trial here.
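One simple way to start hunting for sprawl is to look for systems that never do any real work. The Python sketch below flags servers whose CPU usage never rises above a small threshold over a reporting window. The threshold, the minimum sample count and the server names are arbitrary assumptions for illustration, and a real review would also look at memory, disk and network activity before anything gets decommissioned.

```python
def idle_candidates(cpu_usage, threshold_percent=5.0, min_samples=24):
    """Return names of servers whose peak CPU stays below the threshold.

    cpu_usage maps server name -> list of CPU percent samples. Servers
    with too few samples are skipped rather than guessed at.
    """
    return sorted(
        name
        for name, samples in cpu_usage.items()
        if len(samples) >= min_samples and max(samples) < threshold_percent
    )


# Invented hourly samples for one day.
cpu_usage = {
    "vm-build-old": [1.2] * 24,        # never busy: a sprawl candidate
    "vm-web-01": [3.0] * 20 + [65.0] * 4,  # quiet, but has real peaks
    "vm-new": [0.5] * 3,               # too little data to judge
}
print(idle_candidates(cpu_usage))  # ['vm-build-old']
```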

Building a Better Capacity Planning Process

Monday, October 15th, 2012

I was recently re-reading a post I wrote back in May of this year entitled “Is your Capacity Planning Evolving to Meet Business Demand”  where I discussed how new technologies represent both challenges and opportunities for IT executives when it comes to capacity planning and the importance it plays in helping businesses grow:

“IT executives need new and more effective capacity planning processes in order to really take advantage of new technologies by optimizing the placement of applications according to criteria such as service level and cost. In addition, capacity planning software and tools can help teams be more effective.

One tactic you might consider as a start is to elevate your capacity planning team. Get it out of the “back room” of IT operations and make it a strategic function. Yes, remove it completely from IT operations and centralize it as a corporate IT function that reports directly to the CIO. This will send an important message to your organization and capacity management will begin to evolve and operate decentralized from technology support groups, such as network, server and storage.”

 

I present some ideas in this post including a high level model, roles and skills that will help you create a new and strategic capacity planning function within your IT organization.

First off, to be clear, creating a strategic capacity planning function involves much more than just assigning the job to one or more technologists, giving them some great software (like up.time) which helps produce all kinds of automated reports that show CPU, storage and network capacity trends, and then holding weekly meetings to look at consumption charts. You need to have the team (or individual if you are starting with a team of one) focus on your business from the “big picture” perspective.

Ideally your new and improved capacity planning process will look something like this:

 

 

This new capacity planning process (ideally a 3 person team or function) should consider assuming the following new roles and developing specific supportive skills including:

 

 

Building a new capacity planning process and supporting it with these roles and all the skill sets you need may not happen overnight. But the shift in the way your teams will start to think and plan for enterprise capacity will have profound and lasting benefits to your business. I hope these ideas help you in your quest to build a better and more strategic capacity planning function.

 

 

Auto-Discovery Improvements in up.time 7.1

Monday, October 1st, 2012

up.time 7.1 is introducing a few changes to our existing automated discovery process that will allow you to scan and monitor your datacenter more efficiently than in previous versions. We have always strived to be one of the easiest monitoring tools on the market to set up, roll out and maintain. Our users have reported being able to roll out monitoring to as many as 400 servers a day using a combination of our agentless and agent-based monitoring. With 7.1 we wanted to push this number even higher to save you even more time and effort. The result is a discovery process that is at least 50% faster and much easier to use, with much less manual work and less room for human error.

For those of you who haven’t tried up.time yet, we support the following auto-discovery methods:

  • Network Scan: Provide a subnet range and up.time will scan for any servers, network devices, IPs, etc… that can be added.
  • VMware vCenter Server Inventory Synchronization via up.time vSync: up.time will tap into your vCenter environment and automatically add all attached elements in a matter of seconds. up.time will also keep this inventory in sync, so newly discovered VMs & vSphere servers will have monitoring applied automatically.
  • IBM pSeries Discovery: Scan your HMC or VIO servers to discover all attached frames and LPARs

Using these methods it is very easy to populate up.time with everything in your environment. We also offer utilities that will bulk import systems from a text file if you happen to have a running list of every server in your environment.

With up.time 7.1 we are introducing these changes to the auto-discovery process:

  • Faster Network Discovery: In up.time 7, the discovery process on a subnet with 200 elements took on average 4m30s (270 seconds). In up.time 7.1, the same subnet now takes 1m45s (105 seconds). That’s roughly a 61% reduction in scan time, so discovering subnets will now take less than half the time.
  • Faster Bulk Addition of Elements: After the network had been discovered, it took on average about 20 seconds to add each discovered element. It was a manual process that could take quite a while in large environments. In up.time 7.1 any discovered element can now be bulk added using standard credentials. This completely removes the manual overhead and reduces the average add time to about 3 seconds. After entering credentials once, it will take about 10 minutes for up.time to automatically add all 200 elements while you go refill your coffee.
  • Wizard-based discovery: The process of setting up a discovery has been streamlined so that it’s easier to see where you are in the process and understand how each discovery option works. We have also tied vSync discovery into the auto-discovery process to highlight its discovery capabilities.
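For anyone who wants to check the math, the timings quoted in the list above work out as follows. This is just arithmetic on the averages given in this post for a 200-element subnet; your own numbers will vary with your environment.

```python
ELEMENTS = 200

# Average timings quoted above, in seconds.
scan_old, scan_new = 270, 105   # full subnet scan, up.time 7 vs 7.1
add_old, add_new = 20, 3        # adding one discovered element

scan_reduction = (scan_old - scan_new) / scan_old
scan_speedup = scan_old / scan_new
bulk_old_minutes = ELEMENTS * add_old / 60
bulk_new_minutes = ELEMENTS * add_new / 60

print(f"scan time reduced by {scan_reduction:.1%}")           # 61.1%
print(f"scan is {scan_speedup:.1f}x faster")                  # 2.6x
print(f"adding all elements before: {bulk_old_minutes:.1f} min")  # 66.7 min
print(f"adding all elements now:    {bulk_new_minutes:.1f} min")  # 10.0 min
```

In other words, a scan-and-add cycle that used to eat well over an hour of mostly manual work now fits into roughly twelve minutes, most of it unattended.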

That’s all for now. More information will become available over the coming month as we approach the up.time 7.1 release. I will also be hosting a Sneak Peek Webinar to introduce up.time 7.1 to the world in late October; more details to follow.

uptime software selects Lifeboat Distribution

Tuesday, September 18th, 2012

 

Quick bit of news from us. We have finalized an agreement to have Lifeboat Distribution, a subsidiary of Wayside Technology Group, Inc. (NASDAQ: WSTG), distribute up.time. This is very exciting, as it will increase the reach of the up.time product suite to new customers in North America.

Phil Didaskalou, uptime software CEO, explains, “uptime software is excited to be partnering with a distributor like Lifeboat, who can extend the reach of our IT systems monitoring product, up.time. Lifeboat’s strong reputation for proactive and technically-informed reseller service will help take up.time to a new set of customers.”

For more information, please see the press release.

The SLA Dashboard – The Key to Measuring Business Services

Wednesday, August 22nd, 2012

Last time we discussed how you can instantly identify the impact of your IT infrastructure on business services.  Taking that one step further, let’s talk about how you can measure your business services.  There are many metrics you can use to determine the performance of your business services.  Let’s say you want to measure the throughput of your email system.  up.time has Service Monitors for that.  Or let’s say you want to determine the number of requests your web server gets.  There’s a Service Monitor for that.  These are important metrics, but they are not necessarily essential.

The one vital metric is how often the business services are up.  And as you already know, a business service is not a single process on a single computer.  Whether the service is available or not depends on a number of components.  How do you tie all the components together and have a single number representing the health of your business service?  One way to gauge it is by using the Service Level Agreement (SLA) dashboard in up.time.

SLA Dashboard

Why should you use the SLA dashboard?  If you have SLA requirements, it’s a no-brainer to use this dashboard because it provides a real-time status of your SLA. It also lets you know if you are on thin ice, telling you exactly how many more minutes you have until your SLA will be breached.  But what if you don’t have any SLA requirements? You should still set up SLAs for each of your business services because up.time will be able to quantify how well you are managing your IT infrastructure.  up.time can determine the percentage of time a service is available over the compliance period.  If you are a system administrator, your performance is tied directly to whether services are accessible to the users.  That bonus you wanted?  You will have a much stronger argument for why you deserve it if you can show a concrete KPI of how well you are doing.  Or if you are a CIO, having a magical number summarizing what you need to know will enable you to work more efficiently.
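The “minutes until breach” idea is easy to reason about with a little arithmetic. Here is a minimal Python sketch of it. This is not how the up.time dashboard computes its figures; the function names, the 99.9% target and the 30-day compliance period are assumptions picked for the example.

```python
def downtime_budget_minutes(target_percent, period_minutes):
    """Total downtime the SLA target allows over the compliance period."""
    return period_minutes * (100.0 - target_percent) / 100.0


def minutes_until_breach(target_percent, period_minutes, downtime_so_far):
    """Downtime still available before the target is missed.

    A negative result means the SLA has already been breached.
    """
    return downtime_budget_minutes(target_percent, period_minutes) - downtime_so_far


def availability_percent(period_minutes, downtime_minutes):
    """Percentage of the compliance period the service was available."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


# Assumed example: a 99.9% target over a 30-day period, 30 minutes down so far.
period = 30 * 24 * 60  # 43,200 minutes
budget = downtime_budget_minutes(99.9, period)
left = minutes_until_breach(99.9, period, downtime_so_far=30)
print(f"downtime budget: {budget:.1f} min")   # 43.2 min
print(f"left before breach: {left:.1f} min")  # 13.2 min
print(f"availability so far: {availability_percent(period, 30):.3f}%")  # 99.931%
```

A 99.9% target sounds generous until you see that it leaves only about 43 minutes of downtime for the whole month.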

Whether you have formal, informal or non-existent SLAs on your business services, to get a complete view of your IT infrastructure, the up.time SLA dashboard is a must-have!  Download up.time and take it for a spin!

-Patrick

Coffee and the Art of End-to-End IT Infrastructure Monitoring

Tuesday, August 14th, 2012

 

In a recent survey of IT infrastructure and operations executives conducted by Forrester Research,

This statistic is interesting because it clearly shows that over the past few years, the organization’s need for accurate infrastructure, application and end-user experience monitoring has accelerated.  It also reveals the ever-increasing demand put on IT operations to align with, and achieve, business objectives, namely competitiveness and differentiation of products or services.

For example, let’s assume you are a global coffee retailer with 5,000 stores. It is not the ’90s anymore, so merely measuring the availability of your POS (point-of-sale) systems is not good enough to get a leg up on your competition. Today’s coffee retailers need to understand and measure detailed metrics about their business, including turnover, coffee line wait times (the last thing you can afford is to have your addicts… err I mean customers… starting early morning riots), in-store Wi-Fi usage/congestion and more, in order to ensure that customers have a consistent and positive experience every time.

These important technology-enabled metrics need to be measured accurately, stored and rapidly reported on across a large global enterprise. This may sound easy, but it can be a nightmare if you don’t have the right tool set and/or processes in place to support enterprise end-to-end monitoring.

Here at uptime software, one thing we do very well is provide a unified suite of monitoring and reporting capabilities all in one package. up.time is an IT Dashboard that watches over and reports on all servers, networks, applications, databases and service-level agreements (SLA management) conducive to supporting the complete monitoring of end-to-end IT infrastructure. So if this type of initiative is in the cards for your operation, now or in the near future, do give us a call and you’ll be amazed at how the right tools can make the task much easier.

I’ve left three tidbits of advice below that are worth considering for IT Operations executives when embarking on an end-to-end monitoring journey:

 

  1. Consolidate all end-to-end monitoring tool ownership responsibilities: Bring the definition of agreements, purchase, installation, configuration, maintenance and integration of an end-to-end monitoring tool into a single group placed within the command center, with tight alignment to the service desk.
  2. Develop an inventory of your current monitoring tools: Identify gaps and opportunities for IT monitoring tool rationalization and consolidation based on where you are sufficient and deficient.
  3. Develop close relationships with your business units and sponsors:  This is a unique opportunity to demonstrate real value to the business, which is not always easy to do as an IT executive… so go big and be bold!

 

Thx,

Phil.