The up.time IT Systems Management Blog

Archive for the ‘management’ Category

Devotion to Duty

Monday, February 22nd, 2010

Today’s xkcd comic was one that I got a real kick out of.  Picture John McClane as a sysadmin, and you get the picture.  The unstoppable reluctant hero, the right guy in the right place at the wrong time.  The relentless pursuit of availability and performance for the apps they support, no matter the effort, is the common thread amongst all sysadmins worth their salt.  But at what cost to the admin and those around them does this come?  Well, if they have subpar systems management software, at great cost.  A good toolkit of monitoring/management software and a few point tools for some vendor-specific use cases will allow our protagonist to go from being the burnt-out, run-down admin to becoming the Dicky Fox of IT, jumping head first each morning into whatever the world (or the Datacenter) can throw at them.  Systems management software is to the sysadmin what spinach is to Popeye.  It’s going to give them what they need when the going gets tough.  With detailed drill-down data and analytics, traversing from physical to virtual environments and back becomes something that is done with ease.

I’m a big fan of tools; my workshop has far more than my wife thinks any sane person should require.  There is a saying, “The right tool for the job”.  You wouldn’t try to drive a Phillips head screw with a Robertson driver (the Robertson, BTW, is possibly the best screw head ever, and a nice little Canadian invention.  Licensing issues kept the world from reaping the benefits of this beauty).  When picking the right tool for the job, you are balancing a few things, cost and capabilities being key.  You can buy a $30 screwdriver that only screws in one type of screw, or you can buy a set of screwdrivers for $30 and do all sorts of different screwing.  I’ll tell you though that the $30 single driver will probably never strip and will be able to drive screws until you lose it.  On the other hand, the $10 driver will probably do the trick as well, and provide you with a quality driver.  Where am I going with this?  The systems management space has all kinds of offerings that you can put into your toolbox.  There are expensive tools that do one thing and do it flawlessly.  There are cheap tools that can do a mountain of things, but they don’t excel at any one thing and you’ll end up outgrowing them as you become more proficient with your tools.  Then there are the sweet spot tools, the Ridgids of the software world.  These tools do exactly what you require, they do it well, and you would be hard pressed to outgrow them.  This is where I feel that up.time fits into the systems management software space.  We’re not the cheap tool, but we’re not the overly expensive Tivoli or HPOV framework either.  We fit into that sweet spot where you are going to get pretty well everything you could ask for and be happy with what it cost you.

So do your sysadmins a favour and thank them by letting them trial up.time.  It will make their life easier and make you, the IT manager, look like a hero as well, with increased productivity and cost savings.  Even if you don’t go with a solution from us, when your sysadmins ask for tools, open your IT wallets for them at least a little.  Some IT spinach will go a long way to keeping the strength in the arms of your Datacenter Popeyes!

Just how disruptive is Cloud technology?

Monday, November 9th, 2009

Let’s understand for a moment just how disruptive Cloud and virtualization technologies are to OTHER technologies. Ignore, for a moment, all the changes required to business processes, maintenance processes, infrastructure deployment models and all the other stuff people have been beating to death over the past 2 months.

Just how pervasive and challenging is Cloud technology to entrenched technology? Well, for one, people are redesigning and re-thinking how we use TCP/IP in order to enable Long Distance VMotion. That’s right, in order to be able to forklift virtual instances and massive data over the internet, companies like netex have figured out how to make the old building block of the interwebs, TCP/IP, even better – dubbing their new UDP over IP translation technology “HyperIP”.  HyperIP optimizes TCP/IP so that you can move a full VMware instance over the wire up to 10X faster than usual. (Let’s not even talk about how people will monitor this new disruptive technology, but you can bet it’s only the agile players who are even aware of the new challenges in this space).

The potential for this technology is 100% clear, and it is probably somewhere in a lab being coveted by the people at VMware as “my precious” – especially in the context of their desire to get remote DRS as a solidified feature in the vSphere platform.   If VMware manages to get this integrated as part of remote DRS and they start forklifting instances to, from and across the Savvis and Terremark clouds, this will be a giant leap towards making unified compute and private/public clouds “as real as it gets”. This doesn’t even take into account the latest ‘turnkey’ private cloud solutions unveiled by VMware, known as VBlocks.

The clouds just zapped TCP/IP, what’s next?

Would you like some HYPE with your Management Tool Soup?

Tuesday, October 13th, 2009

As a Solutions Architect, part of my job is to work with new prospects who are quite often bombarded by messaging from a wide variety of sources. By the time they get to me, ultra-niche players or platform-focused players have usually tried to convince them that what they need is a tool that solves their needs in a narrow or short-sighted manner.

An example of a platform-focused player is a tool that has a specific focus, say Windows for instance. Although tools like this appear to be broad, with a solid framework, they fall flat on their face when your organization brings in other platforms. This need will eventually arise in your organization at one point or another because of expansion, a need for new technologies to drive the business or, even more importantly, when your company has success and buys another company.  The contrast, of course, is a tool that can give you a single point of visibility into all hardware/software stack combos commonly found in the data center – including virtualization stacks.

The question to ask yourself is, what is the cost of going with a niche player? What will the time investment loss be when you are forced to adopt new technologies?

An example of an Ultra-Niche player would be the virtualization-only focused players in the market. Any vendor that focuses specifically and only on VMware capabilities and visibility would be a great example. One such vendor focuses narrowly on consolidation and migration products. These products have such a narrow scope, and cover such a limited part of the IT systems life cycle, that they end up being thrown out after the consolidation and migration process is complete. Broader tools (like <here is my plug> up.time), in contrast, have the capability to aid you over the entire life cycle of your virtualization project AND, most importantly, ensure you have visibility over this infrastructure and the applications and services that run on it – in the context of the whole data center.

The above two points often act as a point of illumination into the true capability of our product. It is very hard to find a product that incorporates the really useful features of those niche tools, that maintains a broad spectrum of platform support for heterogeneous views, and that does all of that in an easy to roll out manner. It’s easy to see that, of the 300 to 400 vendors you can find on Google that say they do systems and server monitoring, there are only a handful that can say they have the mandate and mission that uptime has set out to accomplish.

“Ease of use” is a point that cannot be overstressed. In my role, we have displaced many products from much larger competitors, simply because our product focuses squarely on quick roll out and measurable results. We focus on ensuring that a minimum amount of administrative overhead is required to start collecting data that is immediately useful to your organization, and then ensuring that the data can serve a wide variety of purposes. All the while, the focus is to ensure that the client is able to do “what they need to do”, “when they need it”. Our clients realize that you need a tool that will guide you from simply monitoring infrastructure in a way that encourages adoption and pro-active action from “day 1”, while also allowing your organization to grow into sustainable capacity planning, virtualization planning, and SLA monitoring, reporting, and management.

It’s also very important that clients remember that it’s the little things that matter. Many products generate a lot of hype around their latest GUI features. Don’t get me wrong, uptime is no ugly duckling; we have one of the cleanest and most professional UIs out there. What I am saying is that clients quickly get caught up in needless or useless visualizations meant to impress people, not realizing that they are losing sight of the features that really matter to the big picture. If your chosen system has a fantastic 3D rotating flaming logo, that’s amazing! I am sure it will impress a lot of people initially and easily get you budget when you present it internally. But if the chosen system doesn’t have the features to laser guide notifications, escalate problems effectively and ensure that your staff don’t get unintelligible or spurious alerts at 3AM – you can bet that flaming logo visualization will soon be ignored and the product will be considered a bad investment down the line, putting you and your team at risk.

By focusing on the ideas behind the examples above, one can see how quickly you can cut through the hype, avoid tool soup, and ensure that your organization ends up with a toolset that’s going to “get you there today” and “take you there tomorrow”.

I encourage you to join one of our public webinars to see for yourself how different and refreshing it can be to see a product demonstration that focuses on real client challenges… and no, you won’t be left at the end of the presentation asking yourself if you should get some of that hype with your management tool soup.


Incident Priority Tracking with up.time 5.2

Wednesday, August 12th, 2009

With all of the summer vacations happening it’s been a while since my last post, but to get back in the swing of things I really wanted to talk about a feature from our newest release of up.time, version 5.2. The “Incident Priority Quadrant” report has gotten a lot of buzz from existing customers, prospects and the press. Specifically, I have one large financial customer who upgraded to up.time 5.2 and started running the “Incident Priority Quadrant” report on a weekly basis for their Tier 1 & 2 Applications. They are now easily able to see where they need to concentrate their very “limited” resources. They have also been looking at the areas where they can set up some automation. This brings me to another topic – VMware’s new “vCenter Orchestrator™”. up.time 5.2 has direct integration with this drag and drop Automation and Orchestration tool… but I will save all that for my next post.

Incident Priority Quadrant Report


What to look for in a VM monitoring solution

Friday, May 8th, 2009

I was recently reading through some of the questions over at the “Official VMware Virtualization Group” on LinkedIn, and there was a question about what to look for in a VM monitoring solution, so I thought I would share my response here.

When looking for a VM monitoring solution, you’ll need to look at what you are doing from a monitoring standpoint today and decide if you are looking for a VM-only point tool, or if you are looking for something broader that will give you an end to end perspective on your VM environment.  There are great bespoke tools out there for performing very specific tasks; however, when you end up with a number of point tools, troubleshooting, reporting and analysis of the environment can become much more difficult.

In an ideal world you want to be able to address the ‘3 M’s’ (Monitor, Measure and Manage) with the solution you are putting in place.  Monitor in order to collect the required metrics from the host and guest level as well as any applications or services being supported by the guests.  Measure these metrics against specific goals or thresholds to ensure that everything is operating within the design parameters for the service.  Finally, manage the end to end delivery for the application services being provided to the customer by the virtual infrastructure.  This includes monitoring the end user experience for the apps themselves and the delivery of the services against stated goals within SLAs.
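
As a rough illustration of the “Measure” step, here is a minimal Python sketch that compares collected metrics against warning and critical thresholds.  The metric names, threshold values and the measure() function are all made up for the example; this is not how up.time or any other product does it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Threshold:
    """Warning/critical limits for one metric (hypothetical values)."""
    metric: str
    warning: float
    critical: float


def measure(samples: Dict[str, float], thresholds: List[Threshold]) -> Dict[str, str]:
    """Classify each metric as OK, WARNING, CRITICAL, or UNKNOWN if no sample was collected."""
    status = {}
    for t in thresholds:
        value = samples.get(t.metric)
        if value is None:
            status[t.metric] = "UNKNOWN"
        elif value >= t.critical:
            status[t.metric] = "CRITICAL"
        elif value >= t.warning:
            status[t.metric] = "WARNING"
        else:
            status[t.metric] = "OK"
    return status


# Assumed sample data: one host-level metric and one guest-level metric.
samples = {"host_cpu_pct": 62.0, "guest_cpu_ready_pct": 12.5}
thresholds = [
    Threshold("host_cpu_pct", warning=80, critical=95),
    Threshold("guest_cpu_ready_pct", warning=5, critical=10),
    Threshold("app_response_ms", warning=500, critical=2000),
]

for metric, state in measure(samples, thresholds).items():
    print(f"{metric}: {state}")
```

Treating a missing sample as UNKNOWN rather than OK is a deliberate choice in this sketch: a metric you failed to collect is itself something you want to hear about.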

By monitoring, measuring and managing, you will be able to ensure the quality of the delivered service to your customers (internal or external) and have all the information required to effectively maintain the service into the future.  Through proactive alerting, capacity planning and a monitoring solution that provides an end to end single pane of glass across your infrastructure, you’ll be able to reduce your MTTR whenever you experience an outage and run your environment as efficiently as possible.

From a purely product standpoint, you should keep in mind how the monitoring tool or software is licensed.  Some vendors charge per monitored resource, some by CPU socket or core, some by physical host with guest VMs coming at no charge.  Licensing charges can quickly add up in a virtual environment, especially if you are being charged per resource.  With it being so easy to spin up more and more VMs, the associated licensing costs to monitor those VMs can grow in a hurry.  If you check out the VMware monitoring page at uptime software you’ll be able to see what we bring to the table for end to end VM monitoring.
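
To make the licensing point concrete, here is a small Python sketch that compares the three charging models as the number of VMs grows.  Every price and host count in it is a made-up assumption for illustration only, not any vendor’s actual pricing.

```python
def per_resource_cost(vm_count: int, price_per_vm: float) -> float:
    """Licensing charged for every monitored VM."""
    return vm_count * price_per_vm


def per_socket_cost(host_count: int, sockets_per_host: int, price_per_socket: float) -> float:
    """Licensing charged per CPU socket on the physical hosts."""
    return host_count * sockets_per_host * price_per_socket


def per_host_cost(host_count: int, price_per_host: float) -> float:
    """Licensing charged per physical host, guest VMs at no charge."""
    return host_count * price_per_host


# Hypothetical environment and prices (assumptions, not real quotes).
HOSTS = 4
SOCKETS_PER_HOST = 2
PRICE_PER_VM = 100.0
PRICE_PER_SOCKET = 1000.0
PRICE_PER_HOST = 1500.0

for vms in (20, 80, 160):
    print(
        f"{vms:>4} VMs: "
        f"per-resource ${per_resource_cost(vms, PRICE_PER_VM):,.0f}, "
        f"per-socket ${per_socket_cost(HOSTS, SOCKETS_PER_HOST, PRICE_PER_SOCKET):,.0f}, "
        f"per-host ${per_host_cost(HOSTS, PRICE_PER_HOST):,.0f}"
    )
```

Only the per-resource figure changes from row to row: the socket and host models stay flat until you add hardware, while the per-VM model grows with every guest you spin up.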

Disaster Recovery Planning

Wednesday, May 6th, 2009

Wait! Don’t go!  I know this is a horrendously boring topic but I’ll try my best to make it somewhat interesting…

For a while now I’ve been knee deep in planning how the IT department at uptime would respond in an emergency or disaster.  It’s not an easy task. In fact, when I sit down to write the plan I become consumed by it and find myself expanding the document beyond its scope.  It has become one large brain-dump document and, while useful as reference material, it would help little in the event of an emergency.

With this document looming over me like the monolith in 2001: A Space Odyssey, I decided to break it down into manageable chunks, and break out the meat of the document into an Interim DR Plan.  As DR planning can be daunting and take a considerable amount of time, an interim DR plan can help you out of a sticky situation should one arise.  It’s better than having nothing while you build your comprehensive response plan.

In a nutshell, an interim plan is…

  • A low-effort, quick-to-market plan.  Don’t spend more than 20-30 person-hours on it.
  • An outline of how your business can continue operations with limited resources in the event of a disaster.
  • Not a substitute for a full DR plan, though it can and should be a framework for building out your comprehensive DR plan.

Your Interim DR plan may contain:

  • Emergency response team members, and all their contact information
  • The Disaster Declaration Procedure – How the business actually declares an emergency.
  • Communications procedures – How your emergency response team will communicate before, during and after the recovery efforts.
  • Recovery Plan Procedures – This is the hard part, but in essence is a description of recovery procedures, alternate locations and contingencies for the business functions identified in the plan.
  • Background information – Who owns, sponsors and updates the plan.  How often the DR team should meet, and the document update frequency.
  • Preventative Measures – Simple things like off-site backups, or data replication to a secure location can save loads of time and money.  Consider moving to Software as a Service for critical business functionality.  I’ll write a post on SaaS later.  I’m not a big fan, but in the event of a disaster it could just save your hide.
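
If you like to keep yourself honest while drafting, the sections above can be turned into a simple completeness check.  The Python sketch below is only an illustration: the section keys, the draft contents and the missing_sections() helper are all made up for the example.

```python
from typing import Dict, List

# Section names taken from the list above; the dictionary keys themselves are hypothetical.
REQUIRED_SECTIONS = [
    "response_team",
    "disaster_declaration_procedure",
    "communications_procedures",
    "recovery_plan_procedures",
    "background_information",
    "preventative_measures",
]


def missing_sections(plan: Dict[str, str]) -> List[str]:
    """Return the required sections that are absent or still blank in the draft."""
    return [name for name in REQUIRED_SECTIONS if not plan.get(name, "").strip()]


# An assumed work-in-progress draft with placeholder text.
draft = {
    "response_team": "Names, roles and contact details for each team member",
    "disaster_declaration_procedure": "Who may declare an emergency and how",
    "communications_procedures": "",
    "preventative_measures": "Off-site backups, data replication",
}

print("Still to write:", ", ".join(missing_sections(draft)))
```

A blank section is treated the same as a missing one, so a heading with nothing under it still shows up in the to-do list.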

Once you’ve documented the plan, publish it and store it in several locations.  Even consider putting it online somewhere outside of your infrastructure.

uptime software has a DR plan in effect and we continue to evolve it over time to meet our needs.  We are confident that, should disaster strike, our operations will continue relatively unharmed.  If you are looking at server monitoring solutions elsewhere, ask them if they will be able to support you if their operation catches fire, or floods, or endures an earthquake.  uptime will.