The up.time IT Systems Management Blog

Archive for the ‘virtualization vmware’ Category

up.time 6 – Get Your Sneak Peek at our New Baby!

Thursday, October 6th, 2011

The up.time 6 launch is just around the corner (end of the month)! So, we wanted to give you a sneak peek of what to expect. This new release is all about one thing: helping you monitor and manage your VMware environment better. Our development team has worked hard to make VMware monitoring and management as easy as possible for IT departments, because we know you don’t have a lot of time on your hands.

In addition to the new VMware monitoring and reporting capabilities, up.time 6 continues to deeply monitor all datacenter infrastructure and applications to give you the most complete set of metrics on performance, availability and capacity. You’ll have full control over your servers and services across Windows, UNIX (IBM AIX, Sun Solaris, HP-UX), Linux, VMware, Novell and more from a single dashboard.

Here’s a quick preview into two of the major additions that will be included in up.time 6:

SMART Monitoring

1. Smart VMware Monitoring. We’ve taken a “set it and forget it” approach with these new capabilities, so IT folks can save some of their time on virtualization monitoring. This includes:

  • Real-time vSync: Keeps your monitoring in step with your VMware environment. You’ll know immediately when VMs are added or changed, and monitoring and alerting are applied to them automatically.
  • Sprawl Control: Be alerted and take automated action on new VMs, including license validation, resource allocation, and security compliance.
  • VM Power Awareness: Monitor power usage in your VMware environment to track energy-savings initiatives, isolate power-gobbling applications and workloads, and map power usage to capacity over time.
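To make the vSync idea a little more concrete, here is a minimal sketch of inventory reconciliation, assuming you can take periodic snapshots of the VM inventory keyed by VM id. It is not up.time’s implementation; `VMConfig`, `diff_inventory`, `sync_monitoring` and `DEFAULT_MONITORS` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class VMConfig:
    """Hypothetical snapshot of the VM attributes worth tracking."""
    name: str
    host: str
    vcpus: int
    memory_mb: int


@dataclass
class InventoryDiff:
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)


# Hypothetical default monitors applied to every newly discovered VM.
DEFAULT_MONITORS = ("cpu", "memory", "disk", "ping")


def diff_inventory(old: dict[str, VMConfig], new: dict[str, VMConfig]) -> InventoryDiff:
    """Compare two inventory snapshots keyed by VM id."""
    return InventoryDiff(
        added=sorted(new.keys() - old.keys()),
        removed=sorted(old.keys() - new.keys()),
        changed=sorted(vm for vm in new.keys() & old.keys() if new[vm] != old[vm]),
    )


def sync_monitoring(monitored: dict[str, tuple[str, ...]], diff: InventoryDiff) -> None:
    """Attach default monitors to new VMs and stop monitoring removed ones."""
    for vm_id in diff.added:
        monitored[vm_id] = DEFAULT_MONITORS
    for vm_id in diff.removed:
        monitored.pop(vm_id, None)


old = {"vm-1": VMConfig("web01", "esx-a", 2, 4096)}
new = {
    "vm-1": VMConfig("web01", "esx-b", 2, 4096),  # migrated to another host
    "vm-2": VMConfig("web02", "esx-a", 2, 4096),  # newly created
}
diff = diff_inventory(old, new)
monitored = {"vm-1": DEFAULT_MONITORS}
sync_monitoring(monitored, diff)
print(diff.added, diff.changed, diff.removed)  # ['vm-2'] ['vm-1'] []
```

Polling and diffing is the simplest way to show the idea; an event-driven feed from vCenter would react faster, but the reconciliation step looks much the same.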

 

2. Comprehensive VMware Capacity Planning. We know that the #1 driver of your VMware performance is capacity, so we’ve been working hard to make VM capacity planning easier. A huge problem across the systems management space is getting answers to the questions IT professionals need answered: how much capacity they have in their VMware environment, how much they’re currently using, where the capacity bottlenecks are and, most importantly, when they’re going to run out, so that they can prepare for future requirements. up.time 6 addresses these problems by adding:

  • Capacity Bottleneck Trouble-Shooting, which alerts you the minute a capacity bottleneck appears so you can regain control by locating it within minutes, using deep, easy-to-use (3-click) capacity metrics.
  • Global VMware Capacity Reports, which will allow you to easily see and compare historical capacity trends across VMware (clusters, resource pools, vApps, VMs, ESX hosts, vCenters, Datacenters, and more) to help establish baselines for upgrade or consolidation projects and ensure that you never overspend on, or run out of, capacity again.
  • Virtual Capacity Forecasting. Stop getting caught begging for additional capacity and storage. Our new forecasting will allow you to easily and accurately see when virtual capacity will run out, long before it pops up and bites you. It will also find the users and business units who are over-allocating or inefficiently using storage so that you can address their work practices.
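The question “when will we run out?” can be illustrated with the simplest possible model: an ordinary least-squares straight line through daily usage samples, extended until it crosses total capacity. This is an assumption made for illustration, not the forecasting method up.time 6 uses; real capacity data is rarely this linear, and the function names are hypothetical.

```python
from __future__ import annotations


def fit_linear_trend(samples: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of usage = slope * day + intercept, with day = 0, 1, 2, ..."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(samples: list[float], capacity: float) -> float | None:
    """Days after the last sample until the trend line reaches capacity.

    Returns None when usage is flat or shrinking, because a straight line
    with a non-positive slope never reaches capacity.
    """
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - (len(samples) - 1))


daily_usage_gb = [100, 110, 120, 130, 140]
print(days_until_full(daily_usage_gb, capacity=200))  # 6.0
```

With usage growing 10 GB per day from 100 GB and 200 GB of capacity, the trend line hits the ceiling on day 10, six days after the last sample.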

I’m really excited for this release and the functionality that it brings to the table, and I think our clients will be more than pleased with what’s to come!

If you’re looking for a closer look at what I’ve talked about and/or want to see some of this functionality in action, our Product Manager is hosting a “Sneak Peek” webinar today (Oct. 6th) at 4pm EST that I recommend you attend. To register, click here.

- Alex

Notes from VMworld: Future is Cloudy

Tuesday, September 14th, 2010

In the first week of September, at VMworld, I was able to hang out with 17,000 of my closest VMware colleagues and get the lowdown on VMware’s vision for where it is going. uptime software also hosted a customer event at the Press Club, just next to the Moscone Center, and it was a nice chance to meet a great group of up.time users in an informal setting.

In a nutshell, VMware identified three tiers of technology that they are going to address with a portfolio of products: mobile, application platform, and infrastructure. By spanning these three tiers, VMware intends to be able to deliver applications to end-users with a few clicks – and these applications would be dynamically deployed, secured, managed, and accessible to a user on any kind of device. Obviously, there are a lot of moving parts to delivering this, and not all are in place yet, but the vision is compelling.

VMware also announced the acquisition of Integrien, a small predictive analytics company whose technology will be used to automatically identify problem VM guests in a vSphere infrastructure. Integrien doesn’t actually do any performance data collection; it relies on other tools for that. This acquisition obviously puts a bit of a chill in the VMware partner ecosystem, as VMware is building up its systems management portfolio and components of it will compete with partner offerings. We at uptime are excited, as our broad heterogeneous monitoring capabilities can feed into Integrien and help administrators diagnose multi-platform problems quickly. It’s also important to point out that up.time does high-level application performance and SLA reporting, which is not an area addressed by Integrien.

While wandering the Solutions floor, I saw many familiar faces, and it was great to check out the latest offerings. One thing became very apparent, though: there weren’t many heterogeneous systems management tools. Yes, it was a VMware show, but for the foreseeable future people will not be deploying pure VMware solutions, and they will need to monitor all of their infrastructure. As applications spread from physical systems, become virtualized and then start moving into the cloud (be it private or hosted), a coherent single-pane-of-glass view is still necessary, especially since applications will be fragmented. Monitoring applications with multiple tools is going to be problematic and will pose problems for manager-of-managers tools if people want a consolidated view of application performance and availability.

All-in-all I feel that up.time is well positioned to span all the VMware stacks and offers the higher level SLA and Application monitoring/reporting capabilities that people need as they explore private cloud.

Alex

See you at VMworld 2010 in San Francisco

Wednesday, August 25th, 2010

uptime is going to be at VMworld in San Francisco next week.  If you’re heading down and want to join us for a meet and greet at the Press Club on Tuesday from 6:30-9, then please send a note to Lindsay Wagter and she’ll get you on the list.

We have a nice crowd showing up already, and a handful of uptime people (myself included) will be around to ensure you’re well looked after. So, if you’re tired after a day of sessions and looking for a comfortable place to kick back and relax, come enjoy some entertainment on us.

Hope to see you next week.

Alex

up.time on YouTube

Monday, December 14th, 2009

We’ve just posted two videos to YouTube demonstrating how you can achieve a reduction in MTTR and avoid incidents in the first place using up.time and its powerful integration capabilities with VMware’s Orchestrator tool.

Part 1 contains a quick overview of up.time and its capabilities, and Part 2 demonstrates dynamically scaling VMware instances based on real-time monitoring of changing load in a sample e-commerce application.

»Part 1 – http://www.youtube.com/watch?v=GLI9yRGS0fo

»Part 2 – http://www.youtube.com/watch?v=F7v7_NNgJsQ

Alex

Just how disruptive is Cloud technology?

Monday, November 9th, 2009

Let’s understand for a moment just how disruptive Cloud and virtualization technologies are to OTHER technologies. Ignore, for a moment, all the changes required to business processes, maintenance processes, infrastructure deployment models and all the other stuff people have been beating to death over the past 2 months.

Just how pervasive and challenging is Cloud technology to entrenched technology? Well, for one, people are redesigning and rethinking how we use TCP/IP in order to enable Long Distance VMotion. That’s right: in order to forklift virtual instances and massive amounts of data over the internet, companies like NetEx have figured out how to make the old building block of the interwebs, TCP/IP, even better, dubbing their new UDP over IP translation technology “HyperIP”. HyperIP optimizes TCP/IP so that you can move a full VMware instance over the wire up to 10X faster than usual. (Let’s not even talk about how people will monitor this new disruptive technology, but you can bet that only the agile players are even aware of the new challenges in this space.)

The potential for this technology is 100% clear, and it is probably sitting in a lab somewhere, coveted by the people at VMware as “my precious”, especially given their desire to make remote DRS a solid feature of the vSphere platform. If VMware manages to integrate this into remote DRS and starts forklifting instances to, from and across the Savvis and Terremark clouds, it will be a giant leap towards making unified compute and private/public clouds “as real as it gets”. This doesn’t even take into account the latest ‘turnkey’ private cloud solutions unveiled by VMware, known as Vblocks.

The clouds just zapped TCP/IP, what’s next?

IDC Highlights uptime software in “Worldwide Performance and Availability Management” Report

Tuesday, September 29th, 2009

I’m quite pleased that IDC has mentioned us in the “companies to watch for” section of their latest Performance and Availability Management report. What is also worthy of note is that we are in the top four corporations that sustained high growth rates over the past year, a feat that has eluded many of our larger competitors.

Here’s the press release snippet:

uptime software Listed as a Company Worth Watching for the Future of Systems Management

Toronto, Canada, September 29, 2009 – uptime software today announced its inclusion in IDC’s “Worldwide Performance and Availability Management 2009” report. In the report, uptime software was positioned as a company to watch for its ability to monitor and optimize virtual and physical server performance, availability and capacity utilization across geographically diverse locations. The company joins a broad field of systems management software vendors, including HP, IBM, CA and Microsoft, that were also mentioned in the report.

uptime offers systems management software that enables IT organizations to manage, measure, and monitor physical, virtual, and cloud based assets, applications and services from a single, unified console. According to IDC, this type of functionality aligns well with the future needs of public/private hybrid cloud environments and customers that want to maximize application performance by shifting workloads between multiple physical data center locations and systems.

“Organizations come to us because we provide the highest degree of transparency, accountability and visibility into infrastructure and applications. This report, along with recent recognition by Gartner, the 451 Group and others, proves that we are a force to be reckoned with in the systems management market and a viable alternative to traditional ‘framework’ solutions,” said Alex Bewley, Chief Technology Officer of uptime software. “Being shortlisted alongside an impressive list of industry leaders is a testament to the value of our up.time solution and validates our company vision of helping any organization to better manage its critical resources.”

According to IDC market data as of July 31, 2009, the worldwide performance and availability management software market achieved total combined software license and maintenance revenue of $5 billion in 2008, a solid 10.4 percent increase over 2007 amid an ongoing global recession.

IDC believes that increased operational complexity, fueled by the use of SOA and virtualization to support production application environments, is driving more and more organizations to invest in sophisticated infrastructure and application performance monitoring, capacity planning, performance simulation and automated analytic tools. Spending on these tools is justified by a combination of savings from operational and staff cuts, paired with improved end-user productivity and customer satisfaction thanks to faster problem resolution and increased availability of mission-critical applications. up.time helps make this transition easy and cost effective.

IDC’s market analysis report, “Worldwide Performance and Availability Management 2009”, is available through the firm’s Web site.

For more information about uptime software, please visit www.uptimesoftware.com

Alex

Incident Priority Tracking with up.time 5.2

Wednesday, August 12th, 2009

With all of the summer vacations happening, it’s been a while since my last post, but to get back in the swing of things I really wanted to talk about a feature from our newest release, up.time 5.2. The “Incident Priority Quadrant” report has gotten a lot of buzz from existing customers, prospects and the press. Specifically, one large financial customer upgraded to up.time 5.2 and started running the “Incident Priority Quadrant” report weekly for their Tier 1 & 2 applications. They can now easily see where they need to concentrate their very “limited” resources. They have also been looking at areas where they can set up some automation. This brings me to another topic: VMware’s new “vCenter Orchestrator™”. up.time 5.2 has direct integration with this drag-and-drop automation and orchestration tool… but I will save all that for my next post.

Incident Priority Quadrant Report

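The post doesn’t spell out how the Incident Priority Quadrant is built, so the following is a hypothetical sketch: it assumes the two axes are incident frequency and total downtime per service over the reporting period. The quadrant labels, thresholds and names are invented for illustration and are not the report’s actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceIncidents:
    name: str
    incident_count: int       # how many incidents in the reporting period
    downtime_minutes: float   # total outage time in the same period


def quadrant(s: ServiceIncidents, count_threshold: int, downtime_threshold: float) -> str:
    """Place a service in one of four (hypothetical) priority quadrants."""
    frequent = s.incident_count >= count_threshold
    costly = s.downtime_minutes >= downtime_threshold
    if frequent and costly:
        return "fix first"
    if costly:
        return "rare but costly"
    if frequent:
        return "automation candidate"  # many small, repeatable incidents
    return "watch"


def prioritize(services: list[ServiceIncidents], count_threshold: int,
               downtime_threshold: float) -> dict[str, list[str]]:
    """Group service names by quadrant, most downtime first within each group."""
    groups: dict[str, list[str]] = {}
    for s in sorted(services, key=lambda s: s.downtime_minutes, reverse=True):
        groups.setdefault(quadrant(s, count_threshold, downtime_threshold), []).append(s.name)
    return groups


weekly = [
    ServiceIncidents("payments", 12, 300),
    ServiceIncidents("email", 10, 20),
    ServiceIncidents("trading", 2, 240),
    ServiceIncidents("intranet", 1, 5),
]
print(prioritize(weekly, count_threshold=5, downtime_threshold=60))
# {'fix first': ['payments'], 'rare but costly': ['trading'], 'automation candidate': ['email'], 'watch': ['intranet']}
```

The “automation candidate” quadrant is where a weekly run like the customer’s would point first when looking for things to automate.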

Cloud computing and popular culture

Friday, June 26th, 2009

This has been one hell of a week for the entertainment industry.  Ed McMahon, Farrah Fawcett and Michael Jackson have all passed away.  Whenever significant cultural events like this occur there is an explosion in communication amongst people, wanting to know what happened and further discuss it amongst their peers.  In the past this would have been isolated to talking with your neighbours, family and friends either in person or over a traditional POTS line.  Fast forward to the 21st century and we now have real time bidirectional communication between virtually anyone anywhere in the world. 

When you have an unpredictable event like the death of a societal icon, or the launch of a new service that has the potential for extremely rapid adoption (or at the very least high traffic due to curiosity alone), it is very difficult, if not practically impossible, to anticipate the real-world resources needed to support the inbound demand. The chart from Keynote Systems illustrating the availability and performance impact of this event on news websites shows this very clearly.


Image from: http://www.datacenterknowledge.com/archives/2009/06/25/michael-jackson-news-slows-web-sites/

TMZ.com was the first news outlet to break the story of Michael Jackson’s death, and consequently its site collapsed outright under the unexpected workload. It’s hard to fault the IT team responsible for delivering the service; after all, no one knew MJ was going to pass away yesterday, and arguably no one in entertainment today would have generated the level of public interest that he did.

So where am I going with all of this? To the clouds! If there was ever a real-world example of where a cloud solution would have played nicely into delivering a service that can be hit without warning by transient, high-intensity workloads, this is it. Even a properly architected high-volume application or service that is designed to handle large increases in transient load has a finite capacity. If TMZ.com had been able to automatically spin up cloud resources and shunt the new traffic to them during the media frenzy, ideally they would have stayed up through the peak of the traffic and provided service quality and performance as good as their normal service levels. (For the shunting, I’m a big fan of F5 gear for application delivery networking.) Now, they could have done this manually, I suppose: when they saw the traffic coming, they could have provisioned some AWS instances, gotten their site and content up and running, and started routing traffic over through a change to their load balancers. That’ll work, but it’s manual, it takes time to implement, and by the time they’re done their end users have already hit a dead site and gone to a competitor. So what to do? Automate!

With the 5.2 release of up.time, launched on Wednesday (June 24th, 2009), up.time now has full bi-directional integration with VMware Orchestrator. If you are a VMware shop, you get Orchestrator for free with vCenter Server. If you are not familiar with Orchestrator, you can check it out here. Essentially, Orchestrator is a policy-based workflow automation tool that you can use to build automated scenarios to perform, well, pretty much anything. Orchestrator has the concept of plugins, which give Orchestrator the know-how to interact directly with specific vendor technologies. For example, the up.time plugin for Orchestrator lets you add elements, create/modify/delete groups and service groups, and perform other tasks from within Orchestrator. (Under the hood, this is enabled by a new set of web services in up.time 5.2.) So how does this play into the TMZ.com cloud scenario? It goes something like this:

  1. up.time is monitoring the end-user experience for the website, as seen by the logical service address (www.mynewssite.com), using the HTTP service or WATM monitor.
    1. You can monitor the logical service for overall end user experience.
    2. You can monitor the individual web servers to identify whether any given server is being overloaded, and determine whether that is expected behaviour or an issue such as a misconfigured load balancer algorithm.
    3. You can configure whatever service monitors you need (database, business logic, logs, etc) to determine the ongoing health of the service you are delivering and use that to trigger the automated resolution.
  2. When your end users begin to suffer or servers start to indicate they are becoming overloaded, have up.time trigger an Orchestrator workflow to automatically avoid any end-user incident that may occur due to insufficient resources. That would look something like this:
    1. Using an action profile within up.time, trigger the Orchestrator workflow you have defined for automatically shunting workload to the cloud or scaling out internally onto idle resources. How you resolve it from a capacity perspective is really up to you. You could have different capacity scale-out workflows depending on where the performance bottleneck is: if your web servers are overloaded, shunt to the cloud; if your database is overloaded, add a new node to your cluster. In this scenario, let’s scale out our web tier.
    2. up.time tells Orchestrator to trigger the ‘mywebsite cloud scaleout’ workflow. Orchestrator then manages the following:
      1. Provision and configure an AWS server (or many if you need to) with the appropriate OS and web content.
      2. Add the new AWS instances into up.time (via the up.time Orchestrator plugin, which is downloadable from our site)
        1. Add the instances to the appropriate up.time groups
        2. Add the instances to the appropriate up.time service groups so the new services are monitored and managed
      3. Update the load balancer virtual IP pools to include the new AWS instances and begin sending traffic
    3. We’re now sending traffic to our AWS cloud without anyone ever having had to do anything other than the initial Orchestrator configuration.
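The ‘mywebsite cloud scaleout’ steps above can be sketched as a single function, with in-memory stand-ins for the cloud provider, the monitoring system and the load balancer. Every class and method here (`FakeCloud.provision`, `FakeMonitor.add_element`, `FakeLoadBalancer.add_member`, `cloud_scaleout`) is hypothetical; none of them are the Orchestrator, up.time plugin or AWS APIs. They are placeholders that show the ordering of the workflow.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Stand-in for a cloud provider API (hypothetical)."""
    next_id: int = 0

    def provision(self, image: str) -> str:
        self.next_id += 1
        return f"i-{self.next_id:04d}"


@dataclass
class FakeMonitor:
    """Stand-in for the monitoring system's element and group calls (hypothetical)."""
    groups: dict[str, set[str]] = field(default_factory=dict)

    def add_element(self, host: str, group: str, service_group: str) -> None:
        self.groups.setdefault(group, set()).add(host)
        self.groups.setdefault(service_group, set()).add(host)


@dataclass
class FakeLoadBalancer:
    """Stand-in for a load balancer virtual IP pool (hypothetical)."""
    pool: list[str] = field(default_factory=list)

    def add_member(self, host: str) -> None:
        self.pool.append(host)


def cloud_scaleout(cloud: FakeCloud, monitor: FakeMonitor, lb: FakeLoadBalancer,
                   count: int, image: str = "web-tier") -> list[str]:
    """Hypothetical scale-out workflow following the numbered steps above."""
    # Step 1: provision the instances from a web tier image.
    new_hosts = [cloud.provision(image) for _ in range(count)]
    # Step 2: register them with monitoring, in both a group and a service group.
    for host in new_hosts:
        monitor.add_element(host, group="web-cloud", service_group="mywebsite")
    # Step 3: only now add them to the load balancer pool so traffic starts flowing.
    for host in new_hosts:
        lb.add_member(host)
    return new_hosts


lb = FakeLoadBalancer(pool=["web01", "web02"])
hosts = cloud_scaleout(FakeCloud(), FakeMonitor(), lb, count=2)
print(lb.pool)  # ['web01', 'web02', 'i-0001', 'i-0002']
```

Registering the new instances with monitoring before they join the load balancer pool means no traffic reaches a server that nobody is watching.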

I realize that technically the Orchestrator piece is not a “3 clicks and you’re in Nirvana” exercise; however, once it is implemented you’ll be able to have your web properties auto-scale based on inbound workload before there is ever a problem. Take it a step further and you can have up.time, via Orchestrator, deprovision the AWS resources when your site workload drops back to normal levels, closing the loop on provision-deprovision so you only pay for AWS resources when you need them. Pretty cool, eh? I think so. So with a little up-front configuration in Orchestrator and up.time, you can implement Automated Incident Avoidance and keep your services running when they face unforeseen transient workloads. With up.time and Orchestrator, this is only one example out of literally hundreds (dare I say thousands) of ways you can automate your infrastructure management to operate at the highest possible levels of efficiency, from both a technology and a resource standpoint.

Virtual Appliances

Wednesday, June 24th, 2009

I love these things!

VMware’s Virtual Appliance Marketplace (VAM) is like a candy store for us geeks and nerds. While not quite as robust as, say, the iPhone App Store, it offers hundreds of ready-made appliances for hundreds of applications. Pick your solution, download it and run it on your favorite VMware virtualization platform. Don’t like it? Simply delete it; there’s nothing to ‘uninstall’.

For those of you who don’t know, a Virtual Appliance is the modern day equivalent of a turn-key application.  The OS, application and any supporting tools are pre-installed and ready to power up.  They save you gobs of time, especially when evaluating a solution. The best part?  Batteries are included, and some assembly is NOT required!  In most cases you don’t need to provision any new virtual hardware, or ask your storage manager for more space on the SAN.  Don’t have a virtualization platform yet? You can download VMware Player, for free, and run the appliance on your desktop.

I know that virtual appliances aren’t that new; they’ve been around for a while now. I know, “Way to be late to the game, Mitchell!” But it’s only recently that VMware has been pushing awareness through their VAM portal, and I’m particularly excited today. Why?

up.time 5.2 has been appliancized!

The up.time Virtual Appliance is finally here, and it’s a dream come true. No more downloading up.time, making sure you meet all the system requirements, possibly installing a new OS and spending time simply readying yourself for our lightning-fast install: we’ve done it all for you! Download the appliance and run it. It’s that simple. Seriously, I fired up the appliance and was ready to play with up.time in about 3 minutes (excluding download time)!

This is a game changer. No longer are you tied to a platform or hardware. You can truly be up and monitoring in minutes. Don’t like it? Go ahead and delete it. Love it? Move it to a production virtualization platform, like ESX, and run with it.

We’re confident that you’ll love it.

Download up.time virtual appliance today and try it free for 30 days.  We’d love to hear what you think (comments please!).

Stop the insanity

Monday, June 22nd, 2009

One of the definitions of insanity is doing the same thing over and over and expecting different results. So why does this affliction continue to haunt us in IT? Given the significant advances in technology, specifically in virtualization, all of which are supposed to make our lives easier and more efficient, why haven’t we whipped the beast of IT complexity? The majority of IT environments are still stuck in a world of break-fix, although perhaps we’re just getting into a faster break-fix mode.

In one recent Gartner article, “Server Virtualization for x86: A Benefits Impact Assessment,” there is a rather telling statement:

“From Gartner surveys and client interactions, we know that, operationally, virtualization appears to be a “wash,” at best — and it actually creates additional costs (people, process development and tools) on a worst-case basis. “

So what are we doing wrong? One reason is that in daily operations there isn’t an easy way to prioritize incoming incidents or identify recurring problems. I would categorize these recurring outages as “death by a thousand cuts.” This is further exacerbated on teams with a number of sysadmins, where each sysadmin can perceive the same problem as a distinct one. The result is resource inefficiency, with multiple sysadmins solving the same problem over and over.
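One way to keep several sysadmins from treating the same recurring problem as separate incidents is to normalize incident text into a signature and count repeats. The sketch below assumes incidents arrive as `(host, message, sysadmin)` tuples; the normalization rules (lowercasing, replacing numbers with `#`, collapsing whitespace), the `min_occurrences` default and the function names are illustrative assumptions, not a feature of any particular product.

```python
from __future__ import annotations

import re
from collections import defaultdict


def signature(host: str, message: str) -> tuple[str, str]:
    """Normalize an incident so that variants of the same problem compare equal."""
    text = message.lower()
    text = re.sub(r"\d+", "#", text)          # 91% and 95% both become #%
    text = re.sub(r"\s+", " ", text).strip()  # ignore spacing differences
    return host.lower(), text


def recurring_problems(incidents, min_occurrences: int = 3) -> dict[tuple[str, str], list[str]]:
    """Return signatures seen at least min_occurrences times, with the sysadmins who handled them.

    incidents: iterable of (host, message, sysadmin) tuples.
    """
    handlers: dict[tuple[str, str], list[str]] = defaultdict(list)
    for host, message, admin in incidents:
        handlers[signature(host, message)].append(admin)
    return {
        sig: sorted(set(admins))
        for sig, admins in handlers.items()
        if len(admins) >= min_occurrences
    }


tickets = [
    ("db01", "Disk /data at 91% full", "ann"),
    ("DB01", "disk /data at 95%  full", "bob"),
    ("db01", "Disk /data at 97% full", "carl"),
    ("web02", "HTTP 500 on /checkout", "ann"),
]
print(recurring_problems(tickets))
# {('db01', 'disk /data at #% full'): ['ann', 'bob', 'carl']}
```

Three different sysadmins each closed what looked like a one-off disk alert; grouped by signature, it is clearly one recurring problem.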

Additionally, in VMware environments, the traditional guest metrics of CPU, memory and I/O are not very useful anymore. They aren’t good indicators of how guests ‘get along’ during regular compute workloads. There is a whole new series of VMware-specific metrics that indicate VM guest contention from a compute, bandwidth and memory usage point of view. Systems management tooling needs to understand these new factors to help manage a virtual infrastructure, something that traditional ‘Big 4’ tooling just can’t do. Putting ill-matched VM guests onto the same physical infrastructure is simply asking for incidents and accumulated outages over time.
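CPU ready time is one example of a VMware-specific contention metric: the time a vCPU was ready to run but had to wait for a physical CPU. vCenter reports it as a summation in milliseconds accumulated over the sample interval (20 seconds for real-time charts), and dividing by the interval converts it to a percentage. The sketch below applies that conversion averaged per vCPU; the function names are hypothetical, and the 5% `threshold_pct` default is an illustrative value to tune for your environment, not a VMware-published limit.

```python
from __future__ import annotations


def cpu_ready_percent(ready_summation_ms: float, interval_s: int = 20, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms over one sample interval) to average % per vCPU.

    Real-time vCenter charts use a 20 s interval; rolled-up charts use longer ones.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return 100 * ready_summation_ms / (interval_s * 1000 * vcpus)


def contended_vms(samples: dict[str, tuple[float, int]], interval_s: int = 20,
                  threshold_pct: float = 5.0) -> list[str]:
    """Names of VMs whose per-vCPU ready time meets or exceeds threshold_pct.

    samples maps VM name -> (ready summation in ms, number of vCPUs).
    threshold_pct is an illustrative default, not a VMware-published limit.
    """
    return sorted(
        vm for vm, (ready_ms, vcpus) in samples.items()
        if cpu_ready_percent(ready_ms, interval_s, vcpus) >= threshold_pct
    )


print(cpu_ready_percent(2000, vcpus=2))                       # 5.0
print(contended_vms({"app01": (2000, 2), "db01": (400, 1)}))  # ['app01']
```

A guest can show modest CPU utilization from the inside while spending a large share of each interval waiting to be scheduled, which is exactly the kind of contention traditional guest metrics miss.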

The right approach is to stop banging your head against the wall, rather than simply taking two aspirin every day and dealing with the pain. Instead of waiting for incidents to occur, it should be possible to avoid them proactively. With VMware’s launch of vSphere in May, a package called Orchestrator is also bundled (it comes from VMware’s acquisition of Dunes a few years ago). This is fantastic news for SMBs (and enterprises too), as it means that any installation of VMware vSphere will have runbook automation capabilities. VMware’s Orchestrator is a very simple drag-and-drop interface for creating (potentially complex) workflows to control your virtual infrastructure. The latest release of up.time integrates tightly with Orchestrator to add application-level monitoring and management capabilities, and can trigger specific workflows when certain applications are about to exceed SLA objectives or will degrade unless corrective action is taken. Through an Orchestrator plug-in, the up.time API is also exposed, allowing bi-directional communication between Orchestrator and up.time (so you can dynamically add systems, or re-group them on the fly).

So rather than wait for an application to fail and trigger an incident, up.time can take corrective action in advance to completely avoid the incident. This starts to help us snap out of the break-fix routine that we’re all stuck in. Let’s take an example of incident avoidance and dynamic infrastructure:

Let’s say that you have an e-commerce application where certain response time thresholds must not be exceeded, and where the number of concurrent user sessions is also a factor. Since up.time is already a micro-framework that can monitor your entire infrastructure (applications, databases, platforms, networks, etc.), it can trigger actions based on identified thresholds. If user sessions start to peak or response times begin to degrade, up.time can trigger an Orchestrator workflow to dynamically provision additional VM guests and bring them online into the e-commerce application. Also, since up.time understands the application, as workload drops over time (e.g. the user peak has dissipated), workflows can then be triggered to de-provision the extra VM guests to avoid sprawl.
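The trigger logic in this e-commerce example can be sketched as a small decision function. It assumes the monitoring system supplies the current response time, concurrent sessions and number of running guests; the `ScalingPolicy` thresholds and the function names are hypothetical. Scale-in uses a lower bar than scale-out (hysteresis) so the workflow doesn’t flap between provisioning and de-provisioning when load hovers near a threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical thresholds; tune them to the application's actual SLA."""
    max_response_ms: float = 800.0   # response time ceiling
    max_sessions_per_vm: int = 200   # concurrent sessions one guest handles well
    scale_in_fraction: float = 0.5   # scale in only when well under the limits
    min_vms: int = 2                 # never shrink below this many guests


def decide(policy: ScalingPolicy, response_ms: float, sessions: int, vms: int) -> str:
    """Return 'scale_out', 'scale_in' or 'hold' for the current measurements."""
    if vms < 1:
        raise ValueError("vms must be at least 1")
    if response_ms > policy.max_response_ms or sessions / vms > policy.max_sessions_per_vm:
        return "scale_out"
    # Hysteresis: remove a guest only if the remaining guests would still sit
    # comfortably below both limits.
    if (
        vms - 1 >= max(policy.min_vms, 1)
        and response_ms < policy.max_response_ms * policy.scale_in_fraction
        and sessions / (vms - 1) < policy.max_sessions_per_vm * policy.scale_in_fraction
    ):
        return "scale_in"
    return "hold"


policy = ScalingPolicy()
print(decide(policy, response_ms=950, sessions=300, vms=4))  # scale_out
print(decide(policy, response_ms=200, sessions=150, vms=4))  # scale_in
print(decide(policy, response_ms=500, sessions=500, vms=4))  # hold
```

In the scenario above, a ‘scale_out’ result would kick off the provisioning workflow and ‘scale_in’ the de-provisioning one.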

There are many more exciting things in this release, but we’ll cover those in another blog post. I’m also going to cover the exciting capability of bridging private and public cloud with up.time. What about dynamically provisioning compute capability in Terremark’s cloud or Amazon’s EC2 from the privacy of your own infrastructure, and then having these instances monitored under the global purview of up.time? We can do this; more info in the next blog post.