The up.time IT Systems Management Blog

Posts Tagged ‘VMware’

up.time 6 – Get Your Sneak Peek at our New Baby!

Thursday, October 6th, 2011

The up.time 6 launch is just around the corner (end of the month)! So, we wanted to give you a sneak peek of what to expect. This new release is all about one thing: helping you monitor and manage your VMware environment better. Our development team has worked hard to make VMware monitoring and management as easy as possible for IT departments, because we know you don’t have a lot of time on your hands.

In addition to the new VMware monitoring and reporting capabilities, up.time 6 continues to deeply monitor across all datacenter infrastructure and applications to give you the most complete set of metrics on performance, availability, and capacity. You’ll have full control over your servers and services across Windows, UNIX (IBM AIX, Sun Solaris, HP), Linux, VMware, Novell, and more from a single dashboard.

Here’s a quick preview into two of the major additions that will be included in up.time 6:

1. Smart VMware Monitoring. We’ve taken a “Set it and forget it” approach with these new functionalities, allowing IT guys to save some of their time on virtualization monitoring. This includes:

  • Real-time vSync: Keeps your monitoring in step with your VMware environment. You’ll know immediately when VMs are added or changed, with monitoring and alerting applied automatically.
  • Sprawl Control: Be alerted and take automated action on new VMs, including license validation, resource allocation, and security compliance.
  • VM Power Awareness: Monitor power usage in your VMware environment to track energy-saving initiatives, isolate power-gobbling applications and workloads, and map power usage to capacity over time.
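To make the "know when VMs are added or changed" idea concrete, here is a minimal Python sketch of comparing two inventory snapshots. It is only an illustration: the `VM` record, the `diff_inventory` function, and the sample host names are assumptions made up for this example, not up.time's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Minimal VM record; the fields are illustrative only."""
    name: str
    host: str
    powered_on: bool


def diff_inventory(previous, current):
    """Compare two inventory snapshots (dicts keyed by VM name).

    Returns (added, removed, changed) as sorted lists of VM names.
    """
    prev_names, curr_names = set(previous), set(current)
    added = sorted(curr_names - prev_names)
    removed = sorted(prev_names - curr_names)
    changed = sorted(
        name for name in prev_names & curr_names
        if previous[name] != current[name]
    )
    return added, removed, changed


# Hypothetical snapshots taken a few minutes apart.
previous = {
    "web01": VM("web01", "esx-a", True),
    "db01": VM("db01", "esx-b", True),
}
current = {
    "web01": VM("web01", "esx-c", True),            # moved to another host
    "db01": VM("db01", "esx-b", True),              # unchanged
    "test-clone": VM("test-clone", "esx-a", True),  # newly created VM
}

added, removed, changed = diff_inventory(previous, current)
```

In a setup like this, anything in `added` would be a candidate for automatic monitoring and sprawl checks, while `changed` would refresh what is already being watched.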

 

2. Comprehensive VMware Capacity Planning. We know that the #1 driver of your VMware performance is capacity, so we’ve been working hard to make VM capacity planning easier. A huge problem across the systems management space is that IT professionals need to know how much capacity they have in their VMware environment, how much they’re currently using, where the capacity bottlenecks are and, most importantly, when they’re going to run out, so that they can prepare for future requirements. up.time 6 addresses these problems by adding:

  • Capacity Bottleneck Troubleshooting, which will alert you the minute a capacity bottleneck appears so you can regain control by locating it, in minutes, with deep, easy-to-use (3-click) capacity metrics.
  • Global VMware Capacity Reports, which will allow you to easily see and compare historical capacity trends across VMware (clusters, resource pools, vApps, VMs, ESX hosts, vCenters, Datacenters, and more) to help establish baselines for upgrade or consolidation projects and ensure that you never overspend on, or run out of, capacity again.
  • Virtual Capacity Forecasting. Stop getting caught begging for additional capacity and storage. Our new forecasting will allow you to easily and accurately see when virtual capacity will run out long before it pops up and bites you. It will find the users and business units who are over-allocating or inefficiently using storage so that you can address their work practices.
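For a rough feel of what "see when capacity will run out" can mean in practice, here is a small Python sketch that fits a straight line to past usage and extrapolates to the capacity limit. The linear trend is a simplifying assumption chosen for this example, and the function name and sample numbers are hypothetical; this is not a description of how up.time 6 forecasts.

```python
def days_until_full(samples, capacity):
    """Fit a least-squares line to (day, used) samples and extrapolate.

    Returns the number of days after the last sample at which usage
    reaches `capacity`, or None when usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx                    # growth in units per day
    if slope <= 0:
        return None                      # no growth, nothing to forecast
    intercept = mean_y - slope * mean_x
    full_day = (capacity - intercept) / slope
    return max(0.0, full_day - samples[-1][0])


# Hypothetical weekly datastore readings: (day number, GB used).
usage = [(0, 4000), (7, 4350), (14, 4700), (21, 5050)]
remaining = days_until_full(usage, capacity=10240)
```

With these made-up readings the datastore grows by 50 GB a day, so the sketch estimates roughly 104 days of headroom. Real usage is rarely this linear, which is why a proper forecast looks at longer histories and seasonality.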

I’m really excited for this release and the functionality that it brings to the table, and I think our clients will be more than pleased with what’s to come!

If you’re looking for a closer look at what I’ve talked about and/or want to see some of this functionality in action, our Product Manager is hosting a “Sneak Peek” webinar today (Oct. 6th) at 4pm EST that I recommend you attend. To register, click here.

- Alex

2010 – The Year of Cloud Experimentation – Part 1 of 2

Monday, November 30th, 2009

At uptime software, we’ve been quite bullish on Cloud’s potential but feel it still has some distance to cover before it lives up to the hype. In fact, I wrote a blog post in January looking at a hypothetical company and the costs involved in moving an entire infrastructure into the Cloud (using Amazon EC2). The results were not impressive: Cloud computing was too expensive (in this example) to gain the critical mass it needs to catch on. It’s amazing how much has changed in the ten months since that post, as we have learned more about how the Cloud can best be utilized. Recently, the media has driven the Cloud excitement and IT managers are now thinking about how the Cloud, in one form or another, can be used in their environments to drive performance and efficiencies.

The real question is this: in what capacity will organizations adopt Cloud over the next few years? With that in mind, we see the coming year as one of exploration and experimentation. The first step is for companies to quantify what Cloud means to their business. Is it as banal as remote storage used for DR purposes, or something as evolved as dynamic compute with secure private/public networking?

Let’s take a look at the “IT Spectrum,” which is loosely aligned with IT maturity and size of organization.

In this diagram, the left represents most small businesses who house their own servers and have a small number of IT staff.  As the small business matures, they may evaluate SaaS-type applications (like Salesforce.com) or push some servers out to an MSP.  Further maturing, or growing, businesses may have additional servers in remote hosted datacenters, like web servers or remote disaster recovery storage.  At the right-most point in the spectrum, businesses/enterprises have opted to completely outsource their IT and minimize the number of IT staff employed by the business.

Understanding the spectrum’s components is important. They represent a “menu” of options that businesses can use to leverage virtualization and cloud technologies to reduce costs (either labor or infrastructure).  This “menu” is most likely how IT managers will choose to evaluate the relevance of Cloud to cost savings and enhanced service delivery.  For example, with VMware’s new VBlock offering and the ongoing relationship with Terremark, entire stacks of infrastructure can be pushed into off-premises locations and operated in a mission-critical environment. So, whether it’s just dipping a toe into the Cloud waters (like hosting a server in Amazon EC2 or the RackSpace Cloud to deliver a decoupled application) or leveraging the VBlock to move entire mission critical infrastructures, there are many options to consider. Keep in mind that issues such as backup management, lifecycle management, and systems management need to be addressed in all cases.

How is the experimentation starting?

[ more next week in Part 2 ]

Just how disruptive is Cloud technology?

Monday, November 9th, 2009

Let’s understand for a moment just how disruptive Cloud and virtualization technologies are to OTHER technologies. Ignore, for a moment, all the changes required to business processes, maintenance processes, infrastructure deployment models, and all the other stuff people have been beating to death over the past 2 months.

Just how pervasive and challenging is Cloud technology to entrenched technology? Well, for one, people are redesigning and rethinking how we use TCP/IP in order to enable Long Distance VMotion. That’s right: in order to be able to forklift virtual instances and massive data over the internet, companies like NetEx have figured out how to make the old building block of the interwebs, TCP/IP, even better – dubbing their new UDP over IP translation technology “HyperIP”. HyperIP optimizes TCP/IP so that you can move a full VMware instance over the wire up to 10X faster than usual. (Let’s not even talk about how people will monitor this new disruptive technology, but you can bet it’s the agile players who are even aware of the new challenges in this space.)

The potential for this technology is 100% clear, and it is probably sitting somewhere in a lab being coveted by the people at VMware as “my precious” – especially in the context of their desire to get remote DRS as a solidified feature in the vSphere platform. If VMware manages to get this integrated as part of remote DRS and they start forklifting instances to, from, and across the Savvis and Terremark clouds, this will be a giant leap towards making unified compute and private/public clouds “as real as it gets”. This doesn’t even take into account the latest ‘turnkey’ private cloud solutions unveiled by VMware, known as VBlocks.

The clouds just zapped TCP/IP, what’s next?

Incident Priority Tracking with up.time 5.2

Wednesday, August 12th, 2009

With all of the summer vacations happening it’s been a while since my last post, but to get back in the swing of things I really wanted to talk about a feature from our newest release of up.time, version 5.2. The “Incident Priority Quadrant” report has gotten a lot of buzz from existing customers, prospects, and the press. Specifically, I have one large financial customer who upgraded to up.time 5.2 and started running the “Incident Priority Quadrant” report on a weekly basis for their Tier 1 & 2 Applications. They are now easily able to see where they need to concentrate their very “limited” resources. They have also been looking at the areas where they can set up some automation. This brings me to another topic – VMware’s new “vCenter Orchestrator™”. up.time 5.2 has direct integration with this drag-and-drop Automation and Orchestration tool… but I will save all that for my next post.

Incident Priority Quadrant Report

SpringSource and VMware

Wednesday, August 12th, 2009

As most of you know already, VMware has acquired SpringSource for a quite remarkable $420MM. Along with this purchase comes Hyperic, a struggling open source systems management vendor that was recently force-merged with SpringSource by communal VCs.
Ultimately, this acquisition sets the stage for development and deployment on cloud computing platforms (PaaS); however, our interest lies in the monitoring, measurement, and management of applications running in the cloud. This is an area in which Hyperic conceivably will be used, but they will need lots of development effort to enhance their cloud offering (Amazon EC2 API calls to instantiate AMIs aren’t really what I would call ‘cloud leadership’, or ‘cool’).
I am also curious as to how enterprise customers are going to deal with having open source software managing their environments (there still are a huge number of holdouts in this area, which is why Hyperic was struggling).
This acquisition, in the next 12-18 months, doesn’t help VMware compete against Microsoft’s SCOM in heterogeneous environments (physical, virtual, and multiplatform) – which, in my opinion, poses a greater risk to enterprise adoption.

Alex

Cloud computing and popular culture

Friday, June 26th, 2009

This has been one hell of a week for the entertainment industry.  Ed McMahon, Farrah Fawcett and Michael Jackson have all passed away.  Whenever significant cultural events like this occur there is an explosion in communication amongst people, wanting to know what happened and further discuss it amongst their peers.  In the past this would have been isolated to talking with your neighbours, family and friends either in person or over a traditional POTS line.  Fast forward to the 21st century and we now have real time bidirectional communication between virtually anyone anywhere in the world. 

When you have an unpredictable event like the death of a societal icon, or the launch of a new service that has the potential for extremely rapid adoption (or at the very least high traffic due to curiosity alone), it is very difficult, or practically impossible, to anticipate the real world resources needed to support the inbound demand. This is very clearly shown by the chart from Keynote Systems illustrating the availability and performance impact of this event on news websites.

Image from: http://www.datacenterknowledge.com/archives/2009/06/25/michael-jackson-news-slows-web-sites/

TMZ.com was the first news outlet to break the story of Michael Jackson’s death, and consequently their site collapsed outright from the unexpected workload. It’s hard to fault the IT team responsible for the service delivery; after all, no one knew MJ was going to pass away yesterday, and arguably there is no one in entertainment today who would have generated the same level of public interest. So where am I going with all of this? To the clouds! If there was ever a real world example of where a cloud solution would have played nicely into the delivery of a service that can be hit by transient, high-intensity workloads without warning, this is it. Even a properly architected high volume application or service that is designed to handle large increases in transient load has a finite capacity.

If TMZ.com had the ability to automatically spin up cloud resources and shunt the new traffic load over to them during the media frenzy, ideally they would have been able to stay up during the peak of the traffic and provide service quality and performance as good as their normal service levels. (For the shunting, I’m a big fan of f5 gear for ADN networking.) Now, they could have done this manually, I suppose: when they saw the traffic coming they could have provisioned some AWS instances, got their site/content up and running, and started routing traffic over through a change to their load balancers. That’ll work, but it’s also manual and is going to take time to implement, and by the time they’re done their end users have already hit a dead site and gone to one of their competitors. So what to do? Automate!

With the 5.2 release of up.time that was launched on Wednesday (June 24th, 2009), up.time now has a full bi-directional integration with VMware Orchestrator. If you are a VMware shop, you get Orchestrator for free with vCenter Server. If you are not familiar with Orchestrator, you can check it out here. Essentially, Orchestrator is a policy-based workflow automation tool that you can use to build automated scenarios to perform, well, pretty much anything. Orchestrator has the concept of plugins, which give it the know-how to interact directly with specific vendor technologies. For example, the up.time plugin for Orchestrator lets you do things like add elements, create/modify/delete groups and service groups, and perform other tasks from within Orchestrator. (Under the hood, this is enabled by a new set of web services in up.time 5.2.) So how does this play into the TMZ.com cloud scenario? Well, it goes something like this.

  1. up.time is monitoring the end user experience for the website as seen by the logical service address using the HTTP service or WATM monitor.  (www.mynewssite.com)
    1. You can monitor the logical service for overall end user experience.
    2. You can monitor the individual web servers to identify if any given server is being overloaded to determine if that is expected behaviour or an issue like a load balancer algorithm misconfiguration.
    3. You can configure whatever service monitors you need (database, business logic, logs, etc) to determine the ongoing health of the service you are delivering and use that to trigger the automated resolution.
  2. When your end users begin to suffer or servers start to indicate they are becoming overloaded, have up.time trigger an Orchestrator workflow to automatically avoid any end user incident that may occur due to insufficient resources.  That would look something like this:
    1. Using an action profile within up.time, trigger the Orchestrator workflow you have defined for automatically shunting workload to the cloud or scaling out internally onto idle resources.  How you resolve it from a capacity perspective is really up to you.  You could have different capacity scale-out workflows depending on where the performance bottleneck is.  If your web servers are overloaded, shunt to the cloud; if your database is overloaded, add a new node to your cluster.  In this scenario let’s scale out our web tier.
    2. up.time tells Orchestrator to trigger the ‘mywebsite cloud scaleout’ workflow. Orchestrator then manages the following:
      1. Provision and configure an AWS server (or many if you need to) with the appropriate OS and web content.
      2. Add the new AWS instances into up.time (via the up.time Orchestrator plugin, it’s downloadable from our site)
        1. Add the instances to the appropriate up.time groups
        2. Add the instances to the appropriate up.time service groups so the new services are monitored and managed
      3. Update the load balancer virtual IP pools to include the new AWS instances and begin sending traffic
    3. We’re now sending traffic to our AWS cloud without anyone ever having had to do anything other than the initial Orchestrator configuration.
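The flow above can be sketched in a few lines of Python. To be clear, this is a toy model under stated assumptions: the thresholds, the `needs_scale_out` and `build_scale_out_plan` functions, and the step names are all invented for illustration. Nothing here calls the real up.time web services, Orchestrator, AWS, or a load balancer; it only lays out the decision and the order of the steps.

```python
from dataclasses import dataclass, field


@dataclass
class ScaleOutPlan:
    """Records the steps a workflow would take; no real API is called."""
    steps: list = field(default_factory=list)


def needs_scale_out(response_ms, cpu_by_server,
                    max_response_ms=2000, max_cpu=85):
    """True when the logical service is slow or any web server is overloaded.

    The thresholds are placeholder values, not recommendations.
    """
    return response_ms > max_response_ms or any(
        cpu > max_cpu for cpu in cpu_by_server.values()
    )


def build_scale_out_plan(instance_count, group, service_group, lb_pool):
    """Lay out the workflow steps in the order described in the list above."""
    plan = ScaleOutPlan()
    names = [f"cloud-web-{i + 1}" for i in range(instance_count)]
    for name in names:
        plan.steps.append(("provision", name))
    for name in names:
        plan.steps.append(("add_to_monitoring", name))
        plan.steps.append(("add_to_group", name, group))
        plan.steps.append(("add_to_service_group", name, service_group))
    plan.steps.append(("update_lb_pool", lb_pool, tuple(names)))
    return plan


# Hypothetical readings: a slow site and one hot web server.
overloaded = needs_scale_out(3500, {"web01": 92, "web02": 60})
plan = (
    build_scale_out_plan(2, "Web Tier", "Public Website", "www-pool")
    if overloaded
    else ScaleOutPlan()
)
```

Keeping the decision (`needs_scale_out`) separate from the plan makes it easy to swap in a different workflow, for example adding a database node instead of web servers when the bottleneck is elsewhere.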

I realize that technically the Orchestrator piece is not a “3 clicks and you’re in Nirvana” exercise; however, once it is implemented you’ll be able to have your web properties auto-scale based on inbound workload before there is ever a problem.  Take it a step further and you can have up.time, via Orchestrator, deprovision the AWS resources when your site workload drops back to normal levels, so you can close the loop on provision-deprovision and only pay for the AWS resources you use when you need them.  Pretty cool eh?  I think so.  So with a little up-front configuration in Orchestrator and up.time you can implement Automated Incident Avoidance and keep your services running when they are faced with the potential of unforeseen transient workloads.  With up.time and Orchestrator, this is only one example out of literally hundreds (dare I say thousands) of ways you can automate your infrastructure management to ensure you are operating at the highest possible levels of efficiency, both from a technology and a resource standpoint.

Virtual Appliances

Wednesday, June 24th, 2009

I love these things!

VMware’s Virtual Appliance Marketplace (VAM) is like a candy store for us geeks and nerds.  While not quite as robust as, say, the iPhone App Store, there are hundreds of ready-made appliances for hundreds of applications. Pick your solution, download it and run it on your favorite VMware virtualization platform.  Don’t like it?  Simply delete it, nothing to ‘uninstall’.

For those of you who don’t know, a Virtual Appliance is the modern day equivalent of a turn-key application.  The OS, application and any supporting tools are pre-installed and ready to power up.  They save you gobs of time, especially when evaluating a solution. The best part?  Batteries are included, and some assembly is NOT required!  In most cases you don’t need to provision any new virtual hardware, or ask your storage manager for more space on the SAN.  Don’t have a virtualization platform yet? You can download VMware Player, for free, and run the appliance on your desktop.

I know that virtual appliances aren’t that new; they’ve been around for a while now.  I know, “way to be late to the game, Mitchell!” But it’s only recently that VMware has been pushing awareness through their VAM portal, and I’m particularly excited today.  Why?

up.time 5.2 has been appliancized!

The up.time Virtual Appliance is finally here and is a dream come true. Now, instead of downloading up.time, making sure you meet all the system requirements, possibly installing a new OS, and spending time simply readying yourself for our lightning-fast install, we’ve done it all for you!  Download the appliance and run it. It’s that simple.  Seriously, I fired up the appliance and was ready to play with up.time in about 3 minutes (excluding download time)!

This is a game changer.  No longer are you tied to a platform or to hardware. You can truly be up and monitoring in minutes. Don’t like it? Go ahead and delete it.  Love it? Move it to a production virtualization platform, like ESX, and run with it.

We’re confident that you’ll love it.

Download up.time virtual appliance today and try it free for 30 days.  We’d love to hear what you think (comments please!).

Do more with less. Virtualize and save – but plan carefully!

Wednesday, June 10th, 2009

Here’s some more work for you. Here’s some more responsibility. Here’s a shorter deadline. Now do it all with less money, less time, less resources, less, less, less!

It seems as though the more efficient we become, the more constrained we are. The current economic climate doesn’t help either.   Yes, this is the new norm.  So what can you do?

If you’re reading this you’ve probably invested time and money into a virtual infrastructure, or are considering it.  Great!  Virtualized computing environments squeeze every last drop of performance from hardware and, when properly budgeted, can save you thousands in the long run. But don’t expect a free lunch.

Physical to Virtual Consolidation

Consolidation of physical servers to virtual hosts allows you to break the 1 application to 1 server mold.  However the increased density in your server room might create hot spots, especially if you’ve decided on using a blade chassis.

That increased density means you’ll also be pushing your hardware harder. This will likely increase your power requirements, slightly.  Newer hardware is indeed more efficient, and technologies like VMware’s DRS Distributed Power Management allow you to move workloads around to less stressed hosts and power off unused resources. The net effect is a possible overall reduction in power usage, but peak times could actually require more.

An Up Front Expense?

Virtualization is a net new expense. Unless you are starting from scratch, you will need to invest in hardware, and software licensing.  I was recently asked to vet the cost of a 24 host, enterprise level virtual environment.  Assuming a requirement of 10 Terabytes of storage, and going with mid tier hardware I came up with an up-front ballpark cost of USD $225,000.  No small change.  Amortize your projected savings carefully. Is it worth the up-front investment? Luckily you can grow your virtual environment easily as required with little to no negative impact on the existing services.
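If you want to sanity-check the "is it worth it?" question, a back-of-the-envelope payback calculation is a reasonable start. The sketch below uses the USD $225,000 ballpark from above; the monthly savings figure is purely an assumption for illustration, so plug in your own numbers for power, hardware refresh, and admin time.

```python
import math


def payback_months(upfront_cost, monthly_savings):
    """Months until cumulative savings cover the up-front cost, rounded up.

    Returns None when there are no savings to amortize against.
    """
    if monthly_savings <= 0:
        return None
    return math.ceil(upfront_cost / monthly_savings)


upfront = 225_000          # ballpark figure from the post
assumed_savings = 7_500    # hypothetical monthly savings, not a real estimate

months = payback_months(upfront, assumed_savings)
net_after_3_years = assumed_savings * 36 - upfront
```

Under that assumed savings rate the environment pays for itself in 30 months and is about $45,000 ahead after three years. Halve the savings and it never breaks even inside a typical three-year hardware cycle, which is exactly why the amortization deserves a careful look.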

Implement Standards

Virtualization has made provisioning services a snap. You’ve heard all the marketing buzz — reduced time to market, provision servers in seconds!, etc.  Suddenly that 10T of storage is GONE.  But how?

Sprawl. (Yes, up.time can help you with this!)

Back in the days before virtualization, if you needed more resources you had to justify the expense nine ways from Sunday.  When the hardware finally arrived you’d spend a week staging it, then testing and finally implementing it, only to have it completely consumed a few months later!  When you planned your virtual infrastructure you WAY over-provisioned it, didn’t you?  You thought ahead 3 years like you did when you bought a single server for that one application.  However, now you’re planning for possibly hundreds of workloads.  Need another machine?  No problem, just clone it and wait a few minutes.  Ever have cash burn a hole in your pocket?  Budgets prevent us from blowing that spare cash.  It’s exactly the same in a virtual environment, except the spare cash is extra CPU cycles, storage and memory.  Whether you simply devise a set of rules for managing the virtual machine life cycle or implement tools to manage it, the only way you will realize long-term savings is to ensure you’re only using what you need.  Don’t run your VM environment like that TV salesperson’s famous oven: “Set it, and forget it!”

If you keep these things in mind when building and managing your VMware vSphere environment, or any other virtual infrastructure, you will absolutely be able to do more, with less. Of course <shameless plug>, up.time can solve your VMware monitoring needs with its deep VMware monitors and reports.

VMware vSphere – Are you ready? We are!

Wednesday, June 3rd, 2009

Unless you live under a rock, you know that VMware recently released vSphere 4, the highly anticipated upgrade to its virtual infrastructure suite.  The number of feature upgrades and enhancements makes the new version somewhat hard to ignore.  But if you’re like me you tend to shy away from .0 releases.  I usually wait for the real world installations to sort out the bugs and let the developer issue a patch or point release. Let someone else be my guinea pig.  The last thing you want is for an upgrade to nuke your production system.

I am, however, happy to report that our experience with vSphere 4 has been relatively smooth so far.  While I’ve not taken the plunge and upgraded our production environment yet, our lab upgrade from 3.5 to the 4.0 beta, and subsequently the general release went off without a hitch.  This gives me the confidence to at least begin the planning stages of the production system upgrade.

Step one is to make sure our existing systems are at the latest version of Infrastructure 3.5 and fully patched. We start that in a week or so and I’ll keep you all abreast of the progress.  One thing I don’t have to worry about as we ready our production environment for vSphere is that the up.time monitoring station is waiting for us on the other side.  It’s just waiting for me to play catch up!

So, have you upgraded to vSphere yet?  Tell us about your experience with the process and about vSphere in general. Or even better, if you are monitoring your vSphere infrastructure with up.time we’d love to hear about your experience. You can visit the up.time website for more on vSphere Monitoring or VMware monitors.

VMware VMotion & DRS… Problem Solved

Wednesday, May 27th, 2009

I have been working with a large financial institution for the past few months and on Monday, they used one of the many Virtualization reports available in up.time 5 to help solve an issue they were having with one of their VMware ESX clusters. I have been using this report quite a bit but wanted to highlight it on the blog. It’s called the VMware Instance Motion Report, and it tracks instances (Virtual Machines) and when they VMotion (move) around an ESX cluster, either manually or by such methods as DRS (Distributed Resource Scheduler).

The financial customer I was referring to had recently set up DRS on a new cluster. For those of you not familiar with DRS, it dynamically allocates resources to enforce resource management policies while balancing resource usage across multiple ESX hosts. One of the options when setting up DRS is how aggressive you want it to be (this is set across the entire cluster); there are five settings, ranging from Level 1 (Conservative) to Level 5 (Aggressive). The person who had set up DRS on this new cluster had it set to Level 5, which was causing constant VMotioning between hosts. We were able to see this immediately in the instance motion report, which tracks individual Virtual Machines across multiple ESX hosts.
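If you want to spot this kind of "ping-pong" behaviour yourself, the underlying check is simple: count how many times each VM moved within a recent window. The Python sketch below is one way to do that. The event tuple layout, the `find_flapping_vms` function, and the threshold of three moves per hour are all assumptions for illustration; it does not reflect how the Instance Motion Report is built.

```python
from collections import Counter
from datetime import datetime, timedelta


def find_flapping_vms(events, now, window=timedelta(hours=1), threshold=3):
    """Return {vm_name: migration_count} for VMs that moved at least
    `threshold` times in the `window` ending at `now`.

    `events` is an iterable of (timestamp, vm_name, source_host, target_host).
    """
    cutoff = now - window
    counts = Counter(
        vm for timestamp, vm, _source, _target in events
        if cutoff <= timestamp <= now
    )
    return {vm: n for vm, n in counts.items() if n >= threshold}


# Hypothetical migration events from one morning on a three-host cluster.
start = datetime(2009, 5, 25, 9, 0)
events = [
    (start, "app01", "esx-a", "esx-b"),
    (start + timedelta(minutes=10), "app01", "esx-b", "esx-c"),
    (start + timedelta(minutes=15), "db01", "esx-a", "esx-c"),
    (start + timedelta(minutes=20), "app01", "esx-c", "esx-a"),
    (start + timedelta(minutes=30), "app01", "esx-a", "esx-b"),
]
flapping = find_flapping_vms(events, now=start + timedelta(minutes=45))
```

Here `app01` moved four times in 45 minutes and gets flagged, while `db01`'s single move does not. A VM that keeps showing up in a check like this is a good hint that the DRS setting is too aggressive for the cluster.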

Problem Solved.

VMware Instance Motion Report
