The Big Picture in IT Systems Management

Monday, March 17, 2008

VM Sprawl

Our product monitors VMware environments, and we ourselves are big users of VMware internally for our QA and Development environments. This is great for us, as we use our own tool to manage the QA and Dev systems. In fact, it was our early adoption of VMware a few years ago that led to the extension of up.time to monitor and manage VM environments.

Our biggest problems involved:
  • losing VMs in the physical environment (see the photo [next], and this is just a small snapshot of the QA environment)
  • creating temporary VMs for QA testing and then forgetting to decommission them (thus consuming memory and disk storage [CPU not so much])
  • finding the right configuration of a VM image to create a QA environment (e.g. Oracle 10g on RedHat with WebLogic 9.x, etc.)
  • running out of space on the SAN pools b/c of storage sprawl
  • departments requesting additional physical hardware b/c we couldn't internally map out our resource usage across the VM physical systems
  • occasional load testing killing a physical system (b/c caps weren't put on)
  • trying to identify dependencies between VMs for particular tests or development runs (e.g. system X has Oracle, system Y has WebSphere, system Z has up.time, etc.).
So, as you can see, there were a number of issues related to virtualization that we needed to address. Since up.time was already a great tool for collecting and analyzing performance data over time, we extended the tool to talk directly to VMware ESX (and Virtual Center) to extract physical and virtual configurations, detailed performance data, and VMotion information. Over time, we were able to graph VM instance performance information not just within a single physical VMware system, but across an entire VMware farm. We could identify which VMs were being migrated throughout the farm and how much compute, memory, and storage each one was consuming.
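To make the "across an entire farm" idea concrete, here is a minimal Python sketch of rolling per-VM configuration up into per-host and farm-wide totals. This is not up.time code and does not call any VMware API: the record layout (`name`, `host`, `memory_mb`, `disk_gb`), the sample VM and host names, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: one record per VM, as it might look after being
# collected from each physical host. Names and sizes are made up.
SAMPLE_VMS = [
    {"name": "qa-oracle-01", "host": "esx-a", "memory_mb": 4096, "disk_gb": 60},
    {"name": "qa-weblogic-01", "host": "esx-a", "memory_mb": 2048, "disk_gb": 40},
    {"name": "dev-uptime-01", "host": "esx-b", "memory_mb": 1024, "disk_gb": 20},
]


def usage_by_host(vms):
    """Sum VM count, memory, and disk per physical host."""
    totals = defaultdict(lambda: {"vm_count": 0, "memory_mb": 0, "disk_gb": 0})
    for vm in vms:
        host_totals = totals[vm["host"]]
        host_totals["vm_count"] += 1
        host_totals["memory_mb"] += vm["memory_mb"]
        host_totals["disk_gb"] += vm["disk_gb"]
    return dict(totals)


def farm_totals(vms):
    """Sum the same figures across every host in the farm."""
    return {
        "vm_count": len(vms),
        "memory_mb": sum(vm["memory_mb"] for vm in vms),
        "disk_gb": sum(vm["disk_gb"] for vm in vms),
    }


if __name__ == "__main__":
    for host, totals in sorted(usage_by_host(SAMPLE_VMS).items()):
        print(host, totals)
    print("farm", farm_totals(SAMPLE_VMS))
```

Even a rollup this simple addresses the hardware-request problem above: before buying another physical box, you can see which hosts still have headroom.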

Additionally, because up.time is an active server monitoring tool, we were able to create monitors that trigger alerts when new virtual instances are provisioned. up.time automatically inventories instances across a VMware farm, so if an instance appeared that it didn't already know about, an alert would be generated.
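The logic behind that kind of alert boils down to comparing a known inventory against what is currently running. The Python sketch below illustrates the idea only; it is not how up.time implements its monitors. The function names, the alert text, and the VM names are hypothetical.

```python
def find_unknown_instances(known_inventory, discovered):
    """Return the discovered VM names missing from the known inventory, sorted."""
    return sorted(set(discovered) - set(known_inventory))


def build_alerts(known_inventory, discovered):
    """Build one alert message per VM that the inventory has never seen."""
    return [
        f"ALERT: unrecognized VM instance provisioned: {name}"
        for name in find_unknown_instances(known_inventory, discovered)
    ]


if __name__ == "__main__":
    # Hypothetical inventory and discovery results.
    known = ["qa-oracle-01", "qa-weblogic-01"]
    running = ["qa-oracle-01", "qa-weblogic-01", "qa-temp-loadtest"]
    for alert in build_alerts(known, running):
        print(alert)
```

Swapping the arguments gives the opposite view: instances in the inventory that are no longer running, which is handy when hunting for VMs that were decommissioned without anyone updating the records.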

There are a number of nice tools that already exist for provisioning management of instances (VMware Dunes, http://www.dunes.ch, and VMware Lab Manager). However, these tools don't actively monitor and profile a live VMware environment. This is where up.time excels.

We are continuing to develop VMware functionality within up.time and bringing the traditional up.time "ease-of-use" mantra to VMware sprawl management.


