The Big Picture in IT Systems Management

Thursday, June 26, 2008

Interesting customers

On a regular basis, I try to talk to customers about up.time and what their likes and dislikes are. I recently had the good fortune to talk to Ben Rockwood, of Cuddletech blogging fame. Ben is about as hard-core Solaris as they come, and if you read his blog you'll garner lots of interesting snippets about everything Solaris-related. As it turns out, Ben is also working at Joyent, which is a customer of ours and a large Sun user. Specifically, they use a lot of Sun with Solaris containers, something which plays well into our hetero-virtualization strength.

We had an interesting conversation, and it took some unexpected zigs and zags, ranging from product features, to agent vs. agentless monitoring, to how desperately he tried to get rid of up.time when he first started at Joyent.

First, how about the getting-rid-of-us story... When Ben initially joined Joyent, he couldn't believe that they had bought a commercial tool for performance and availability management. Why weren't they using open source? As it turns out, every time he needed forensic data to understand why outages had occurred, or why performance was suffering, he discovered that up.time had already recorded all the necessary low-level performance stats. There wasn't any need to go back and script the necessary commands; the data was already in up.time's performance data warehouse. He finally conceded that up.time knew what you wanted, even before you knew you needed it. So, we got to stay!

We had further discussions about other systems management vendors and the techniques they use for gathering data. Both he and I concurred that in order to get the necessary low-level metrics for both planning and forensic problem diagnosis, you need agents on the systems being monitored. He is the first guy I've talked to who also agrees that Net-SNMP is not an agentless solution!

Ben had a legitimate critique of our service monitor extensibility, as our XML document definition for defining the data is a little cumbersome (and yes, we're working on it). However, he really liked the fact that our extensibility supports the concept of Arrays of data (or ranged data, as we call it internally). Almost all tools (open source and commercial) are extensible, but only for atomic data (e.g. one integer, one string, etc.). With up.time, you can define a service monitor that understands an Array of data. So, for example, let's say that you want a count of users per Solaris zone (and want to record this over time). You can do this in an Array, e.g.:

Zone NumUsers
zone1 2
zone2 5
zone3 6

The up.time monitor will take in this data and then you can graph the data on a line chart with the actual zone names and meaningful titles.
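
To make the Array idea a little more concrete, here is a minimal Python sketch of how rows like the ones above could be read as one value per zone rather than a single atomic number. This is purely illustrative: the function name parse_ranged_output and the SAMPLE text are made up for this post, and it does not show up.time's actual monitor definition or input format.

```python
# Hypothetical helper, for illustration only. It is NOT up.time's monitor
# format; it just shows "ranged" data as one value per named row.

def parse_ranged_output(text):
    """Return {zone_name: user_count} from whitespace-separated rows."""
    counts = {}
    # Drop blank lines so stray whitespace does not break parsing.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the header row ("Zone NumUsers")
        zone, num_users = line.split()
        counts[zone] = int(num_users)
    return counts


# Sample rows copied from the example table above.
SAMPLE = """\
Zone NumUsers
zone1 2
zone2 5
zone3 6
"""

if __name__ == "__main__":
    for zone, users in parse_ranged_output(SAMPLE).items():
        print(f"{zone}: {users} users")
```

The point is that each row keeps its own label, so each zone can be recorded and graphed as its own series over time.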

We also talked about support for DTrace, especially since it's now being supported on platforms other than Solaris. Our current extensibility is solid groundwork for supporting output from DTrace scripts. We're currently looking at supporting DTrace through our data gathering mechanisms, and it's only going to add to the arsenal for quickly diagnosing performance problems.

In a parting comment, Ben mentioned that "uptime software gets it." They came from a systems administrator/management background and have continued to make up.time usable to the very people that need to keep infrastructure running.

Now, the conversation wasn't a total love-fest. Ben did have some good critiques, and it's people like him who are helping set the bar higher for us.

It's great to have customers who like using your product but who are always pushing you to do better.
