<?xml version='1.0' encoding='UTF-8'?><rss xmlns:atom='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' version='2.0'><channel><atom:id>tag:blogger.com,1999:blog-4269386030357445158</atom:id><lastBuildDate>Thu, 26 Jun 2008 15:31:56 +0000</lastBuildDate><title>Alex Bewley: Systems Management</title><description/><link>http://www.uptimesoftware.com/blog/alex-bewley/</link><managingEditor>noreply@blogger.com (Uptime WebDev)</managingEditor><generator>Blogger</generator><openSearch:totalResults>10</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-3159578864358923334</guid><pubDate>Thu, 26 Jun 2008 13:57:00 +0000</pubDate><atom:updated>2008-06-26T07:02:25.796-07:00</atom:updated><title>Lament for splunk</title><description>&lt;a href="http://www.splunk.com"&gt;Splunk&lt;/a&gt; is a partner of ours and up.time integrates with splunk to assist with forensic problem resolution.  Michael Baum, their CEO, just published this blog entry - "&lt;a href="http://blogs.splunk.com/thebaum/2008/06/25/ode-to-log-management/"&gt;Ode to Log Management.&lt;/a&gt;"&lt;br /&gt;&lt;br /&gt;I hope it's not the beginning of the end.</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/06/lament-for-splunk.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-4305525169775968478</guid><pubDate>Thu, 26 Jun 2008 13:25:00 +0000</pubDate><atom:updated>2008-06-26T06:53:52.323-07:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>solaris zones containers joyent</category><title>Interesting customers</title><description>On a regular basis, I try and talk to customers about up.time and what their likes and dislikes are.  
I recently had the good fortune to talk to Ben Rockwood, of &lt;a href="http://www.cuddletech.com/blog/"&gt;Cuddletech&lt;/a&gt; blogging fame.  Ben is about as hard-core Solaris as they come, and if you read his blog you'll garner lots of interesting snippets about everything Solaris related.  As it turns out, Ben is also working at &lt;a href="http://www.joyent.com/"&gt;Joyent&lt;/a&gt;, which is a customer of ours and a large Sun user.  Specifically, they use a lot of Sun with Solaris containers, something that plays well into our hetero-virtualization strength.&lt;br /&gt;&lt;br /&gt;We had an interesting conversation that took some zigs and zags, ranging from product features, to agent vs. agentless monitoring, to how desperately he tried to get rid of up.time when he first started at Joyent.&lt;br /&gt;&lt;br /&gt;First, the getting-rid-of-us story.  When Ben initially joined Joyent he couldn't believe that they had bought a commercial tool for performance and availability management.  Why weren't they using open source?  As it turns out, every time he needed forensic data to understand why outages had occurred, or why performance was suffering, he discovered that up.time had already recorded all the necessary low-level performance stats.  There wasn't any need to go back and script the necessary commands; the data was already in up.time's performance data warehouse.  He finally conceded that up.time knew what you wanted, even before you knew you needed it.  So, we got to stay!&lt;br /&gt;&lt;br /&gt;We also discussed other systems management vendors and the techniques they use for gathering data.  We agreed that to get the low-level metrics needed for both planning and forensic problem diagnosis, you need agents on the monitored systems.  He is the first person I've talked to who also agrees that Net-SNMP is not an agentless solution! 
&lt;br /&gt;&lt;br /&gt;Ben had a legitimate critique of our service monitor extensibility, as our XML format for defining the data is a little cumbersome (and yes, we're working on it).  However, he really liked the fact that our extensibility supports the concept of Arrays of data (or ranged data, as we call it internally).  Almost all tools (open source and commercial) are extensible, but only for atomic data (e.g. one integer, one string, etc.).  With up.time, you can define a service monitor that understands an Array of data.  So, for example, let's say that you want a count of users per Solaris zone (and want to record this over time).  You can do this in an Array, e.g.&lt;br /&gt;&lt;br /&gt;Zone NumUsers&lt;br /&gt;zone1 2&lt;br /&gt;zone2 5&lt;br /&gt;zone3 6&lt;br /&gt;&lt;br /&gt;The up.time monitor will take in this data, and you can then graph it on a line chart with the actual zone names and meaningful titles.&lt;br /&gt;&lt;br /&gt;We also talked about support for DTrace, especially since it's now being supported on platforms other than Solaris.  Our current extensibility is solid groundwork for supporting output from DTrace scripts.  We're currently looking at supporting DTrace through our data gathering mechanisms, and it's only going to add to our arsenal for quickly diagnosing performance problems.&lt;br /&gt;&lt;br /&gt;In a parting comment Ben mentioned that "uptime software gets it."  
They came from a systems administration/management background and have continued to make up.time usable to the very people that need to keep infrastructure running.&lt;br /&gt;&lt;br /&gt;Now, the conversation wasn't a total love-fest.  Ben did have some good critiques, and it's people like him that are helping set the bar higher for us.&lt;br /&gt;&lt;br /&gt;It's great to have customers who like using your product but who are always pushing you to do better.</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/06/interesting-customers.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-7606314821904621871</guid><pubDate>Wed, 04 Jun 2008 17:38:00 +0000</pubDate><atom:updated>2008-06-05T07:43:20.785-07:00</atom:updated><title>Latest release</title><description>I'm quite pleased to announce that we've released our latest version of up.time.  This release includes a number of great new things, including: extensive VMware, pSeries micropartition (AIX), and Solaris virtualization capabilities; a whole new Service Level Agreement (SLA) solution; and some nice end-user transaction monitoring capabilities.&lt;br /&gt;&lt;br /&gt;The product is more scalable than ever - we can now effectively monitor 5,000 systems in a single monitoring loop, and this will scale even further with our multi-data center release that is coming soon.  
We've also cleaned up and simplified the user interface to make it even easier to navigate through the product to get the necessary data with as few clicks as possible.&lt;br /&gt;&lt;br /&gt;Check it out: &lt;a href="http://www.uptimesoftware.com/overview.php"&gt;http://www.uptimesoftware.com/overview.php&lt;/a&gt;</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/06/latest-release.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-2375958615814019346</guid><pubDate>Tue, 27 May 2008 15:41:00 +0000</pubDate><atom:updated>2008-05-27T08:42:57.338-07:00</atom:updated><title>Data Center Power</title><description>Yet another interesting story on the huge power consumption of data centers, in the &lt;a href="http://www.economist.com/business/displayStory.cfm?story_id=11413148&amp;amp;fsrc=nwlehfree"&gt;Economist&lt;/a&gt;.  Microsoft is building a data center that consumes 198MW!</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/05/data-center-power.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-4146167915739591488</guid><pubDate>Mon, 26 May 2008 19:31:00 +0000</pubDate><atom:updated>2008-05-26T12:35:13.731-07:00</atom:updated><title>Inflation Visualization</title><description>This is an interesting visualization of inflation and how price changes relative to our total spending are related: &lt;a href="http://www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html"&gt;NY Times&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;http://www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/05/inflation-visualization.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid 
isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-8998554116300073241</guid><pubDate>Mon, 12 May 2008 02:03:00 +0000</pubDate><atom:updated>2008-05-11T19:47:48.762-07:00</atom:updated><category domain='http://www.blogger.com/atom/ns#'>virtualization vmware</category><title>Reductionist Mindset</title><description>About a month and a half ago, I purchased an electric bike that looks like a &lt;a href="http://www.daymak.com/bikes-toronto/pages/ebikes/austin.html"&gt;scooter&lt;/a&gt; and have been riding it to work on a regular basis.  The speed is capped at 32km/h and electric bikes are part of a three-year alternative transportation pilot project by our Ministry of Transport. &lt;br /&gt;&lt;br /&gt;Jokes from peers about riding my electric razor aside, it takes me less time to get to work on the e-bike than it does driving.  I also realized that I don't really need my SUV to cut through downtown and travel a whopping 6.5km (4 miles) to work.  Rather than spending hundreds of dollars a month on gas, I now spend about $5/month on electricity to charge the bike.  My colleagues are now starting to ask about the feasibility of using an e-bike, too.  I don't know how many will actually make the leap, but at least they are aware of this alternative technology.&lt;br /&gt;&lt;br /&gt;So, why am I telling you about my e-bike instead of server virtualization or systems management?  Because downsizing in the datacenter is also starting to become front-of-mind for a larger group of IT staff (and the lines of business that drive IT).  Attitudes are changing, and going "green" (or lightening our eco-load) isn't perceived as a tree-hugging activity anymore.&lt;br /&gt;&lt;br /&gt;For years, we had lots of older Sun servers (E4000s, V220s) along with a host of other platforms used for development.  The cost of power consumption (and the corresponding air conditioning) was never on our minds.  
When our building's management decided to install per-tenant electric meters and start billing for usage, well, let's just say power consumption became front-of-mind!  We couldn't believe how much juice a few hundred servers sucked up -- as food for thought, data centers create almost 2% of &lt;a href="http://www.gartner.com/it/page.jsp?id=503867"&gt;global CO2 emissions&lt;/a&gt;.  We have aggressively turned off the older Sun servers and migrated to Sun's newer Niagara platforms, and we are actively virtualizing our x86 infrastructure using VMware.&lt;br /&gt;&lt;br /&gt;The power consumption mindset has manifested itself in a different way in the telco world.  Because the telco facilities manager who was responsible for all the switches also bore the cost of the power consumed, most vendors in the telco space engineered their products to consume little power.  This is unlike regular IT, where the "business," which is the prime buyer of technology to solve business problems, was divorced from the cost of the power that drives its computing platforms.  Nortel has accidentally &lt;a href="http://www2.nortel.com/go/news_detail.jsp?cat_id=-9721&amp;amp;oid=100239309&amp;amp;NT_promo_T_ID=hp_hpg_04_21_08_going_green"&gt;stumbled on a green&lt;/a&gt; play in that most of their technology was designed to be low power (from their telco background).&lt;br /&gt;&lt;br /&gt;Back to consolidation: we are taking the virtualization program on VMware even further than simply consolidating physical systems into virtualized ones.  We're working on technology to identify workloads that can be offlined or VMotion'd to core servers during low-usage periods, and then to idle the physical systems that hosted the instances.  For example, in our development lab environment, QA and Support require hundreds of instances for testing and simulating.  When staff go home at night, about 90% of these instances are not required.  
It would be ideal to be able to move the workloads of the active 10% onto one or two physical systems and then shut down (or put into low-power mode) the rest of the physical infrastructure that supported the idle 90%.  When people start arriving in the morning, the systems and instances would be brought back to life.&lt;br /&gt;&lt;br /&gt;The long-term savings in power and HVAC from being able to quiesce off-hours workloads are potentially huge.&lt;br /&gt;&lt;br /&gt;So, let's take a look at our carbon footprints and see what we can minimize.</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/05/reductionist-mindset.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-2485807248800211119</guid><pubDate>Mon, 21 Apr 2008 16:04:00 +0000</pubDate><atom:updated>2008-04-21T09:43:46.260-07:00</atom:updated><title>Am I being lied to?</title><description>When I talk to customers who are at the data center manager or VP of IT level, there is a rather unnerving topic that is almost always brought up: am I being lied to?  Or, more politely rephrased, am I not being told everything that I should be?  Now, before you think I'm bringing up something juicy, that's not the case; I'm talking about how difficult it is for an IT manager to fully understand which components in their infrastructure are causing outages -- and then to get up-front answers from their staff (and the tools they use) about the actual root causes. &lt;br /&gt;&lt;br /&gt;Rarely do you see a network manager put up his hand and state, "yeah, we misconfigured the router and throttled traffic for a few hours, sorry that it affected most of our applications;" or application developers confessing, "we didn't think that a small SQL change would kill the db 'that' much."  
It's almost always the inadvertent modifications that cause the greatest outages, or impacts to an application service.&lt;br /&gt;&lt;br /&gt;There are two problems in these common scenarios:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Most IT managers have a really hard time getting metrics that span network, system, storage, and application layers and presenting them in a meaningful format.&lt;/li&gt;&lt;li&gt;Transparency of metrics really is a cultural issue.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;In the first case, infrastructure is divided into many departments: network, system, application, storage.  Each administrator or infrastructure manager has their own familiar tools and pays attention only to the metrics that are relevant to the service they deliver.  This is almost never done with the business user in mind. &lt;br /&gt;&lt;br /&gt;Take email, for instance.  Email is many things: an external link to the internet (or to disparate offices), gateway servers, virus and spam scanners, MTAs, message stores, directory lookups, web portals, etc.  In one relatively "simple" application you've spanned all the infrastructure stacks - network, system, application, and storage.  So, when a sales line-of-business user files a service ticket saying email is down, where are you going to look?  What actually caused the problem?  If there is a manager for each part of the infrastructure stack, you'll probably get four different answers (from many different tools).  This situation usually gets reduced to finger pointing and rarely results in a holistic look at the actual cause of the outage.  Wait until you implement VMware or other kinds of virtual infrastructure; the dependencies between applications and systems are going to get very difficult to understand and monitor.&lt;br /&gt;&lt;br /&gt;This leads into the second problem of infrastructure monitoring: transparency.  
In discussions I've had, there is a very interesting dichotomy in how customers deal with this.  A number of them are terrified of presenting a view to the business of how IT infrastructure is working (or not).  They feel that offering a view into the infrastructure and its availability would completely undermine any credibility of the department.  In most cases, I've found there isn't a lot of credibility to begin with, and exposing relevant metrics sets the framework for constructive discussions around application availability.&lt;br /&gt;&lt;br /&gt;We've seen some fantastic turnarounds in IT department credibility and transparency after the implementation of up.time.  Not only does up.time span the infrastructure stack, but it also presents relevant metrics both to administrators who need to manage the environment and to line-of-business users who simply wish to understand the availability of the applications they need.</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/04/am-i-being-lied-to.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-8097199982344093752</guid><pubDate>Mon, 24 Mar 2008 16:05:00 +0000</pubDate><atom:updated>2008-03-24T09:08:13.698-07:00</atom:updated><title>WSJ Article</title><description>The virtualization battle is heating up and up.time is well positioned to deal with all the latest entrants into the virtualization space.  
Rather than being a pure virtualization play, up.time is taking its core roots in heterogeneous platform management and marrying them with the capability to monitor virtual platforms.&lt;br /&gt;&lt;br /&gt;For the WSJ article, click here: http://online.wsj.com/article/SB120398945599592373.html?mod=sphere_ts&amp;amp;mod=sphere_wd</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/03/wsj-article.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-2297569584167009197</guid><pubDate>Tue, 18 Mar 2008 00:56:00 +0000</pubDate><atom:updated>2008-03-17T17:57:11.083-07:00</atom:updated><title>VM Sprawl</title><description>Our product monitors VMware environments, and we ourselves are big users of VMware internally for our QA and Development environments.  This is great for us, as we use our tool internally to manage the QA and Dev systems.  In fact, it was our early adoption of VMware a few years ago that led to the extension of up.time to monitor and manage VM environments.&lt;br /&gt;&lt;br /&gt;Our biggest problems involved:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;losing VMs in the physical environment (see the photo [next], and this is just a small snapshot of the QA environment)&lt;img style="width: 100px; height: 100px;" alt="" src="http://media.compendiumblog.com/images/blog_images/bf3d1b41-02b5-454e-84f7-f7dbf0d45a48/c183520c-233e-425d-8d3d-e7e72725de7b/18-01-08_0852.jpg" align="right" /&gt;&lt;/li&gt;&lt;li&gt;temporary creation of VMs for QA testing and then forgetting to decommission them (thus consuming memory and disk storage [cpu not so much])&lt;/li&gt;&lt;li&gt;finding the right configuration of a VM image to create a QA environment (e.g. 
Oracle 10g on RedHat with WebLogic 9.x, etc.)&lt;/li&gt;&lt;li&gt;running out of space on the SAN pools b/c of storage sprawl&lt;/li&gt;&lt;li&gt;departments requesting additional physical hardware b/c we couldn't internally map out our resource usage across the VM physical systems.&lt;/li&gt;&lt;li&gt;occasional load testing killing a physical system (b/c caps weren't put on)&lt;/li&gt;&lt;li&gt;trying to identify dependencies between VMs for particular tests or development runs (e.g. system X has Oracle, system Y has WebSphere, system Z has up.time, etc.).&lt;/li&gt;&lt;/ul&gt; So, as you can see, there were a number of issues related to virtualization that we needed to address.  Since up.time was already a great tool for collecting and analyzing performance data over time, we extended the tool to talk directly to VMware ESX (and Virtual Center) to extract physical and virtual configurations, detailed performance data, and VMotion information.  Over time, we were able to graph VM instance performance information not just within a physical VMware system, but across an entire VMware farm.  We could identify which VMs were being migrated throughout the farm and how much compute, memory, and storage was being consumed.&lt;br /&gt;&lt;br /&gt;Additionally, because up.time is an active server monitoring tool, we were able to create monitors that trigger alerts when new virtual instances are provisioned.  This way, if up.time didn't already know about an existing instance (it automatically inventories instances across a VMware farm), an alert would be generated.&lt;br /&gt;&lt;br /&gt;There are a number of nice tools that already exist for provisioning management of instances (VMware Dunes, http://www.dunes.ch, and VMware Lab Manager); however, these tools don't actively monitor and profile a live VMware environment.  
This is where up.time excels.&lt;br /&gt;&lt;br /&gt;We are continuing to develop VMware functionality within up.time and bringing the traditional up.time "ease-of-use" mantra to VMware sprawl management.</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/03/vm-sprawl.html</link><author>noreply@blogger.com (Alex Bewley)</author></item><item><guid isPermaLink='false'>tag:blogger.com,1999:blog-4269386030357445158.post-470326323694042584</guid><pubDate>Tue, 18 Mar 2008 00:55:00 +0000</pubDate><atom:updated>2008-03-17T17:56:07.913-07:00</atom:updated><title>System's Management Hairball</title><description>I've now been in this space for over 15 years, and while technology and tools have advanced at incredible rates, managing the technology is still a major pain.  Here at uptime we've been diligently trying to make performance and availability monitoring as easy as possible; however, it's a constant challenge.  Sometimes I wish we could be like salesforce.com and become a SaaS (software as a service) provider.  Why?  With SaaS, there is only one hosted code base and one hosted database (well, it's more complicated, but you get the idea), and this is running on hardware that you can control.  In our case, where our software is out in the wild in many customer environments, we are faced with huge versioning problems.&lt;br /&gt;&lt;br /&gt;For example, we monitor Solaris, Linux (RedHat/SuSE), Windows, HP-UX, and AIX systems.  We have to deal with code bases in our agents for all these platforms.  Now add in platform-specific issues (such as architectures, 32/64-bitness, kernel changes, tech releases) and you are asking for an exponential increase in issues.  Now factor in all of the applications and services that we monitor (such as Oracle, SQL Server, Exchange, WebLogic, WebSphere) and their corresponding version upgrades.  This is on top of our monitoring station support and the platforms and databases that it uses.  
Now introduce VMware (and other virtualization technologies) into the equation and you're in for a world of hurt.  What does this mean?  It means that companies in the systems management space, like uptime, spend considerable resources on simple software hygiene, time that could be better spent innovating.  Unfortunately, this problem isn't going away any time soon, so we are constantly looking at other methods to simplify.&lt;br /&gt;&lt;br /&gt;So, can systems management software become SaaS-like?  In my opinion - no.  What IT Manager in his/her right mind is going to allow an external vendor access to their secure internal environment, especially when the monitoring software is going to cut to the core of their production servers?  This is also why application vendors such as Oracle are starting to get into the management space; they know that intelligent management of the applications is where the money is.&lt;br /&gt;&lt;br /&gt;Next up, virtual server sprawl.</description><link>http://www.uptimesoftware.com/blog/alex-bewley/2008/03/systems-management-hairball.html</link><author>noreply@blogger.com (Alex Bewley)</author></item></channel></rss>