The up.time IT Systems Management Blog

Archive for the ‘Uncategorized’ Category

This is your brain on up.time

Monday, July 4th, 2011

Remember those commercials?  I have recently been spending a fair bit of time in our Knowledge Base (KB) and it occurs to me that our KB is sort of like our collective brain in terms of storing product information and other useful tips, hints and mini how-to guides.  Unfortunately, like the brain in those commercials, our KB has recently been (ahem) on a bit of a hiatus.

So several weeks ago I said: hold on, let’s do something to enhance all data, yesterday!  Since that time we have been working on 4 key initiatives.

First, we’re working to update content to ensure that subject matter reflects current product versions. For example, there are 7 new articles that cover various aspects of installation and platform support related to our new 5.5 release.  There have also been numerous updates to many older articles and elimination of articles that are no longer relevant.

Next, to make it easier for active KB users to catch up on recent material, we have provided a new KB home page with lists of recently added articles, the most highly rated and most viewed articles as well as the traditional opportunity to browse by subject.

We’re also reviewing how the KB material is organized and working towards a shift from broad technical areas to more specific topics that I hope will relate more closely to how our clients actually use up.time and seek support for the product.  Clearly real active imagination goes far in new nomenclature.  So we have tried to be creative in setting up the new hierarchy.

Finally, we have launched a new online troubleshooting tool to help leverage KB content into a triage model that is intended to help guide users more easily towards an answer, resolution or at least some guidance in their next steps in seeking support.  We’re still in the early stages of building out this capability and there is still lots of work to be done but I think we’re on the right track in sparking up our KB and online support tools.  If you have the chance to browse the KB and test out the troubleshooter, I’d be interested in receiving your feedback.

SLA Tips & Tricks

Monday, January 10th, 2011

up.time SLA Monitoring Video

Welcome to the New Year everyone! I hope you had a lot of delicious food and some good times over the holidays, or at the very least some good time off. I myself enjoyed it so much that my new year’s resolution is to hit the gym a lot more (and stop stuffing myself with chocolate as well).

With a new year comes great responsibility, and being in IT, we have to make sure we keep our users happy at all times. To keep the user’s expectations in check we have these fancy SLA documents and reports that are supposed to show how good a job we’re doing of keeping everything up and running all the time. Unfortunately there are a few problems with this: it can take forever (from days to weeks) to create this report, fudge factors may exist in the data, we have to get metrics from different systems/platforms/layers/tiers/(insert_buzz_word_here), and did I mention this takes forever?

So here are a few tips & tricks to avoid falling into a coma from boredom while still gathering all these important metrics for this report.

1. Defining Metrics – Before even creating the SLA document/report, make sure we’ve defined which metrics indicate availability and performance of the service we’re providing. We also have to make sure we can actively monitor the metrics with our monitoring solution as well. Luckily up.time can monitor pretty much any kind of metric with thousands of default metrics collected out-of-the-box along with an extensive plug-in architecture to allow for a near unlimited number.

2. Baselining Current Service Level – Once we know what we need to look for, we need to get a baseline for how we’re currently performing before we commit to providing a level of service that we cannot possibly achieve. For this we create all the monitors in uptime to gather the metrics and set thresholds on them, so we get the stats.

3. Proactive SLA Management – Once we create the SLA in uptime with objectives (SLOs) that define the availability and performance, we get instant visibility on the SLA dashboard. We can also set it up to automatically alert us when a severity 1 (SEV1) issue occurs and starts to affect SLA performance. So not only do we get component-level alerting/self-healing, we also get the alerting and self-healing capabilities at the SLA-level as well.

4. Quick SLA Report – For those of you who are currently creating your SLA reports manually, this will be a big one. How quickly can you get that SLA report compiled and completed? Well, mine has been sitting in my inbox since this morning, so that’s pretty sweet, especially since I can now choose when to tell my boss that I’m finished compiling the report.
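
For anyone who wants to see the arithmetic behind tips 2 and 3, here is a minimal Python sketch of how an availability objective turns into a downtime budget. This is not up.time's API: the function names, the 99.9% target and the outage durations are all assumptions made up for illustration.

```python
# Hypothetical illustration only: none of these names come from up.time.

def availability_percent(period_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the reporting period during which the service was up."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes


def downtime_budget_minutes(period_minutes: float, slo_percent: float) -> float:
    """Downtime allowed in the period before the objective is missed."""
    return period_minutes * (100.0 - slo_percent) / 100.0


# Assumed example: a 30-day month, a 99.9% objective and two outages.
PERIOD = 30 * 24 * 60      # 43,200 minutes in the reporting period
SLO = 99.9                 # assumed availability target, in percent
outages = [12.0, 25.0]     # assumed outage durations, in minutes

downtime = sum(outages)
achieved = availability_percent(PERIOD, downtime)
budget = downtime_budget_minutes(PERIOD, SLO)

print(f"achieved {achieved:.3f}% against a target of {SLO}%")
print(f"downtime used: {downtime:.1f} of {budget:.1f} minutes")
print("SLO met" if downtime <= budget else "SLO missed")
```

Baselining (tip 2) amounts to running this kind of calculation on past data before picking the target, so the objective you commit to is one you have actually been hitting.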

If any of this sounds interesting to you we’re having an online webinar coming soon where you can come see for yourself. Also, for a quick overview of setting and managing your SLAs with up.time, check out my latest video – click here to watch.

Hello World.

Tuesday, July 20th, 2010

Being one of the new kids on the blog I should introduce myself a little, so let’s get this out of the way. My name’s Joel and I’m one of the Solution Architects here at uptime. I like fast cars and geeky toys, and since my latest toy for this summer is a Honda CBR600, I’m sure I’ll manage to incorporate that into some of my posts as well. I also have a thing for Audi vehicles, which apparently seem to be discussed quite frequently by some other German-car drivers around here. That’s alright though, because we always enjoy a little friendly competition.

My focus will be to blog on detailed technical features in up.time and how they can be valuable for you. I’ve recently begun with a couple of videos highlighting some new features of up.time, NetFlow and Agentless Windows Monitoring (click here to find them on the videos page), and more will be on the way. I hope I can bring some value for you and show how up.time is really meant to be used.

The Pursuit of Happyness

Tuesday, May 25th, 2010

I’m sure many of you have seen The Pursuit of Happyness, a 2006 movie starring Will Smith (apparently a favorite around here). I recently read the true story book by Chris Gardner that the movie is based on. The book is somewhat different from the movie but equally entertaining and deeply inspiring. And the title got me to thinking…

I get involved with a lot of functional areas at uptime – it’s one aspect of my role that I really enjoy. But I consider keeping our clients happy my primary objective. I think a lot of companies measure customer satisfaction by simple metrics such as how fast they deliver product, how well they respond to problems or by some flashy web site statistics. Those parameters are important but I tend to take a broader view and try to tackle the overall dynamic of how clients experience our software throughout the evaluation, implementation and deployment lifecycle. I want our clients to not just be happy with our software but to utilize it to its full potential and derive maximum benefits from its use.

I have a lot of thoughts about how we can provide an ideal client experience and while my ideas may be interesting, I recognize that it’s the clients’ opinions that really matter. So, to tap into our user base, we decided to launch our first-ever Client Services survey. We did this for two reasons:

  1. To get a sense of how happy our clients are now and establish a benchmark for measuring our future improvement.
  2. To better understand what’s important to our clients and help identify where we should be focusing our efforts.

We just completed the survey process and I’m thrilled with the response. I won’t be sharing results here because we respect our clients’ privacy and promised to use their feedback solely for internal purposes. However, I will say that I’m overwhelmed by how willing our clients were to not only answer our questions but to also take time to provide detailed comments. I’m happy to receive validation that we are actually doing a pretty good job at keeping our clients happy but I’m also looking forward to spending time working on some new initiatives that should help raise our level of service even higher.

I want to say thanks to everyone who took the time to respond to our survey; we do sincerely appreciate your input. We are planning to run our survey again in the future to evaluate our progress but if you’re an up.time user and have some feedback that you would like to provide, please don’t feel that you have to wait for the next survey, we would love to hear from you any time. By the way, if you’re reading this and you’re not using up.time, what are you waiting for?

Find Your Inner Fighter Pilot

Monday, April 12th, 2010

In systems management, we can learn a lot from the mentality of a fighter pilot. What – you say, Ken’s been smoking the good stuff over the sunny Toronto weekend? What could a fighter pilot possibly have in common with someone in IT systems management?

A lot more than you think.

Think about it, what is a datacenter? A highly tuned combination of hardware and software designed to deliver services to the business. What is a Jet Fighter? A complex combination of millions of hardware components with a highly tuned set of software components designed to defend the pilot and provide the services necessary for him to project his will at command. Wow, not so different?

So where have we gone wrong? What can we learn from the Jet Fighter Pilot? The difference is in approach. Just like the pilot and his cockpit we have huge arrays of data available to us through gauges, niche software, profiling tools, scripts… you name it, we have it.  Guess what? When the pilot is in the heat of an engagement, he’s assessing his threats, he’s not sitting there fixating on a particular gauge. We need to stop fixating on niche tools, profilers and other specific metrics. We need something similar to a pilot’s heads-up display that will allow us to assess the biggest threats to the IT organization.

Worse, you’ve bought a tool that claims to do this, but rather than having a nice seamless HUD or “single pane of glass”, you have a “stained-glass window” comprised of dozens of individual applications poorly duct-taped together.

Good thing uptime has a very specialized set of reporting capabilities to allow you to figure out where your major IT problem hotspots are, which infrastructure is suffering infrequent downtimes, and where constant “5 minute problems” are sapping your team’s productivity.

All of the above issues ARE the major threat to IT. Those are the things that make people wonder “Why aren’t we outsourcing this service? It NEVER works!” This is the equivalent of having your jet fighter shot down.

Join me for one of our upcoming webinars and find out how to unleash your inner fighter pilot.

Service overload, it’s happening again, this time with real consequences

Friday, January 15th, 2010

A while back I wrote a blog post on how an event in our popular culture, in this case it was the death of pop icon Michael Jackson, can cause unpredictable and unprecedented increases in traffic to online services.  In the case of Michael Jackson, TMZ and other sites were unable to handle the traffic of their readership trying to find out what had happened.  Well here in Canada, and I’m sure in other countries, the outpouring of support for those who have been hit in Haiti by the magnitude 7 earthquake is bringing the webservers of aid organizations to their knees.  With the surge in donations on their systems, the servers are periodically crashing.  Fortunately they are back online, but still unable to fully handle the workload imposed by those trying to give.  As this article points out, please keep trying to donate, as every dollar is needed in this dire time.

In Canada, our government is matching every dollar contributed by Canadians to the relief effort.  Perhaps some of the cloud providers out there could donate their infrastructure and technical expertise to shoulder the donation collection burden from these organizations.

The Hitchhikers Guide to Cloud

Thursday, January 7th, 2010

I have just started using a service called Evernote to try and allow me to keep my notes and thoughts organized across all moments of inspiration, brainstorming and discussions with others whenever or wherever they occur.  So far it looks to be a promising solution.  Evernote is essentially providing me with cloud based storage with their particular access paradigm on top of it.  They have clients for all manner of OS and device as well as a web client.  I can access my Evernotes pervasively, wherever I am, and from whatever technology mechanism I have at hand.  This is one of the promises of the cloud and they fulfill this promise.

This, however, is not the ultimate promise of the cloud.  Ultimately I would like to be able to access my Evernotes and any other data or data management/manipulation services from the cloud as a single federated source of information and information processors/transformers.  Aside from the fact that there are no standard cloud information sharing protocols or data manipulation standards being used by all service providers, one of the key problems is the issue of federation and trust.  We’ve got Passport, OpenID, and other technologies for a federated identity management solution, but the adoption of these technologies seems to be absent in many of today’s cloud offerings.  I use a few different cloud services now, and I have a different userid for all of them.  Even if they provided a means for me to link their services with one another, I would still have to manage a different identity across services.

This same federated/aggregated service mashup challenge exists in the systems and server monitoring space.  With services moving to the cloud, multiple datacenters and 3rd party IT interfaces, you need a management and monitoring tool that can manage these components locally, but still be able to aggregate them into a global view with the flexibility to mash them together into higher order views that take the local information and, through a little magic, allow you to create global knowledge. 

For example, in up.time we have had our local monitoring instance – or what we call an LDC instance – and our global console – or EMS – deployed in a large distributed enterprise to allow customers to extend basic monitoring from a local monitoring tool into an enterprise service delivery knowledge platform. This provides you with critical information on your infrastructure, as well as knowledge about how those services are delivered across your business, with the explicit understanding of the business impact of those services.  When we have silos of valuable information, combining them together turns that information into actionable knowledge.

The cloud is allowing us to create highly accessible and pervasive silos of very valuable information.  However, no matter how much information you have, it’s only valuable when we can convert that information into knowledge.  The potential for the cloud as a future knowledge platform, with the appropriate federation between services and between users of those services, is a great opportunity enabled by technology of the 21st century.  It has the potential to fundamentally change how we do things. 

When speaking of knowledge, “Tacitness generally describes the extent to which knowledge is not codifiable (Galunic and Rodan, 1998). Tacit knowledge is personal, context specific, and therefore hard to formalize and communicate whereas explicit or codifiable knowledge is transmittable in formal and systematic language (Nonaka and Takeuchi, 1995). Furthermore, intangibles like specific knowledge is expensive to transfer across because it cannot be easily aggregated meaningfully (Hayek, 1945).” – (Theory of the firm, Bach Seung, Bai – 2004)

We are filling the cloud with an unimaginable amount of tacit knowledge about anything and everything imaginable at an astronomical rate.  Combined with the AI technologies already available to mine, link and understand this data, we will be able to take these islands of knowledge from across the cloud and leverage them into a global knowledge platform with a tacit knowledge breadth that covers virtually everything.  We will be able to access this ‘Hitchhiker’s Guide to the Galaxy’ from anywhere at any time, and it will always be up to date, with literally hundreds of millions of people updating this knowledge base in real time.

(I realize there are several major challenges to the earthly H2G2 related to the information processing, but look at where we are today already, and in a very short period of time, it’s not an ‘if’ but a ‘when’)

Microsoft finally draws their line in the clouds

Monday, November 23rd, 2009

As many of you are likely aware, last week Ray Ozzie announced that Azure (Microsoft’s cloud service) would go into full production on January 1st, 2010. Azure is interesting because Microsoft wants to keep the paradigm of desktop OS’s as a key part of the architecture with “the cloud” as an adjunct in what they call the “three screens and a cloud” vision. This vision is important, because it makes the cloud real for consumers and makes it more understandable and accessible to the general populace. Project “Dallas” also re-affirms Microsoft’s commitment to cloud computing as a whole; Microsoft unveiled just enough details to make the project interesting – i.e. data-as-a-service.

For all the “evil empire” slag that Microsoft gets, people tend to forget, or ignore, what happens when Microsoft embraces a technology and tries to dominate that market – the technology just gets easier to adopt and becomes more real.

This is an important milestone in the development of the entire “cloud story”. Let’s be clear – Microsoft, due to their size and market position, does not have the need to innovate or invent new paradigms. All they have to do, and what they are good at, is step into nascent markets that are at the edge of becoming mature enough to explode. This is generally a moment of truth for any incumbents, as Microsoft can and does take advantage of their massive resources in an all out war for dominance. Once they ‘put their toes in the water’, they slowly wage a war of attrition on the incumbents, and buy all the best players and minds, until eventually their technology is pervasive.  We have seen this strategy in effect to great success over the years. Remember the browser wars, Database (SQL?), ERP, CRM, Content Management (Sharepoint), Audio Devices (Zune), Console Gaming (XBox) and the list goes on.

So what’s the moral of the story? When Microsoft wades into the game, it’s a very strong sign that it’s time to get with the program and adopt this emerging paradigm.

Large Scale Cloud Computing Adoption

Monday, October 19th, 2009

There is a very well written article over at ulitzer.com regarding the US Federal Government’s IT spend plan for FY11 and their investigation into leveraging cloud computing as a cost cutting measure for federal IT spend.  It breaks the analysis down into 3 options:  Public, Hybrid and Private cloud.  In their analysis, the public cloud comes out at a BCR of 15.4 (Benefit/Cost Ratio) with the hybrid and private cloud coming out at 6.8 and 5.7 respectively.  I found these results rather surprising considering the scope of what their analysis entails.
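
For readers who have not run into a Benefit/Cost Ratio before, here is a minimal Python sketch, assuming the plain definition of total benefits divided by total costs (the article's own analysis may discount future values, which this sketch does not do). The option names and dollar figures are invented for illustration and are not taken from the article.

```python
def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """Total benefits divided by total costs; above 1.0 means benefits exceed costs."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return benefits / costs


# Invented figures in millions of dollars, NOT the numbers behind the article.
options = {
    "option A": (120.0, 10.0),   # (benefits, costs)
    "option B": (90.0, 15.0),
}

for name, (benefits, costs) in options.items():
    print(f"{name}: BCR = {benefit_cost_ratio(benefits, costs):.1f}")
```

A ratio like 15.4 therefore claims roughly fifteen dollars of benefit for every dollar spent, which is why the assumptions behind both the benefit and the cost side deserve a close look.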

We aren’t talking about migrating a few workloads to the cloud, but thousands and thousands of servers’ worth of federal workloads.  When defining the public cloud versus hybrid/private solution and the assumptions, they state that for the public cloud it is a migration of ‘low-sensitivity’ data onto existing public clouds.  Based on the ever increasing compliance requirements and demand for data privacy and integrity, I would think that the low-sensitivity workloads would not comprise the lion’s share of the workloads being examined, thereby tilting the balance toward the hybrid and/or private cloud offering.

When migrating to the cloud, today’s organizations have many terabytes or petabytes (in the case of the US Federal Government, for thousands of workloads) of data that has to be migrated onto the cloud in order to move the complete workload to the cloud.  Moving and synchronizing petabytes of storage while maintaining service continuity through the migration is a non-trivial task.
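
To put a rough number on "non-trivial", here is a back-of-the-envelope Python sketch of how long the raw transfer alone would take. The 1 PB data size, the 10 Gbps link and the efficiency factor are assumptions chosen for illustration; real migrations also have to handle data that changes during the copy, which this ignores.

```python
def transfer_days(data_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to push data_bytes over a link at the given usable fraction."""
    if link_bits_per_second <= 0:
        raise ValueError("link_bits_per_second must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 86_400   # seconds per day


PETABYTE = 10**15        # decimal petabyte, in bytes
TEN_GBPS = 10 * 10**9    # assumed dedicated 10 Gbps link, in bits per second

print(f"{transfer_days(PETABYTE, TEN_GBPS):.1f} days at full line rate")
print(f"{transfer_days(PETABYTE, TEN_GBPS, efficiency=0.5):.1f} days at 50% efficiency")
```

Under these assumptions a single petabyte ties up the link for more than a week, and that is before any synchronization of changes made while the copy is running.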

While the analysis within the article is sound, I think that there are significant hurdles still in place from a large scale public cloud adoption standpoint that are not taken into consideration to the extent that they deserve.  Everyone wants the public cloud computing model to be successful; after all, the benefits stand to be great.  I think that the public cloud, from a security and connectivity standpoint, is not quite there yet for large scale initiatives.  I think that the real successes will come from the creation and adoption of private clouds, with the slow, learned migration of workloads to the public cloud as we iron out all of the security, networking and compliance requirements.

Maybe it would make sense to have the public cloud providers offer their own hybrid approach where you deploy your own private cloud and they manage it for you.  You get to leverage the benefits of their processes and technologies developed for managing the public cloud, with the benefits that come with a private cloud.

Explaining Cloud to Your Boss

Tuesday, October 13th, 2009

Migrating infrastructure and applications to a cloud-based model can be fraught with peril, and the relative immaturity of the technology means there are risks.  If cloud has you freaked out as a data center manager or as somebody evaluating the implications of migrating certain applications, then take heart: you’ve already been through a similar situation, and it turned out quite well.

So, what concerns are on your mind right now?  Reliability?  Shared resources?  Security?  Usage models?  Performance?  Jurisdiction of data?  Privacy?  Management?  Cost?   Given that these topics are only just being addressed by a myriad of vendors, it’s easy to see why you’d be reticent to jump into the cloud fray.

So, let’s roll the clock back fifteen years.  Data centers and corporate offices were connected together with special dedicated leased lines that cost a fortune to run and were of limited bandwidth.  Print and tape media were still widely used to distribute data.  Then along came the Internet, a fantastic idea: unlimited peer-to-peer connectivity, global access, vendor neutral transport.  There were tonnes of issues, however: privacy, bandwidth, security, performance, availability; a litany of issues, not unlike cloud in its current nascent form.  All these issues were solved with technology over a short period of time: IP tunneling, routers, active switches, VPNs, fibre running around the world.

Within four years, the Internet became a key part of business, and at a huge cost savings to the data center: no more dedicated lines, and pretty much a pay-as-you-go cost model (bandwidth).  Additionally, there wasn’t any massive infrastructure change to the datacenter; it was an end-point cable swap.

Cloud adoption is going to work out the same way. It’s going to evolve in short order: pay-as-you-go models (like EC2), Cisco/VMware vSwitch networking (for transparent and secure VPN access), and more sophisticated management tooling that is cloud aware (for application monitoring and workload distribution).

Ultimately, in the next few years, there will be a very real distribution of applications across the PVC infrastructure (physical, virtual, cloud) to take advantage of performance, security, and cost.  Quite conceivably, your management tooling would help you calculate cost modelling of application workload to select particular cloud vendors at specific times during the day, yielding efficient pricing.
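
As a rough idea of what that kind of time-of-day cost modelling could look like, here is a minimal Python sketch. Everything in it is hypothetical: the vendor names, the hourly prices and the function names are invented, and it deliberately ignores the cost and risk of actually moving a workload between vendors.

```python
from typing import Dict, List

# Invented prices per instance-hour, indexed by hour of day (0-23).
PRICES: Dict[str, List[float]] = {
    "vendor_a": [0.10] * 8 + [0.14] * 10 + [0.10] * 6,   # pricier in business hours
    "vendor_b": [0.12] * 24,                             # flat rate all day
}


def cheapest_vendor_by_hour(prices: Dict[str, List[float]]) -> List[str]:
    """For each hour of the day, pick the vendor with the lowest price."""
    schedule = []
    for hour in range(24):
        schedule.append(min(prices, key=lambda name: prices[name][hour]))
    return schedule


def daily_cost(prices: Dict[str, List[float]], schedule: List[str],
               instances: int) -> float:
    """Cost of running `instances` for a day while following the schedule."""
    return sum(prices[vendor][hour] * instances
               for hour, vendor in enumerate(schedule))


schedule = cheapest_vendor_by_hour(PRICES)
print(f"following the schedule: {daily_cost(PRICES, schedule, 1):.2f} per instance-day")
for vendor in PRICES:
    print(f"always {vendor}: {daily_cost(PRICES, [vendor] * 24, 1):.2f} per instance-day")
```

With these made-up prices the schedule switches to the flat-rate vendor during business hours and back again in the evening. Whether the saving is worth it in practice depends on how expensive each switch is, which is exactly the sort of question the management tooling would need to answer.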

The next few years will be exciting times and we will be a major player in the PVC space moving forward.

Alex