My landlord just kicked me out. Let me rephrase that. My landlord just politely asked me to leave his property before my lease is up. Thanks to him, my wife and I have been packing our stuff in preparation for our move. In all honesty, my wife has been doing most of the packing. I asked her to pack up all her belongings and the kitchenware, and I would do the rest. Somehow, that message got lost and she packed my things too. When I went looking for my shoes and couldn’t find them, I was upset. I reminded her that I had asked her to pack only her stuff, but she said I never told her that. Before I knew it, the blame game was in full effect. It was her word versus mine.
These kinds of things happen in IT infrastructure management too. When you have more than one tool to monitor your environment and more than one data source for capacity planning, how do you know which one to trust? IT teams usually justify running a variety of monitoring tools by saying that no single tool gives them all the visibility they need. For example, some tools are strictly for network monitoring. Others might go really deep on Windows monitoring but be light on everything else. It gets worse when the metrics from the tools overlap: which one should you go with? Different tools gather metrics in different ways and at different time intervals. One tool might catch a spike while another may not. It is a full-time job just to consolidate data and close information gaps to make sense of it all.
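To see how two tools can honestly disagree, here is a minimal Python sketch. The readings and the two polling intervals are made up for illustration; they do not come from any particular product.

```python
# Hypothetical per-minute CPU readings (percent) for one server over 10 minutes.
# The spike at minute 4 lasts a single minute.
cpu_per_minute = [20, 22, 21, 23, 95, 24, 22, 21, 20, 23]


def sample(readings, interval):
    """Return the readings a tool would see if it polled every `interval` minutes."""
    return readings[::interval]


tool_a = sample(cpu_per_minute, 1)  # polls every minute
tool_b = sample(cpu_per_minute, 5)  # polls every five minutes

print("Tool A peak:", max(tool_a))
print("Tool B peak:", max(tool_b))
```

Both tools are reading the same server, yet the one that polls every five minutes never sees the spike. Neither is lying; they are just looking at different moments.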
Here is where up.time is different. up.time provides unified monitoring for all the silos within an IT infrastructure so you can have a true ‘single pane of glass’. You don’t have to duct tape point tools together to make a homemade Swiss Army knife. up.time IS the Swiss Army knife and provides a unified and comprehensive view. It makes capacity planning a breeze because it provides a single data source, so you don’t have to try to make sense of all the differing metrics! You can eliminate the blame game (and headaches) in IT when you don’t have multiple tools telling you different things. You don’t have to go to war with the network team arguing whose data is right when you have a standard tool providing a single view of the truth. up.time is the solution that enables you to be the IT superstar. Download up.time and give it a spin today!
So who likes to travel? If we were to play Family Feud and name something that people like to do, I’m certain “travel” would be one of the top answers. But how many of us like waiting to board a plane? How many of us like delays and spending unnecessary hours at the airport? Right. I didn’t think so. So when I read in the news about the Toronto Pearson International Airport having a computer outage that led to significant delays, I could imagine how frustrating it must have been. Being a Torontonian, I know our airport does not hold the title of being the busiest airport in the world. Nonetheless, it was ranked (albeit 38th) in 2011. Part of the news article said “technicians are not sure what caused the problem”, which is a scary thought. Unfortunately, this is not an isolated incident that only happens at Nav Canada, the company behind traffic control at the airport.
Root cause analysis is one of the holy grails in IT management. If you are a system administrator who can pinpoint exactly why an outage happens, not only will you look like a superstar, but your users/customers will love you for it. How can you achieve that with up.time? First of all, you need to have a unified dashboard so you can see things as they happen. Just as important is getting alerts to the right person at the right time. But once you get the alert, what’s next? You need to be able to monitor complex business services.
There are two key points to consider:
First, you must have coverage for all the underlying components that make up your business services. Whether it is OS, applications, or network and network devices, you must have visibility to everything in your infrastructure.
Second, you need to be able to tie all the different components into your business services so that you can see the overall health of your services and exactly which component(s) is down.
The latter is vital if you want to perform root cause analysis. Having a tool (like up.time) that facilitates root cause analysis will make you the superstar (that you are), save you time troubleshooting issues in your environment and help you get to the root cause of any outage with ease! If you haven’t tried out up.time in your environment, download it and take it for a spin!
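To make the second point concrete, here is a rough Python sketch of rolling component status up into business services. The service names, component names and statuses are all hypothetical, and this is only an illustration of the idea, not how up.time models services internally.

```python
# Hypothetical service map: each business service lists the components it depends on.
SERVICES = {
    "online-store": ["web-01", "app-01", "db-01", "core-switch"],
    "email": ["mail-01", "core-switch"],
}

# Hypothetical latest status of every monitored component.
COMPONENT_STATUS = {
    "web-01": "up",
    "app-01": "up",
    "db-01": "down",
    "mail-01": "up",
    "core-switch": "up",
}


def service_health(services, status):
    """Return {service: list of failing components}; an empty list means healthy."""
    report = {}
    for name, components in services.items():
        # A component with no known status counts as failing, since we cannot see it.
        report[name] = [c for c in components if status.get(c) != "up"]
    return report


for service, failing in service_health(SERVICES, COMPONENT_STATUS).items():
    state = "OK" if not failing else "DEGRADED (check: " + ", ".join(failing) + ")"
    print(f"{service}: {state}")
```

Because every component is tied to the services that depend on it, the report tells you both that the online store is unhealthy and which component to look at first.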
HTTP, IP, RAM, CPU, MB… there are tons of acronyms in the IT world. Heck, the word IT is an acronym. Some companies use acronyms as their names as well. Sometimes acronyms can be intimidating. Quite often, the reason we shiver when we hear acronyms is that we don’t know what they stand for. For example, do you know what SNMP stands for? A quick Google search yields Simple Network Management Protocol. Well, that tells me it’s a network protocol. It doesn’t sound very exciting. So let me re-define it as:
SNMP is
Necessary in
Monitoring your
Paradise/Prison
And by Paradise, I mean your datacenter. Feel free to substitute Prison for Paradise if you aren’t proactively managing your environment.
But why is SNMP a must if you want to get a handle on your datacenter? There are two main reasons:
Visibility to Hardware Failures
When you deal with computers long enough, you are bound to experience a few hardware failures. On enterprise-grade servers and devices, there are usually redundancies to increase availability. However, redundancy just means there are at least two of some components. There will come a day when all of those components fail. If you don’t fix the failures when they pop up, you are putting yourself at risk of a disaster. But of course, you can’t fix something if you don’t know about it. How do you know if there is a failure? That’s where SNMP comes in. Servers and devices in businesses frequently have SNMP capabilities to send what’s called an SNMP trap to a centralized server. An SNMP trap is just a message notifying that server about an event such as a hardware failure. Having the ability to receive such a message is essential if you want to be on the ball when it comes to failures.
Visibility to Device-Specific Metrics
Any device can support SNMP. If you really wanted to, you could even enable SNMP on a toaster! The flexibility of SNMP allows administrators to pull whatever metrics and/or statuses they want as long as the OIDs (Object Identifiers; I know, another acronym…) are known. The metrics available will depend on what the vendor exposes. They can range from fan speed and the number of power supplies to the ambient temperature. Keeping an eye on these metrics will provide a complete view of your environment.
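If OIDs are new to you, it helps to know they are just dotted paths in a tree. Standard metrics live under well-known branches, and vendor-specific ones live under the "enterprises" branch (1.3.6.1.4.1). The Python sketch below only parses and compares OID strings; it does not talk to any device. The helper names are my own, and the vendor OID at the end uses a made-up enterprise number purely for illustration.

```python
# Well-known numeric OIDs from the standard MIB-II tree.
SYS_DESCR = "1.3.6.1.2.1.1.1.0"   # sysDescr.0: text description of the device
SYS_UPTIME = "1.3.6.1.2.1.1.3.0"  # sysUpTime.0: time since the SNMP agent last restarted
ENTERPRISES = "1.3.6.1.4.1"       # root of all vendor-specific subtrees


def parse_oid(oid):
    """Turn a dotted OID string into a tuple of integers."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def is_under(oid, subtree):
    """Return True if `oid` sits at or below `subtree` in the OID tree."""
    oid_parts, subtree_parts = parse_oid(oid), parse_oid(subtree)
    return oid_parts[:len(subtree_parts)] == subtree_parts


def is_vendor_specific(oid):
    """Return True if `oid` belongs to a vendor's private subtree."""
    return is_under(oid, ENTERPRISES)


# Hypothetical OID under a made-up enterprise number, for illustration only.
example_vendor_oid = "1.3.6.1.4.1.99999.1.2.0"

print(is_vendor_specific(SYS_UPTIME))
print(is_vendor_specific(example_vendor_oid))
```

Comparing the numeric parts rather than the raw strings matters: a plain string prefix check would wrongly treat 1.3.6.1.41 as part of the 1.3.6.1.4 branch.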
We have discussed the virtues of having a single pane of glass to give you a complete view of your IT infrastructure. The two reasons above are why SNMP needs to be a part of your monitoring strategy. up.time’s SNMP monitoring capabilities make it easy for you to get a handle on your environment. If you haven’t yet taken up.time out for a test drive, make sure you do!
“So something *WAS* wrong with the network but was that the root cause?”
In my last post, we looked at how you can use the brand new Network Dashboard in up.time 7 to look for network issues. Now that you have confirmed the network was to blame for the poor performance, is the case closed? How do you know if there aren’t other issues? Are you content with guessing and hoping the network was the sole reason for the degradation of performance? That is almost like your doctor saying he/she can give you a full body checkup just by looking at your pinky! Today’s IT infrastructure is not composed of only the network. So how can you dig deeper?
Resource Scan Dashboard
up.time enables users to get a complete view of their IT environment without navigating through a myriad of tools, or what we call “tool soup”. Jumping between different consoles is not only time consuming, it also makes the lives of system administrators more difficult. Recovering from outages takes longer, which ultimately affects your business. See more about tool soup here. How does up.time help? First, if you already have up.time’s intelligent alerting configured, the notifications you get should point you in the right direction. But let’s say you wanted to do some impromptu analysis. How do you do that?
Visibility Over Time
If it’s a performance-related issue, I would recommend looking at the Resource Scan Dashboard (see image above). It gives you a general sense of how your entire environment is doing. At the same time, with a single click, you can drill down to the server(s) to get deeper information. Want to see why there was such high memory usage in the middle of the night? up.time gives you visibility by letting you go back in time to identify the resource hog. Again, up.time’s single pane of glass allows you to gain a complete view of the infrastructure. If you haven’t already, download up.time to gain 20/20 vision into your complete IT environment so you won’t miss a beat!
How many times have you heard a request similar to that? With so many moving parts, it’s not a trivial task to diagnose what’s really going on. Thanks to the popularity of virtualization in data centers, your job is not getting any easier. There are many factors, such as application, network or server performance, that can affect the end user experience. In a virtual environment, you also have to look at utilization of the virtual machine, physical server, cluster, resource pool and sometimes even the entire virtual environment as a whole. So where should you begin? It’s almost like you need an extra eye to keep track of everything!
up.time 7 - Global Network Dashboard
Hot off the stove is the release of up.time 7. A major enhancement in up.time 7 is the addition of the global network dashboard. Taking a peek at the global network dashboard gives you immediate insight into what’s going on in your network. It displays the network bandwidth (both in & out), the latency, and the number of errors and discards in catchy colourful dials so you can quickly visualize the performance and health of your network. Underneath the dials are the top 10 bottlenecks for each metric so you can identify the devices that are slowing down your network. In addition, the global active issues list instantly shows any outages in your network. So if the network is the reason your end user feels the application is “performance-challenged”, the brand new network dashboard will be invaluable to you.
So why did I call this post “The Complete IT Dashboard”? Network alone, of course, does not make up the entire IT infrastructure. As mentioned, there are many other components to a data center. up.time enables you to dive into all aspects of your environment. How does it do that? This is the first post of the series so make sure you stay tuned! If you want to jump ahead and see what up.time can do in your environment, download it and start managing your infrastructure through a “single pane of glass”!
It’s finally here! We at uptime software HQ have been busy over the past months, planning and preparing for our newest release, up.time 7. Earlier today, we announced the launch of up.time 7, now available for download! You can now try the full version for 30 days, absolutely free.
Watch our 5-minute Quick Tour of up.time 7 below.
up.time 7’s new Network Performance Management allows you to:
Immediately see if problems are at the server, application or network level from a Global Network Dashboard
Monitor availability & performance of 1000s of different network devices
Easily report on your up.time server and device inventory
The wait is almost over: up.time 7 is launching on June 12th!! We’re putting the final touches on the release and are very excited to put the latest and greatest version of up.time in your hands. We’re itching for you to see all of the amazing new capabilities we’ve added to help you with your day-to-day network monitoring challenges.
This final sneak peek before the launch is all about the new network dashboard. This dashboard will let you see the status of your entire datacenter from a single view including all of your switches, routers and other network devices. No more jumping around from tool to tool. One click from your Network Dashboard lets you drill down to the root cause of performance bottlenecks anywhere in your datacenter.
up.time 7 Network Dashboard
The Network Dashboard Includes:
Instant Capacity Visibility: Global bandwidth & error trends are instantly available to let you see how capacity has been trending over the past 24 hours. Easily identify any bottlenecks that could be impacting your network performance.
Global Active Issues List: Network devices with faults or performance issues are highlighted on the Network Dashboard so that you can always see what problems need your attention. With one click you can move from your global dashboard to any device that is having issues.
Top 10 Lists: Spot key performance bottlenecks early through top 10 lists that show you where the hot spots are in your network. Never get caught not knowing again.
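The idea behind a top 10 list is simple enough to sketch in a few lines of Python. The device names and error counts below are invented, and the `top_n` helper is only an illustration of the ranking, not the dashboard's actual implementation.

```python
import heapq

# Hypothetical latest error counts per network device.
errors_by_device = {
    "core-switch": 12,
    "edge-router": 340,
    "access-sw-1": 0,
    "access-sw-2": 57,
    "firewall": 3,
}


def top_n(metric_by_device, n=10):
    """Return the n devices with the highest values, largest first."""
    return heapq.nlargest(n, metric_by_device.items(), key=lambda item: item[1])


for device, errors in top_n(errors_by_device, n=3):
    print(f"{device}: {errors} errors")
```

Running the same ranking once per metric (bandwidth, latency, errors, discards) gives you one hot-spot list for each.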
Our next big release of up.time will be “up.time 7 (coming soon),” and I wanted to give you a sneak peek into some of the powerful new Network Performance Management capabilities, including deep performance analysis for network devices. Out of the box, up.time 7 will collect an array of deep performance metrics from your network devices. All you will need is SNMP credentials and the rest of the work will be done for you by up.time. No manual setup needed for best practice performance metrics.
The new “Network Quick Snapshot” dashboard for network devices (shown below) is designed to give you an instant summary of device performance and visibility into key bottlenecks that could be impacting service levels. In this case, we’re looking at a Cisco Catalyst switch sitting in front of a Dell EqualLogic iSCSI SAN device, hosting dozens of VMware datastores. The VMs using this SAN depend on the Cisco switch being up and running with enough capacity to meet any workload demand that is thrown at it. Key metrics including latency, packet loss, in/out bandwidth, errors and discards are available instantly for your review.
In addition, potential network hot spots and problem areas are highlighted to save you time. From here, you can drill down into specific metrics for deep root-cause analysis of performance problems or compare current performance to longer term trends in capacity, for example. This ensures you stay one step ahead of any potential network capacity problems, and helps you move from a reactive IT mode to a proactive IT mode.
Don’t worry, you don’t have to watch the dashboard all day long to stay on top of your network! As usual, up.time will keep an eye on all the performance, availability and capacity metrics and alert you (email, SMS, etc.) if there are any problems or potential threats that need your attention.
Improved Network Performance Management has been a long-running request from up.time users. Because we come from a primarily server and virtualization monitoring background, network monitoring certainly hasn’t been a focus for us in the past. I’m happy to say that we’re listening: the next version of up.time’s monitoring solution aims to dramatically improve our network monitoring capabilities. We’ve been working on this for a few months and are very happy with how it is turning out. I’d like to give a preview of one of the first major pieces of functionality in the next release, our freshly redesigned SNMP Monitor that aims to make collecting and alerting on custom SNMP data as easy as possible.
Why redesign the SNMP Monitor? The old monitor had a number of limitations that made it far too difficult to set up in a way that provided valuable results. Ease of use and time to value are very important to us and this monitor wasn’t up to the standard we, much less our users, expected. We also found that the old monitor didn’t do many of the basic things we needed it to while building our other network monitoring components. So we decided to redesign with these goals in mind:
Improve the ‘out of the box’ options for custom SNMP values by including a much wider array of vendor MIBs so that you don’t have to go searching for MIBs
Reduce the pain of managing and changing SNMP data points; this process should be as easy as any other monitor in up.time
Focus on flexibility: we want you to be able to gather just about any SNMP value you can think of and use it in a way that makes sense for your environment
Here’s a screen shot of the new SNMP Poller configuration page:
Here are some basic examples of things that you can do with the new SNMP Poller that you either couldn’t, or found very difficult to, accomplish with the old SNMP Monitor. The examples below were configured in 8 minutes total and then left to run for a while to gather some real world performance data.
Handle Large Tabled Values: Here we’ve set up a monitor to collect ifInOctets across all interfaces on a device
Monitor Vendor Specific Info: Here is a graph of server temperature gauges collected from an SNMP enabled Dell server
Monitor Hardware Status: Here we see a few monitors checking power supply status and processor temperature
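A note on the ifInOctets example: that value is a running 32-bit counter of bytes received, so a single reading tells you very little. What you usually want is the rate between two polls. Here is a minimal Python sketch of that arithmetic. The interface names, readings and 60-second interval are hypothetical, and this is not a description of how the SNMP Poller computes its graphs.

```python
COUNTER32_MAX = 2**32  # ifInOctets is a 32-bit counter that wraps back to zero


def inbound_bits_per_second(previous, current, interval_seconds):
    """Convert two ifInOctets readings into an average inbound rate in bits/s."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Modulo arithmetic handles a single counter wrap between the two polls.
    delta_octets = (current - previous) % COUNTER32_MAX
    return delta_octets * 8 / interval_seconds


# Hypothetical readings per interface: (previous poll, current poll), 60 s apart.
polls = {
    "Gi0/1": (1_000_000, 8_500_000),
    "Gi0/2": (4_294_967_000, 704),  # the counter wrapped between polls
}

for interface, (previous, current) in polls.items():
    rate = inbound_bits_per_second(previous, current, 60)
    print(f"{interface}: {rate:,.0f} bit/s")
```

One caveat with this simple approach: a counter that resets because the device rebooted looks the same as a wrap, so a real collector would also want to watch the device's uptime before trusting the delta.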