Building World Class Software at uptime

Friday, March 28, 2008

How to undelete files on Linux

Carlo Wood explains how to undelete files on Linux ext3 file systems. The really cool thing is not that you can do it, but that Carlo decided not to believe the received wisdom that it was impossible, and went and figured out how to do it anyway.

One of my mantras is "I don't mind you complaining that it can't be done, as long as you don't bother the people who are doing it".


SLA Inversion

So you’ve launched your new enterprise web site, and you’re so confident in your fully redundant web architecture that you figure you can provide a five-nines SLA: 99.999% uptime, or about 5 minutes of downtime a year. After all, what can possibly go wrong?

Welcome to SLA inversion, where a 99.999% available web service depends on a 95% available DNS server. Or a 98% available Internet connection. If your users can’t do work, it doesn’t matter why. They don’t care that your web application is still able to process transactions, and they don’t appreciate the difference between application failure and DNS failure.
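
To put rough numbers on this, here is a minimal sketch of the arithmetic. It assumes the dependencies sit in series (every one must be up for a user to get work done) and fail independently, which real systems only approximate. The function names are mine, chosen for illustration; the percentages are the ones quoted above.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def composite_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so the probabilities simply multiply.
    """
    return prod(availabilities)


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year at the given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Figures from the text: a five-nines web service behind a 95% DNS server.
chain = {"web service": 0.99999, "DNS server": 0.95}
overall = composite_availability(chain.values())

print(f"five nines alone: {downtime_minutes_per_year(0.99999):.1f} minutes/year")
print(f"whole chain:      {overall:.3%} available, "
      f"{downtime_minutes_per_year(overall) / (24 * 60):.1f} days/year down")
```

Under these assumptions the chain as a whole is slightly worse than its weakest member, so the five nines on the web tier are invisible to users.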

As it happens, this topic is close to my heart: I run a few cancer support mailing lists, and one day I decided to switch the domain name hosting provider. What I didn’t realise was that the new registrar wouldn’t let me edit the zone file until the domain transfer had completed, and that left me dead in the water for several hours: requests to the web site were going to the registrar’s default page. No matter that the web site and mailing lists were just fine; the point was that nobody could reach them.

Whenever you’re monitoring for a service level agreement, you need to follow a few important rules.

  1. Measure as close as you possibly can to what end users will see. At a minimum, that generally means running synthetic transactions: HTTP requests for web applications and test emails for email services.
  2. Take time to think through all the services your users rely on, and make sure they’re monitored, implicitly or explicitly. For example, are there routers outside your control on the path between you and your users? If so, find out what SLA, if any, is in effect. You can’t provide a better target than the weakest link in the chain.
  3. If there is a weak link controlled by your customer, make sure it’s explicit in the SLA contract.
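
A synthetic HTTP transaction along the lines of rule 1 might look like the sketch below. It is a bare-bones illustration using only Python's standard library, not a substitute for a real monitoring system; the `probe` function, its parameters, and the injectable `opener` argument are assumptions of mine. Probing by hostname rather than IP address means DNS is exercised implicitly, and checking for text you expect on the page catches the case where some other server (a registrar's default page, say) answers with a perfectly healthy 200.

```python
import time
import urllib.request


def probe(url, expected_text, timeout=10.0, opener=urllib.request.urlopen):
    """Fetch url roughly as a user would and return (ok, seconds_elapsed).

    ok is True only if the request succeeds with status 200 and the body
    contains expected_text. DNS failures, timeouts, refused connections and
    HTTP error statuses all surface as OSError subclasses and count as down.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            ok = response.status == 200 and expected_text in body
    except OSError:
        ok = False
    return ok, time.monotonic() - start
```

You would call it from a scheduler, ideally from a machine outside your own network, with something like `probe("http://www.example.com/", "Welcome")` (a placeholder URL and marker text), and record both the result and the elapsed time.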

Service level agreements are all about avoiding nasty surprises: taking a bit of time up front to think through how you might be caught off guard will benefit you, and more importantly, your users.


Worth 80 programmers

As soon as you tell management that the changes they want will delay the release date, their first question (in most companies at least) is how many extra people you want. Brooks's Law, that adding manpower to a late project makes it later, is not well understood outside the programming field.

Steve McConnell has a great anecdote about a late project where they took all 80 developers off the project and replaced them with one highly competent guy, who finished on time. The point is not that you should do this, but that the story is believable.

If you're in a large organization and someone offers you a horde of developers to improve productivity, a good rule is to take them, find out which of them are any good, and assign the rest to tasks that don't touch production code (e.g., test case automation). A good programmer is at least 10 times as productive as a mediocre one, so keep the quality of your actual production team as high as you possibly can.

Filed under "thoughts that make me happy not to work for a bank any more..."


Thursday, March 27, 2008

Issues of Trust

One of the big draws of open source software is that because you can examine the source code, you can, at least in theory, find bugs in the software and correct them. The idea that "given enough eyeballs, all bugs are shallow" reflects this. But Ken Thompson, one of the creators of Unix, gave a beautiful talk in 1984 called "Reflections on Trusting Trust", which describes an ingenious method for introducing a security back door that works even when you have all the source code available.

Let's say you wanted to modify the SSH daemon to remember the username and password of anyone who logs in, and send you the results. It would be a reasonably small change to the software to allow this, and assuming you had root access, you could easily drop in a modified version. But next time someone recompiled sshd, the change would be lost. How might you make your change more permanent?

Ken's attack uses the C compiler itself as the weak spot. (His original target was the Unix login program; I'll stick with sshd here.) First, introduce code in the compiler to detect when it's compiling sshd and output the hacked version:

if (compiling_sshd()) {
    output_hacked_sshd();
} else {
    compile_normally();
}


But again, if you recompile the C compiler, the change gets lost. So let's add another check, this time for the C compiler:

if (compiling_gcc()) {
    output_hacked_compiler();
} else if (compiling_sshd()) {
    output_hacked_sshd();
} else {
    compile_normally();
}


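But here's the key idea: once I've built the compiler from this modified source, I can remove the code from the compiler's source again, because the infected binary will take care of it. Any future build of gcc done with that binary re-inserts the block, so the new binaries are invisibly infected even though the source is clean.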
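
The self-perpetuating step is easier to see in a toy simulation. The sketch below is not Thompson's code and does not compile anything: `CompilerBinary`, `TROJAN` and `BACKDOOR` are made-up stand-ins, "source" and "binary" are just strings and objects, and the only thing modelled is which builds end up infected.

```python
from dataclasses import dataclass

TROJAN = "[trojan block]"       # hypothetical marker for the extra compiler code
BACKDOOR = "[password logger]"  # hypothetical marker for the sshd modification


@dataclass(frozen=True)
class CompilerBinary:
    infected: bool

    def compile(self, name, source):
        if name == "compiler":
            # An honest binary yields an infected compiler only if the source
            # contains the trojan; an infected binary re-inserts it regardless.
            return CompilerBinary(infected=self.infected or TROJAN in source)
        if name == "sshd" and self.infected:
            return source + BACKDOOR
        return source  # stand-in for normal compilation


clean_src = "compiler source"
honest = CompilerBinary(infected=False)

# The attacker builds once from trojaned source, then restores the clean source.
stage1 = honest.compile("compiler", clean_src + TROJAN)

# Everyone else later rebuilds the compiler and sshd from clean source.
stage2 = stage1.compile("compiler", clean_src)
sshd = stage2.compile("sshd", "sshd source")

print(stage2.infected, BACKDOOR in sshd)  # prints: True True
```

Neither source string passed to `stage1` or `stage2` contains the trojan or the back door, yet both outputs carry them, which is the whole trick.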

So if you could make this change on, say, a Debian maintainer's machine, it could take quite a while before anyone noticed. Caveat emptor...
