We need more wizards!

No, I don’t mean Gandalf, I mean the software kind. And before I’m accused of being Gates’ live-in cabana boy (it’s all baseless rumors), let me clarify.

It’s a known fact that most OSes need tuning (sometimes significant) to perform well under heavy-duty applications. I’m not talking about your home web server; I’m talking about Exchange, SAP, Oracle, IIS, Apache etc. in large deployments. I acknowledge that most OSes, out of the box, will work OK for anything small.

Most application documentation includes some kind of tuning guidelines telling you approximately what to do on each OS. The installer will sometimes apply some tunings for you after asking your permission. Often, though, the suggested settings are woefully inadequate for truly large implementations. NetBackup is a good example: the Veritas-suggested tunings work for smaller environments, but I have some magical kernel tunings (posted earlier) that make it truly fly when the ridiculous is asked of it, and the difference between my config and what Veritas suggests is huge. Oh, and some of my parameters are way smaller than what Veritas recommends. (And I won’t call them Symantec. Veritas is a way cooler name anyway; look it up in a Latin-English dictionary.)

Frequently, some tunings are so common that I don’t even know why they’re not in the default configuration in certain OSes. Different conversation.

The problem is, there are experts that DO know how to set up and tune the systems properly, but said experts are rarely the admins that install and administer the thing. Usually, a fair portion of those experts do work at the companies that make the OSes and apps.

The elitists among us might say, “tough, the lowly admins need to learn all this stuff, otherwise they’re not worth what they’re paid.” To which I respond with the following points:

  • Not everyone has the time to learn the arcana of several OSes and applications, learning most of the important features is complicated enough and some shops are truly short-staffed
  • The über-experts themselves don’t know it all: they may know how to perfectly set up Exchange but wouldn’t know how to do the same with Oracle. How can basic admins be expected to have such multi-discipline expertise?
  • I firmly believe in the simplicity of the appliance computing model
  • We all have more important things to do (like taking care of the big picture) than constantly worrying about minutiae
  • The people who complain that admins should be smarter are typically the people who actually enjoy dealing with the arcane; their jobs are secure anyway
  • There’s money to be made in the simplification of IT – look at Microsoft, EMC/VMware and NetApp. People like simplicity and are willing to pay for it.

Of course, many larger companies will opt for professional services to do the job, but the quality of the people varies dramatically. Just because you’re getting an expensive Veritas PS guy doesn’t mean that

  1. he knows what the hell he’s doing beyond what’s in the installation manual (you know who you are!), and (less significantly)
  2. he’s even a Veritas employee, despite his badge (most vendors subcontract to smaller companies).

At the moment, most OSes just apply generic formulas based on memory and/or number of CPUs, yet somehow take no account of CPU speed or load. And indeed, those ancient formulas are a pain on today’s very large-memory systems: on big HP-UX and Solaris boxes you usually have to cap some tunables by hand, otherwise certain parameters get out of control.
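To make the point concrete, here is a toy sketch of the problem. The formula and the numbers are invented for illustration (no vendor computes tunables exactly this way): scale a kernel table linearly with RAM and it balloons on a large-memory box, so you end up clamping it manually.

```python
# Hypothetical illustration of "generic formula" tuning. Neither the
# per-GB multiplier nor the cap comes from any real OS; they just show
# why linear-in-RAM formulas break down on large-memory systems.

def legacy_tunable(ram_gb: int, per_gb: int = 8192) -> int:
    """Old-school formula: scale a kernel table size linearly with memory."""
    return ram_gb * per_gb

def sane_tunable(ram_gb: int, per_gb: int = 8192, cap: int = 1_000_000) -> int:
    """Same formula, but clamped so a 1 TB box doesn't get an absurd value."""
    return min(ram_gb * per_gb, cap)

for ram_gb in (4, 64, 1024):
    print(ram_gb, legacy_tunable(ram_gb), sane_tunable(ram_gb))
```

On the 4 GB box both formulas agree; at 1 TB the legacy formula produces a value eight times the cap, which is exactly the “get out of control” behavior described above.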

I understand that truly self-tuning OSes aren’t here yet, nor will they be for a while (64-bitness has taken away some of the pain, though, at least in Windows). In the interim, there are better ways to approach the problem. My suggestion: modernize the formulas that build the tunables and use simple AI techniques like expert systems. At installation time, benchmark the hardware and ask the user what the server will be running. OK, so the answer is a web server; under what conditions? How many users? And so on. Admins are far more likely to know the answers to those questions than to “how many open file handles do you think you’ll need?”

Based on the answers and the benchmark results, the system should either tell you what you want is possible, or bitch.

If the box is to be serving double-duty (or quintuple, in some cases), the wizard should check and see if the tunings will conflict and, if not, tune the whole box so that it can accommodate all the applications.
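The wizard described above could be sketched roughly like this. Everything here is hypothetical: the roles, the questions, the tunable names, and the rule values are made up for illustration, and a real implementation would fold in hardware benchmark results and vendor-blessed rules rather than this toy table.

```python
# Toy "installer wizard as expert system": map role answers to tunables,
# merge the rules for a multi-role box, and scale by the admin-supplied
# load figure. All names and values are invented for illustration.

RULES = {
    "web":      {"max_open_files": 65536, "tcp_backlog": 4096},
    "database": {"max_open_files": 32768, "shared_mem_gb": 16},
}

def plan(roles: list[str], users: int) -> dict[str, int]:
    merged: dict[str, int] = {}
    for role in roles:
        for name, value in RULES[role].items():
            # Conflict handling for a double-duty box: where roles
            # disagree on a tunable, keep the larger (safer) value.
            merged[name] = max(merged.get(name, 0), value)
    # Derive a tunable from a question admins can actually answer
    # ("how many users?") instead of "how many file handles?".
    merged["max_open_files"] = max(merged["max_open_files"], users * 4)
    return merged

print(plan(["web", "database"], users=20000))
```

The point of the sketch is the shape of the logic, not the numbers: the admin answers questions about workload, and the rules translate those answers into consistent, conflict-checked tunables for the whole box.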

If you’re creating a filesystem, what will the intended use be? The defaults for almost all filesystems are wrong! One size fits only the people that have that size. The problem is that, once you’ve put several TB on filesystems someone built with the default parameters, changing them is almost impossible: you have to take a backup, destroy the filesystems, rebuild them, then restore the data. All of which could have been avoided if, say, maybe not the OS but at least Oracle had the smarts to query the FS, figure out it’s using insufficient log and block sizes, and realize that performance will suck. At which point it should puke and tell you “sorry, this is sub-optimal, either do such-and-such to fix it or continue anyway at your peril”. But of course you’re using raw disks for Oracle, right? Right?
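The check being asked for is genuinely cheap; on any POSIX system an application can query the filesystem’s geometry before committing data to it. A minimal sketch, with the caveat that the 8 KiB expectation is an illustrative assumption and not Oracle’s actual requirement:

```python
# Sketch of the pre-flight check the post wishes applications would do:
# ask the filesystem for its block size via statvfs(3) and complain if it
# looks wrong for the workload, instead of silently performing badly.
# The wanted=8192 threshold is an assumed example, not a vendor number.

import os

def check_fs_block_size(path: str, wanted: int = 8192) -> bool:
    st = os.statvfs(path)  # POSIX call: returns the FS's block geometry
    if st.f_bsize < wanted:
        print(f"sorry, this is sub-optimal: {path} uses {st.f_bsize}-byte "
              f"blocks but {wanted} was wanted; fix it or continue at your peril")
        return False
    return True

check_fs_block_size("/tmp")
```

A few lines like this at install time would turn an unfixable multi-TB mistake into a warning before any data lands on the filesystem.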

Or take the example of Logical Volume Managers. They are cool, yes. They can work great. They will also let you do insane things, such as creating multiple LVs and striping across them even when they live on the same physical disk! The checks that should be performed here are so ridiculously simple it boggles the mind.
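Just how simple? Something like the following. The data structures are hypothetical stand-ins (a real LVM would consult its own metadata), but the actual check is one set comparison:

```python
# The "ridiculously simple" LVM sanity check: before striping across
# volumes, verify each member actually sits on a distinct physical disk.
# The lv_to_disk mapping is a made-up stand-in for real LVM metadata.

def stripe_is_sane(stripe_members: list[str], lv_to_disk: dict[str, str]) -> bool:
    disks = [lv_to_disk[lv] for lv in stripe_members]
    # Striping only helps if every member lands on a different spindle.
    return len(set(disks)) == len(disks)

layout = {"lv0": "disk0", "lv1": "disk0", "lv2": "disk1"}
print(stripe_is_sane(["lv0", "lv1"], layout))  # both on disk0: pointless stripe
print(stripe_is_sane(["lv0", "lv2"], layout))  # two spindles: fine
```

That is the entire check a volume manager would need to at least warn the admin before building a stripe that can never deliver striped performance.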

HP kinda started doing something like this a while ago: look at the templates in SAM. You can apply 2-3 different (useless) templates based on what the box will be doing, and they affect a few tunables. HP-UX is guilty of needing the most tuning of any current OS I can think of, BTW. (It also pays great dividends if you know what you’re doing; I once took a Superdome to 2x the I/O performance. I felt proud, but it took a lot of effort and research that could have been avoided.)

Seems like the intelligence that would make our lives easier is like the proverbial hot potato: always someone else’s problem.

I know it’s a tall order: the whole solution would rely on much deeper interoperability between the various components than we’re used to. But I think the end result would be worth it.

In the meantime, if you have to do it all yourself, at least use common sense and have some golden OS builds that are each good for a different use, then just replicate them as needed.

Anyway, all this is aggravating my hemorrhoids (I call them The Grapes of Wrath), better stop now.



2 Replies to “We need more wizards!”

  1. Another great rant D, although I consider the technology posts to be filler in between steak house reviews. It seems to me that the O/S vendors are in a catch-22. On one hand there is enormous pressure to keep the O/S lean and mean to mitigate security risks and optimize performance. On the other hand users are looking for systems that are easy to administer and as you suggest, wizard driven. But each line of code added to an O/S is a potential vulnerability that could be exploited by your token 14 yr old Icelandic hacker. Can the vendors ever win?

  2. Actually my “wizards” (let’s call them intelligence) would IMPROVE performance and security, not make things worse. I’m just talking about putting in more checks and balances that will at least warn you if you’re doing something stupid.

Leave a comment for posterity...