Microsoft Bugs

A question posed at the end of an article on how the Microsoft X-Box's security system (designed to prevent unauthorised code from being run) was broken:

  512 bytes is a very small amount of code (it fits on a single sheet of paper!), compared to the megabytes of code contained in software like Windows, Internet Explorer or Internet Information Server. Three bugs within these 512 bytes compromised the security completely – a bunch of hackers found them within days after first looking at the code. Why hasn’t Microsoft Corp. been able to do the same? Why?

It’s a good question. There are a few plausible explanations:

  1. The design team were aware that the task of making it secure was an impossible one, and put in just enough effort to show willing, or to qualify as an “access control system” for legal purposes.
  2. The design was done in an insane rush, due to last-minute architectural compromises or general managerial incompetence.
  3. One or more of the designers secretly felt that the more the customer could do with the device, the better it would be, and in effect sabotaged a feature whose whole purpose was to limit what the customer could do with it.

But my favourite theory is quality control. The biggest obstacle I face as a programmer to producing high-quality software is the system of controls intended to ensure that the software I produce is of high quality.

The major mechanism is obtaining approvals from people who have a vague idea of what the software is supposed to do, no idea at all of how it is supposed to do it, and little interest in the whole process. Other mechanisms involve using, or avoiding, particular tools or techniques.

What they all have in common is that they require me to subordinate my own engineering choice to someone else’s, quite likely someone with less knowledge not only of the specific question but also of the relevant general principles. This extends even to questions of who else to involve: if the bureaucracy says I have to get sign-off from person A, then person A gets to check the product ahead of person B, even if, left to myself, I would choose to ask person B to check it instead, due to person B’s greater expertise or interest.

The bureaucrats would say it is a question of trust – the checks are in place so that management can take direct responsibility for the quality of the product, rather than just taking my word for it. I do not find this at all offensive; it is a perfectly reasonable thing for them to want. The problem is that it doesn’t work. It is always possible to “go through the motions” of doing the procedures, but there is almost no value in it. Getting it right always takes a mental effort, a positive commitment. I don’t blame them for not trusting me to do it, but they don’t have any choice.

The general ineffectiveness of quality control policy is masked by the usefulness of systematic testing. It is possible for a less-involved person to ask for, and check, tests – particularly regression tests on a new version of a product – and achieve significant quality benefits from doing so. As testing of this kind is generally part of the general battery of ceremonial procedures, the uselessness of all the others is less obvious than it would otherwise be. But there are many failures that this kind of testing doesn’t catch (and whose occurrence, therefore, an over-emphasis on this kind of testing will tend to increase), and practically all security issues are in this category.
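To make that concrete, here is a minimal sketch in C (the names and the test suite are invented for illustration, not taken from any real product) of how a regression suite can pass in full while the security-relevant behaviour goes completely untested:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A routine of the kind regression tests tend to bless: it copies a
 * user-supplied name into a fixed-size buffer. */
static char name_buf[16];

static void set_name(const char *name)
{
    strcpy(name_buf, name); /* no length check: inputs of 16+ bytes overflow */
}

int main(void)
{
    /* The regression suite exercises the documented behaviour... */
    set_name("alice");
    assert(strcmp(name_buf, "alice") == 0);
    set_name("bob");
    assert(strcmp(name_buf, "bob") == 0);
    puts("all regression tests passed");

    /* ...but never the hostile input that matters for security: a call
     * like set_name("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA") writes past the
     * end of name_buf, and nothing in the suite above would notice. */
    return 0;
}
```

A checker who asks whether the tests pass gets a truthful yes; the question that matters, what the code does with input the tests never send, is never asked.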

I have no knowledge of the quality-control regime at Microsoft: I’m just speculating based on my observation that a ceremony-heavy process can produce bad code of a kind that would be almost inexplicable otherwise. In this case, there are other reasonably plausible explanations, which I already listed.

(via Bruce Schneier)

(See also LowCeremonyMethods)