Aside on Software Bloat

October 17, 2019



Yesterday I fell into the sad bitching about how big and slow software has become. This is a very old complaint: the EMACS editor used to be mocked as “eight megabytes and constantly swapping” back when eight megabytes was a huge amount of memory, an amount that now rounds down to zero pixels on a graph of memory utilisation on a modern laptop.

I retailed the usual whines about Electron and so on, but really any disagreements are at the margin: the real underlying reasons for software bloat are, unfortunately, good reasons.

Here’s a more interesting illustration. I recently watched a one-hour presentation by a Microsoft developer explaining MS’s implementation of the new C++17 charconv header.

This is a library for converting numbers to and from decimal format. Computers internally work with fractions or large numbers in a binary floating-point format, so you have to be able to convert that format to and from a string of decimal digits.
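To make “binary floating-point format” concrete, here is a small sketch that parses the decimal text "0.1" with the old C function strtod and then pulls the resulting double apart into its sign, exponent and fraction bits. It assumes double is the usual 64-bit IEEE 754 format, and the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// The three fields of an IEEE 754 binary64 ("double") value.
struct DoubleFields {
    std::uint64_t raw;       // all 64 bits as stored
    unsigned sign;           // 1 bit
    int exponent;            // 11-bit biased exponent with the bias of 1023 removed
    std::uint64_t fraction;  // 52 stored fraction bits (the leading 1 is implicit)
};

// Hypothetical helper: decimal text -> double -> its binary fields.
// Assumes a 64-bit IEEE 754 double; the exponent/fraction reading below is
// only meaningful for normal numbers (zero, subnormals, inf and NaN differ).
DoubleFields decimal_to_binary_fields(const char* text) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    char* end = nullptr;
    double d = std::strtod(text, &end);
    assert(end != text && "input was not a number");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);  // well-defined way to read the bit pattern
    DoubleFields f;
    f.raw = u;
    f.sign = static_cast<unsigned>(u >> 63);
    f.exponent = static_cast<int>((u >> 52) & 0x7FF) - 1023;
    f.fraction = u & ((std::uint64_t{1} << 52) - 1);
    return f;
}
```

For "0.1" this gives the bit pattern 0x3FB999999999999A: exponent -4 and fraction 0x999999999999A, i.e. 1.6 × 2^-4. The repeating 9s are the binary equivalent of 1/3 becoming 0.333… in decimal; the final A is where the endless pattern got rounded off.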

All computers have to do that. My ZX81 did it 37 years ago (though its predecessor, the ZX80, couldn’t: it worked only with whole numbers). It was part of the 8K of software built into the machine, along with full floating-point arithmetic implemented in software.

The new charconv library the Microsoft guy was presenting contains 5300 lines of C++, taking 221K of code and another 400K of data tables.

And — to make it clear — it’s awesome. I was glued to the one-hour video on what they’ve done. The clever bit is getting the right number of decimal digits.

The technical problem is that a fractional decimal number usually has no exact binary equivalent. So when you convert from decimal to binary, to do any calculations with the number, you get a slightly different number. That’s OK. But when you then convert back from binary to decimal, you can get an exact decimal representation of the binary approximation of the original decimal number, which is a bit different to what you started with. For instance, the double nearest to 0.1 is exactly 0.1000000000000000055511151231257827021181583404541015625. That’s quite annoying. It can even cause program bugs.
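The old printf-style way of printing forces you to pick a fixed number of digits, and that is where the annoyance comes from. The sketch below (helper names are hypothetical) formats with %.*g and checks whether strtod gets back exactly the same double. Seventeen significant digits always round-trip for a 64-bit double but expose the tail of the binary approximation; fifteen look tidy but can silently lose the value.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: format with a fixed number of significant digits (printf "%.*g").
std::string format_g(double value, int significant_digits) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%.*g", significant_digits, value);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical helper: true if parsing the text gives back exactly the same double.
bool round_trips(const std::string& text, double value) {
    char* end = nullptr;
    double parsed = std::strtod(text.c_str(), &end);
    return *end == '\0' && parsed == value;
}
```

On a typical 64-bit platform, format_g(0.1, 17) gives "0.10000000000000001", which round-trips but is not what anyone typed. Going the other way, 0.1 + 0.2 printed with 15 digits is "0.3", which parses back to a different double, so the value has quietly changed.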

The current C++ language standard (C++17) says the new binary-to-decimal functions should produce the shortest decimal representation that converts back to exactly the same binary value. That’s difficult to work out, and really, really difficult to work out quickly. In fact, a new method of doing it, called Ryū, was published by a guy called Ulf Adams at Google only in 2018, and the Microsoft team have implemented that algorithm for their standard library.
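From the programmer’s side, all of that machinery sits behind one call. Calling std::to_chars on a double with no format or precision argument asks for the shortest text that round-trips, and std::from_chars parses it back. A minimal sketch follows; the helper names are hypothetical, and it assumes a standard library that actually implements the floating-point overloads of charconv (some older GCC and Clang releases shipped only the integer ones).

```cpp
#include <cassert>
#include <charconv>
#include <string>
#include <system_error>

// Hypothetical helper: shortest decimal text that converts back to the same double.
// std::to_chars with no format/precision argument requests exactly this.
std::string shortest(double value) {
    char buf[64];  // far more than the longest shortest form of a double needs
    auto [ptr, ec] = std::to_chars(buf, buf + sizeof buf, value);
    assert(ec == std::errc{});
    return std::string(buf, ptr);
}

// Hypothetical helper: parse the shortest text back with std::from_chars
// and check that we recover exactly the same bits.
bool round_trips_exactly(double value) {
    std::string text = shortest(value);
    double parsed = 0.0;
    auto [ptr, ec] = std::from_chars(text.data(), text.data() + text.size(), parsed);
    return ec == std::errc{} && ptr == text.data() + text.size() && parsed == value;
}
```

So shortest(0.1) is simply "0.1", while shortest(0.1 + 0.2) is "0.30000000000000004", because no shorter string maps back to that double. When plain and scientific forms compete, the shorter wins, so 1e22 comes out as "1e+22".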

This is all very cool. But the relevance to my point is this: when I, in a C++ program, decide to output a floating-point number in decimal form, maybe to save it into a database or send it to another program, and I use the standard to_chars function, I’m invoking all this mass of ingenious code to do the conversion. I may or may not notice that the rounding is now perfect in a way it never was from 1982 to 2018. I probably won’t notice the roughly 600K of library code my program is now using. If I hadn’t happened to see this video, I would never have known any of this.

That’s for printing a number! It seems close to the simplest thing a computer program can do. Everything else in my program, whether it deals with text, graphics, networking or anything else, has gone through this kind of improvement, often many times. Sometimes your program gets real benefit from the improvements. Sometimes it gets the effect of the improvement, but that makes no useful difference to you. Sometimes you aren’t using the new functionality at all, but it still gets included when your program runs. That’s slightly unfortunate, but simplicity is valuable, and grabbing big chunks of functionality is simpler than picking out exactly the pieces you need.

The bottom line is that everything has a cost, even slimming down software, and if you insist on using a low-end 6-year-old computer like I do then it’s not worth most developers’ time to cater to you. I do think there is too much bloat, but it’s about tradeoffs at the margin; there will always be bloat, and that’s OK.
