Initialisation at declaration considered harmful

Suppose you have a variable x.

int x;

Hello x.

Now suppose you decide that under some circumstances x should have a particular value.

if (some_circumstances) x = particular_value;

And later on x’s good buddy y wants to have its own value based on x’s value.

int y = f(x);

Hey y, how’s it going? What’s that? You say you don’t feel so good?

Oh dear. It looks like somebody’s coming down with a touch of the Undefined Behaviour. Perhaps some_circumstances wasn’t the only case we should have handled here.

Conventional wisdom says you should avoid this situation by always initialising your variables when you define them. Ideally you do this by declaring them only when you know what they should be. But sometimes you have only partial information when you want to put a value in there, and so in the alternate case you can only make something up:

int x = 0;
if (some_circumstances) x = particular_value;
int y = f(x);

But what if your intention was not to set y to f(0)? What if the real bug was in failing to consider another case and come up with a suitable result in that case as well? What if x was actually a uid_t? Should a uid be initialised to zero as a “safe default” in the case of logic bugs?
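Here is a hypothetical sketch of how that goes wrong. The Role enum, pick_uid() and the uid numbers are invented for illustration. On Unix-like systems uid 0 is conventionally root, so a case that nobody remembered to handle silently gets superuser identity. Because uid was initialised, neither the compiler nor a sanitiser has anything to report.

```cpp
#include <sys/types.h>  // uid_t (POSIX)

// Hypothetical roles; guest was added later and pick_uid() was never updated.
enum class Role { user, service, guest };

uid_t pick_uid(Role role) {
    uid_t uid = 0;                          // "safe default", which is also root's uid
    if (role == Role::user)    uid = 1000;  // hypothetical uid values
    if (role == Role::service) uid = 998;
    // Role::guest falls through: it silently gets uid 0, and because uid
    // was initialised there is no uninitialised read for any tool to flag.
    return uid;
}
```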

Well, you could initialise x with a value so absurd that the mistake would be bound to show up in some way or other. Good choices are signalling NaNs, nullptr and the like, or anything an assert() will eventually catch (if you remember to write one, and if you have the test coverage). That doesn’t work, though, when every value your type can represent is legal and meaningful (very often the case in DSP work).
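A minimal sketch of the poison-value approach for a float. It assumes a hypothetical f(), and that the missed case is exercised by a test in a build without NDEBUG. Otherwise the assert() compiles away and the NaN simply propagates.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for f() from the text.
float f(float x) { return x * 0.5f; }

float compute(bool some_circumstances, float particular_value) {
    // Poison value: a signalling NaN that no legitimate path should produce.
    float x = std::numeric_limits<float>::signaling_NaN();
    if (some_circumstances) x = particular_value;
    // Catches the mistake only if this path is tested, and only when
    // NDEBUG is not defined, since assert() expands to nothing otherwise.
    assert(!std::isnan(x) && "x was never assigned a real value");
    return f(x);
}

// For an int there is no equivalent poison value: every bit pattern is a
// legal int, so whatever sentinel you pick could also be a genuine result.
```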

You could use a bigger type as a temporary, or use std::optional<>, which includes an explicit flag saying whether or not the variable has been initialised. But these require extra checks to be written by hand before the variable is used; otherwise they’ll likely produce silent failures of their own. And those checks might not be put in all the necessary places, precisely because they’re a manual effort.
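A minimal sketch with std::optional, again using a hypothetical f(). The empty state is an explicit flag, but it only helps if every read goes through a checking accessor such as value(), which throws std::bad_optional_access. Reading through *x performs no check at all and is undefined behaviour when x is empty.

```cpp
#include <optional>

// Hypothetical stand-in for f() from the text.
int f(int x) { return x + 1; }

int compute(bool some_circumstances, int particular_value) {
    std::optional<int> x;  // empty until some branch decides on a value
    if (some_circumstances) x = particular_value;
    // value() checks the flag and throws std::bad_optional_access if empty.
    // Writing *x instead would skip the check: the bug we started with.
    return f(x.value());
}
```

The check happens at run time, so it still only fires on paths that some test actually executes.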

The thing is, though, leaving the variable uninitialised effectively sets it to an illegal value, one which the compiler will try to prove can never escape:

int x;  // will definitely get overwritten
if (some_circumstances) x = particular_value;
if (some_other_circumstances) x = different_value;
if (unusual_circumstances) x = spooky_value;
int y = f(x);

Ideally, if (some_circumstances || some_other_circumstances || unusual_circumstances) isn’t provably true then the compiler will gripe about this and you’ll have to revisit the code and make it right. This is most valuable if the code was clean before you made changes and afterwards this warning suddenly turns up.

Sadly, Clang and GCC really only care if they’re going to produce an undefined value, and with optimisations enabled most of these cases are obviated by replacing predicates with constants. That might cover security vulnerabilities but it’s no help with logic bugs. To get the job done properly you need to run Clang with --analyze, or use your own favourite static analyser.

Clearly the compiler’s still not going to get all of them, and the static analyser might miss something too, so being the diligent you that you are you’ll hopefully catch the remaining cases when you run your unit tests with -fsanitize=memory.

But if you do initialise the variable before you know what should be in that variable, then those checks will never work. Consequently you can introduce bugs which cause the initialiser you chose (before you knew what the value should be) to become the final value, and neither the compiler nor the sanitiser will be able to tell you that you’ve done so. You’d have been better off knowing you just broke something, but instead you’ll just get that “safe” value you initialised with.

Modern tooling has effectively turned an uninitialised variable into an implicit, signalling illegal state. But leaving variables uninitialised is also long-established bad style, so people have put time and effort into hiding bugs that the tools would have surfaced if they hadn’t tried to improve their code.

It’s unfortunate that there’s no consistent way to explicitly declare a variable as having an illegal state which should raise an error if it’s used. All we have are well-known ad-hoc solutions like nullptr, NAN, std::numeric_limits<T>::signaling_NaN(), maybe T::end(), etc.

I would prefer explicit syntax for “I don’t know yet” initialisers, which would still let the tools do their job but could drop in default fill values when the tools reach their limits. This is much like C++26’s erroneous behaviour, but made explicit so as to stave off those generic “uninitialised variable” warnings. Perhaps it could be called undecided<T>{}, uncommitted<T>{} or provisional<T>{}, with an optional value argument if you don’t want to leave the fill value to the implementation. The name should convey that the developer hasn’t chosen a value yet and that any attempt to read it before it changes would be a mistake, without implying that the variable might be uninitialised.

int x = uncommitted<int>{};
if (some_circumstances) x = particular_value;
if (some_other_circumstances) x = different_value;
if (unusual_circumstances) x = spooky_value;
int y = f(x);  // invoke C++26 erroneous behaviour as needed

Ideally the compiler or static analyser would pick up any oversights in that logic. If not, -fsanitize=memory might pick it up provided you have a test case that covers it. If not, then a default value is inserted as chosen by uncommitted<int>{}, or the value you specify if you choose to do so (even though you’ve clearly never tested it). One might expect uncommitted<float>{} to choose a signalling NaN and any pointer type to choose nullptr.
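As a rough illustration of the fallback-value half of this idea only, here is a hypothetical library approximation of uncommitted<T>{}, assuming C++17. default_fill() is an invented helper that follows the defaults suggested above. Nothing in it tells the compiler, a static analyser or -fsanitize=memory that the value must not be read; that diagnostic half would still need language or compiler support.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <type_traits>

// Hypothetical helper: pick a fallback value per type, following the text's
// suggestion of a signalling NaN for floats and nullptr for pointers.
template <typename T>
constexpr T default_fill() {
    if constexpr (std::is_floating_point_v<T>) {
        if constexpr (std::numeric_limits<T>::has_signaling_NaN)
            return std::numeric_limits<T>::signaling_NaN();
        else
            return std::numeric_limits<T>::quiet_NaN();
    } else if constexpr (std::is_pointer_v<T>) {
        return nullptr;
    } else {
        return T{};  // no poison value available: value-initialise instead
    }
}

// Hypothetical library-only stand-in for uncommitted<T>{}. It supplies the
// fallback value and converts to T, but carries no "don't read me" meaning
// that a compiler, analyser or sanitiser could act on.
template <typename T>
struct uncommitted {
    T fill = default_fill<T>();  // overridden by uncommitted<T>{value}
    constexpr operator T() const { return fill; }
};
```

With this sketch, int x = uncommitted<int>{}; yields 0, uncommitted<int>{42} yields 42, a float gets a NaN and any pointer type gets nullptr.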

C++26 might achieve that if you leave the variable uninitialised at definition, but that just looks like a mistake, and it’s a landmine if you don’t have your compiler configured appropriately.

Additionally, if you could be explicit, you could also be explicit about other things, like:

int f(Result& result, int arg) {
    result = uncommitted<Result>{};
    // ...
    result = work_in_progress;
    // ...
    if (accident_happened) {
        result = uncommitted<Result>{};
        return -1;
    }
    // ...
    return 0;
}

And then let the tools ensure that result is left untouched (never read) while it’s in that uncommitted state, for example by a caller that ignores the -1 return.