Human Goals

In writing about the behaviour of superintelligent AIs, and then going off on a tangent about the behaviour of sovereigns, I’ve adopted the paradigm of “optimising a long-term goal”. I picked that up from the “paperclipper” idea that the AI Risks people talk about.

The problem with assuming that any intelligence has a goal of maximising some quantity over the long term is that no natural or artificial intelligence we know of actually does that. The discussion of instrumental convergence driven by long-term goals in my recent posts is relevant only as a distant ideal that real intelligences might approximate.

Actual AI systems today are generally aimed at maximising some quantity within a finite time horizon. I have not seen anybody seriously think about how to build an intelligence with an indefinite time horizon. (That was the point of my “Requirements document for paperclip-maximising AI” tweets, which were playful rather than a serious instance of any of the misunderstandings Yudkowsky mentions.)
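To make the finite-horizon point concrete, here is a toy planner of my own construction (the reward function, action set, and all names are illustrative assumptions, not anyone's actual system). It exhaustively evaluates action sequences up to a fixed horizon; states beyond that horizon simply do not appear anywhere in its objective, which is what makes it so unlike the indefinite-horizon maximiser of the paperclipper thought experiment.

```python
# Toy illustration: a finite-horizon optimiser. It scores action
# sequences only up to `horizon` steps ahead; anything beyond that
# does not enter its objective at all.
from itertools import product

def reward(state, action):
    # A made-up reward: being near 10 is good.
    return -abs((state + action) - 10)

def step(state, action):
    return state + action

def plan(state, horizon, actions=(-1, 0, 1)):
    """Return the best action sequence over a fixed, finite horizon."""
    best_seq, best_total = None, float("-inf")
    for seq in product(actions, repeat=horizon):
        s, total = state, 0
        for a in seq:
            total += reward(s, a)
            s = step(s, a)
        if total > best_total:
            best_seq, best_total = seq, total
    return best_seq

print(plan(state=7, horizon=3))  # → (1, 1, 1)
```

An indefinite-horizon version of this has no natural stopping point for the inner loop, which is exactly the design question the sketch sidesteps.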

And humans, well… What is human life for? Lots of people think they can answer that, but they don’t agree.

One can deduce that humans are a product of an evolutionary process that has optimised for reproductive fitness. But that isn’t an explicit goal, represented symbolically within the human body. Most importantly, there’s no mechanism to preserve goal-content integrity. That’s because humans aren’t superintelligences, and are not designed with the assumption that they will be able to manipulate their own minds. Throughout evolutionary history, our ancestors didn’t modify their own goals, not because they were constructed to resist that, but because they weren’t sophisticated enough to do so. Now that humans are symbol-manipulating intelligences, there is no constraint preventing human intelligence from subverting the implicit goals of the human genome.

Daniel Dennett is good on this, in Freedom Evolves: he talks about the “as-if intentionality” produced by evolution giving rise to a real but subsidiary intentionality in human minds.

Existing machine-learning systems also do not have goals explicitly and physically present. They are more akin to humans in that they have been tuned throughout their structure by an optimisation process such that the whole tends to the goals that were intended by their designers.

As with humans, that kind of goal, because it isn’t explicit, isn’t something that can be protected from change. All you can do is protect the whole mind from any kind of change, which is contrary to the idea of a self-improving intelligence.
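The point that a trained system’s goal is nowhere represented in the system itself can be seen even in the smallest possible example. The sketch below (entirely my own toy, not any real ML framework’s API) fits a two-parameter linear model by stochastic gradient descent: the squared-error loss is the “goal” that shapes the weights during training, but the finished artefact is just two numbers, and the loss is not stored anywhere you could point to and protect.

```python
# Toy illustration: the training loss shapes the weights, but the
# trained artefact contains no explicit representation of that goal.
import random

random.seed(0)
data = [(x, 3 * x + 1) for x in range(10)]  # target behaviour: y = 3x + 1

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    x, y = random.choice(data)
    pred = w * x + b
    err = pred - y          # gradient signal from a squared-error loss
    w -= lr * err * x
    b -= lr * err

# The trained "model" is just two numbers; the loss that tuned them
# exists only in the training loop above. You cannot read the goal
# off the artefact itself.
print(round(w, 2), round(b, 2))
```

To keep the behaviour stable under change, you would have to freeze `w` and `b` wholesale, which is the point made above: protecting an implicit goal means protecting the whole mind from modification.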

Indeed the whole existing technology of “machine learning”, impressive though it is, simply isn’t the kind of logic-manipulating machine that could be capable of changing itself. That’s not to say the whole concept of self-accelerating AI is not sensible; it’s just that the ML stuff that is making such waves can only be one part of a composite whole that might reach that stage.

The AI Risks crew are thinking about different kinds of goals, but I’m not in their discussions and I don’t know what sort of conclusions they’ve so far reached; I’ve just seen things like this defining of terms, which shows they are thinking about these questions.

Getting back to humans, humans do not have explicit long-term goals, unless they accidentally pick them up at a cultural level. But the point of instrumental convergence is that one long-term goal looks pretty much like another for the purpose of short-term behaviour. If you can culturally produce a sovereign with some long-term goal, the result will be a polity that seeks knowledge and resources, which is well-placed to pursue any long-term goal in future. Given that humans have been produced by a process optimising for some non-explicit goal of spreading copies over the universe, having some other intelligence use humans as assets towards some arbitrary long-term goal of its own would not seem all that unpleasant to individual humans. Of course, per my last post, that outcome does depend on humans actually being assets, which is not guaranteed.
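The convergence claim itself can be shown with a deliberately simple decision problem of my own invention (all names and numbers are illustrative assumptions). Agents whose terminal goals differ wildly in content and in value still rank the same short-term action first, because resources multiply the expected payoff of almost any goal.

```python
# Toy illustration of instrumental convergence: whatever the terminal
# goal is worth, gathering resources dominates in the short term,
# because resources help achieve almost anything.
def expected_payoff(goal_value, resources):
    return goal_value * resources

def best_short_term_action(goal_value, resources=1.0):
    actions = {
        "gather_resources": expected_payoff(goal_value, resources * 2),
        "pursue_goal_now":  expected_payoff(goal_value, resources) + 1,
        "do_nothing":       expected_payoff(goal_value, resources),
    }
    return max(actions, key=actions.get)

# Three very different long-term goals, same short-term behaviour:
for goal_value in (10, 100, 1000):
    print(best_short_term_action(goal_value))  # → gather_resources each time
```

Looking only at the agents’ observable behaviour, you cannot tell the goals apart, which is the sense in which one long-term goal looks much like another.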

However, I still don’t really believe in superintelligences with long-term goals. As with my paperclipper project, it’s hard to see how you would even set a long-term goal into an intelligence, and even harder to see how, if it had even as much power over itself as a human has, it wouldn’t modify its own goals, just as part of an experiment, which after all is exactly what humans have been doing at least since Socrates.

It seems far more plausible that any AI would be built to optimise some quantity in the present or near future. The real issue is that that might approximate some other emergent long-term goal — that, I think, is what Yudkowsky is getting at in his tweet thread above, and is why my “what does optimising for paperclips really mean” analysis is silly even if it is reasonable. No intelligence is going to explicitly optimise for paperclips.

The three-handed argument on twitter, between @AMK2934, me, and @Outsideness, was kind of funny. Axel was claiming that intelligences could optimise for any arbitrary goal, on the grounds that humans optimise for a stupid arbitrary goal of reproduction. Nick was arguing that intelligences could only optimise for core sensible goals, on the grounds that humans optimise for the core sensible goal of survival and reproduction. I was arguing that intelligences won’t optimise for anything consistent and will behave chaotically, on the grounds that that’s what the more intelligent humans do. We were disagreeing about the future only because we were disagreeing about the present.