Human Goals

In writing about the behaviour of superintelligent AIs, and then going off on a tangent about the behaviour of sovereigns, I’ve adopted the paradigm of “optimising a long-term goal”. I picked that up from the “paperclipper” idea that the AI Risks people talk about.


The problem with assuming that any intelligence has a goal of maximising some quantity over the long term is that no natural or artificial intelligence we know of actually does that. The discussion in my recent posts of instrumental convergence caused by long-term goals is therefore relevant only as a distant ideal that real intelligences might approximate.
Actual AI systems today are generally aimed at maximising some quantity within a finite time horizon. I have not seen anybody seriously think about how to build an intelligence with an indefinite time horizon. (That was the point of my “Requirements document for paperclip-maximising AI” tweets, which were playful rather than seriously falling into any of the misunderstandings Yudkowsky mentions).
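To make the contrast concrete, here is a minimal formalisation of my own (not taken from any of the literature). A typical system is built to maximise a reward summed over a finite horizon $T$, or a discounted sum in which the far future counts for almost nothing:

$$\max_\pi \ \mathbb{E}\Big[\sum_{t=0}^{T} r_t\Big] \qquad \text{or} \qquad \max_\pi \ \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^t r_t\Big], \quad 0 < \gamma < 1.$$

An intelligence with a genuinely indefinite time horizon would have to drop both the cutoff $T$ and the discount factor $\gamma$, and it is not obvious what well-defined quantity would then be left to maximise.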


And humans, well… What is human life for? Lots of people think they can answer that, but they don’t agree.
One can deduce that humans are a product of an evolutionary process that has optimised for reproductive fitness. But that isn’t an explicit goal, represented symbolically within the human body. Most importantly, there’s no mechanism to preserve goal-content integrity. That’s because humans aren’t superintelligences, and are not designed with the assumption that they will be able to manipulate their own minds. Throughout evolutionary history, our ancestors didn’t modify their own goals, not because they were constructed to resist that, but because they weren’t sophisticated enough to do so. Now that humans are symbol-manipulating intelligences, there is no constraint on the human intelligence subverting the implicit goals of the human genome.
Daniel Dennett is good on this, in Freedom Evolves: he talks about the “as-if intentionality” produced by evolution giving rise to a real but subsidiary intentionality in human minds.
Existing machine-learning systems also do not have goals explicitly and physically present. They are more akin to humans in that they have been tuned throughout their structure by an optimisation process such that the whole tends to the goals that were intended by their designers.
As with humans, that kind of goal, because it isn’t explicit, isn’t something that can be protected from change. All you can do is protect the whole mind from any kind of change, which is contrary to the idea of a self-improving intelligence.
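As a toy sketch of what “tuned throughout their structure” means in practice (illustrative only, with a made-up one-parameter model): the designer’s goal exists only as a loss function in the training script, and the trained parameters that remain afterwards contain no separate copy of that goal which could be locked down or protected.

```python
import numpy as np

# Designer's goal: fit y = 2x. The goal exists only here, as a loss in the training script.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x

w = 0.0  # the whole "mind": a single tunable parameter
for _ in range(200):
    grad = np.mean(2 * (w * x - y) * x)  # gradient of the mean squared error
    w -= 0.1 * grad                      # the optimisation process tunes the structure

# After training only w is left. The loss function is not stored anywhere in the "mind";
# the goal is implicit in how w was shaped, and nothing marks it out as protected content.
print(w)  # approximately 2.0
```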
Indeed the whole existing technology of “machine learning”, impressive though it is, simply isn’t the kind of logic-manipulating machine that could be capable of changing itself. That’s not to say the whole concept of self-accelerating AI is not sensible; it’s just that the ML stuff that is making such waves can only be one part of a composite whole that might reach that stage.
The AI Risks crew are thinking about different kinds of goals, but I’m not in their discussions and I don’t know what sort of conclusions they’ve so far reached; I’ve just seen things like this defining of terms, which shows they are thinking about these questions.
Getting back to humans, humans do not have explicit long-term goals, unless they accidentally pick them up at a cultural level. But the point of instrumental convergence is that one long-term goal looks pretty much like another for the purpose of short-term behaviour. If you can culturally produce a sovereign with some long-term goal, the result will be a polity that seeks knowledge and resources, which is well-placed to pursue any long-term goal in future. Given that humans have been produced by a process optimising for some non-explicit goal of spreading copies over the universe, having some other intelligence use humans as assets towards some arbitrary long-term goal of its own would not seem all that unpleasant to individual humans. Of course, per my last post, that outcome does depend on humans actually being assets, which is not guaranteed.
However, I still don’t really believe in superintelligences with long-term goals. As with my paperclipper project, it’s hard to see how you would even set a long-term goal into an intelligence, and even harder to see how, if it had even as much power over the universe as a human has, it wouldn’t modify its own goals, just as part of an experiment, which after all is exactly what humans have been doing at least since Socrates.
It seems far more plausible that any AI would be built to optimise some quantity in the present or near future. The real issue is that that might approximate some other emergent long-term goal — that, I think, is what Yudkowsky is getting at in his tweet thread above, and is why my “what does optimising for paperclips really mean” analysis is silly even if it is reasonable. No intelligence is going to explicitly optimise for paperclips.
The three-handed argument on twitter, between @AMK2934, me, and @Outsideness, was kind of funny. Axel was claiming that intelligences could optimise for any arbitrary goal, on the grounds that humans optimise for a stupid arbitrary goal of reproduction. Nick was arguing that intelligences could only optimise for core sensible goals, on the grounds that humans optimise for the core sensible goal of survival and reproduction. I was arguing that intelligences won’t optimise for anything consistent and will behave chaotically, on the grounds that that’s what the more intelligent humans do. We were disagreeing about the future only because we were disagreeing about the present.
 

Assets, Parasites and Pets

In my last post, I wrote:
An inhabitant of a polity is either an asset, or a parasite, or a pet.
The argument I was making was that if a sovereign has a long-term final goal, then his short-term instrumental goals will be to increase capabilities and acquire resources, and if he owns a subject who has a long-term final goal, that subject’s short-term instrumental goals will be to increase his own capabilities and acquire resources for himself, and if that subject is an asset to the sovereign, then those goals are fundamentally compatible. They’re not identical — the distribution of resources among subjects will have some optimum for the sovereign’s purpose which differs from that of any individual subject, but valuable subjects in general will have their goals met about as well by an efficient sovereign as by any other governance mechanism which could exist.
But what of subjects who are not assets? The sovereign does not have any interest in increasing the capabilities or resources of subjects who are not productive of any value.
The first thing to do when considering this is to be realistic: any system of government depends on the able, and has little incentive to cater to the unable. It doesn’t make sense to go into this question expecting too much. That’s a point I’ve made before: “Ultimately, no blueprint can protect the native population if it truly doesn’t have any value to contribute”.
Nonetheless, many actually existing human societies do care for the unproductive, with varying degrees of effort and effectiveness. They do this because humans do not have purely long-term goals, but actually want that to happen.
When thinking about the welfare of the unproductive, it makes more sense to see this as a bonus to the productive, rather than as a matter of rights of the unproductive. I am not looking at the question from a moral standpoint, remember — this is all based on the concept of a sovereign with his own long-term goals. Since his interests include increasing the capabilities of his able subjects, and their interests include (to some variable degree) caring for the unproductive around them, the optimal policy is going to include some level of such care. Care for the unable is always going to depend on some able people wanting it. If nobody has any reason to keep you around, they won’t.
 

Goal-Content Integrity

I wrote a couple of weeks ago about Instrumental Convergence.
 
The thing that immediately struck me when I read The Superintelligent Will was that the very concept of Instrumental Convergence was exactly the neoreactionary argument for sovereignty.
If you have any long-term goals, the best way to achieve them in the short term is to accumulate knowledge and resources that can later be employed in the desired direction, provided that you can achieve what Bostrom calls Goal-content integrity.
Goal-content integrity means being able to hold to the same final goal over time. If you do not have confidence that your final goals in the future will be the same as they are now, then resources and capabilities that you acquire could be used by you in the future for goals other than those you currently intend.
 
If we model a polity as an intelligence with some long-term final goals (and I will address the problems with doing this later), then the logical instrumental goals of that polity are: self-preservation, goal-content integrity, increased capability, resource acquisition, just as Bostrom deduces. (I am rolling his goals of cognitive enhancement and technological perfection into a simpler “increased capability” — those goals are more important to his overall argument than they are to mine).
 
The difference between a reactionary polity and a liberal polity is that the liberal polity disclaims goal-content integrity. It does not have a long-term final goal, because it assumes that elements within it have different final goals, and they will continue to compete and compromise over those goals forever. Because it does not have long-term final goals, it has no steady interest in increased capability and resource acquisition. Conversely, a reactionary polity with a defined long-term goal, such as increasing the glory of the Royal Family, or of God, or both, will seek increased capability and resource acquisition.
 
The obvious problem with modelling the polity as an intelligence is that what that “intelligence” seeks is not necessarily good for anyone in it. However, this is where Instrumental Convergence becomes important. A polity that is seeking increased capability and resource acquisition is highly likely to benefit the immediate instrumental goals of its population. An inhabitant of a polity is either an asset, or a parasite, or a pet. An able human is still capable of being an asset, and as an asset is likely to gain from the resources and capabilities of the polity. Being a parasite to any polity of any kind is likely to cause you problems, so don’t do that. The role of humans as pets becomes interesting in the case of superintelligences (which I am not really discussing here, despite the starting point), but less so for human societies.
 
This is why it is better to be subject to a sovereign than to have a share in power: as a subject of a sovereign you are part of a polity with goal-content integrity, which, whatever its final goals, will pursue instrumental goals that will enable you to benefit. As a citizen of a democracy you are part of a polity without goal-content integrity, where the zero-sum struggle over the direction of the polity dominates any instrumental goals of increased capability or resource acquisition that you would be able to benefit from.

Constitutions and Law-Enforcement

This is ideas-in-flow: any conclusions are soft, and this needs more work.

@wrmead tweeted that Caesar crossed the Rubicon with his army rather than face politically motivated prosecution without it.

Basically, the threat of being punished by political enemies made Caesar an outlaw: he had nothing to gain by following the rules any more.

I thought that a reasonable point (and RTd it), but at the same time, there are rules, and how can they work if they’re not enforced? The fact that Caesar was in danger from his political enemies does not mean that their allegations were unfounded.

@Alrenous made that point explicitly:
Caesar was a criminal
He crossed the Rubicon with an army rather than face politically motivated just punishment for breaking the law without it.

update 30-Jan-2018: relevant tweet — apparently Caesar himself agrees
https://twitter.com/spectatorindex/status/958376761540202497

A legal system works because it is above the disputing parties. Two parties come before it, it awards a victory to one or the other, but that is a limited victory; the victorious party remains below the legal system. That isn’t the case when the winner of a legal dispute gets control over the administration of the law itself. One victory becomes total victory.

It is easy to imagine that politics would inevitably decay into legal battles. There is a wide gap between things which are definitely allowed under the rules and things which are definitely not. Once someone strays into that blurred boundary area, you would expect that they would be challenged, and the conflict would move from the political to the legal sphere.

However, in established long-standing democracies, that very rarely happens.
The aversion seems to be strongest at the point of making a legal challenge. Questionable political conduct, in the UK and USA, is commonplace, as are accusations of illegal activity. But ultimately it seems nearly always to be tolerated.

This unexpected observation, that there are extremely strong norms against turning competition for power via elections into legal battles, needs to be explained.

Outside of the developed West, this is quite a common occurrence. The last few years have seen disputes over whether candidates acted lawfully in Ukraine, Venezuela, Honduras, just off the top of my head.

As a general approach, those norms can’t be a solution to the problem: if there is a strong norm against prosecuting opponents, that would surely tempt politicians further into legally questionable territory in order to take advantage of it, approaching the point where there is a significant danger of prosecution.

One solution that would work is for the grey area to be shrunk down: if the real rules (which might not be the same as the formal rules) are very clear and very easily interpreted, then nobody will make a fatal mistake, either of stepping over the line so that his opponents have to take legal action against him, or of taking a situation to the legal arena which the other party has reasonably assumed to be safe.

That could be the case, but really doesn’t appear to be.

Another solution would be if politicians feared the punishments for malpractice much more than they wanted to win, so that they would never take even small risks of getting caught. Again, that does not appear to be the case.

Another solution in a democracy would be if any malpractice is looked on so severely by the electorate that it would be counterproductive. That surely is not the case. It might be that that has been the case until recently. There is a whole narrative, quite logical, that the populations of the Western democracies used to be so attached to democratic values that any breach of those principles would outrage them to the point of unelectability, and that a recent increase in partisanship has fatally damaged that equilibrium.

There are two problems I can see with that narrative: first, that there is no history of notably clean politics in the democracies; lies, bribes, and gerrymandering have been commonplace throughout history. Second, that it doesn’t make sense for voters to be so moralistic about their own side cheating. The current situation, where supporters of a candidate see accusations of cheating either as signs of the viciousness of the enemy propaganda, or as indications of his own heroic strength, or both, seems far more natural than such high-minded fairness.

My own view is that the thing that has made democracy work, in those rare cases where it has worked, is that the apparently opposing parties are really part of the same ruling class. The issues that stand between the parties are low-stakes issues, which are resolved by the parties staying within the rules. The reason they stay within the rules is because they are united on the high-stakes issue, of the existing ruling class holding on to its position, and aren’t prepared to jeopardise that by fighting no-holds-barred over side questions.

For instance, take the quote from Jeremy Paxman’s book The Political Animal that I picked up in 2011: “In April 1925, for example, the then Chancellor of the Exchequer, Winston Churchill, announced that Britain was to return to the Gold Standard, whereby the value of sterling was guaranteed by allowing pounds to be exchanged for gold. This momentous (if ultimately unsuccessful) decision had been two months in preparation, involving heartfelt arguments on both sides of the debate. Yet not a word of it appeared in the newspapers. Indeed, it was hardly heard outside the confines of the Treasury.”

Or, as I put it another way in 2008: “The only situation in which a government can genuinely act in the interest of a class wider than just politicians is when there is a larger class of relatively powerless people – slaves or peasants – who would be a threat to a divided ruling class. That is the characteristic of democracies before the twentieth century.”

If both sides politically are actually united on maintaining the system that favours them, that doesn’t mean that their disagreements are fake. It just means that they aren’t important enough to risk the system over. However, from the point of view of an outsider to whom the disagreement is most important, that is almost the same thing.

This analysis raises a lot of questions:

  • Is it really true that the stable Western democracies have not had sufficiently serious political disputes for this gentlemanly state of affairs to break down? In US history, obviously the Civil War is a case where it did. But what about the New Deal? Is that another exception? Or did the GOP decide to fold rather than take the risk? For that matter, what about earlier cases (Andrew Jackson?)
  • Is there a mechanism that centrists use to prevent extremists who rank disputed political issues above unity from gaining power? Would candidates like Michael Foot in the UK have threatened the system?
  • If there are such mechanisms, are they normal political mechanisms, or are there deep state resources that are employed against them? There have long been rumours of plots against Harold Wilson, and in the last few days there are strikingly similar stories being told in the US.
  • Is it breaking down now? My suspicion is that real democratic control is increasing, and that is producing things like Trump and Brexit, which is endangering the gentlemen’s agreement by breaking down the barriers which protect the ruling elite from outsiders.

Update 27 Jan 2018
In the context of the above, the recklessness of the “Go for the Throat” strategy that I wrote about last year is even more striking. If the stability of the system actually depends on keeping out people who aren’t well-integrated into it culturally, then one party deliberately goading the other into being taken over by radical outsiders is suicidal.

Whether he succeeds in passing legislation or not, given his ambitions, [Obama’s] goal should be to delegitimize his opponents. Through a series of clarifying fights over controversial issues, he can force Republicans to either side with their coalition’s most extreme elements or cause a rift in the party that will leave it, at least temporarily, in disarray. 

Also relevant: “They Always Wanted Trump”

Instrumental Convergence

From The Superintelligent Will [pdf] by Nick Bostrom

The Instrumental Convergence Thesis

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realised for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents

He then lists five examples: Self-preservation, Goal-content integrity, Cognitive Enhancement, Technological Perfection, and Resource Acquisition.

It’s a pretty short paper, worth reading.

Basically, if you have any long-term goal, your intermediate goals are likely to include surviving, retaining your goals, getting better at stuff, and acquiring resources.

Even if your goals are bizarre — the proverbial paperclip maximizer — if they are long-term, then your short-term goals are going to be these ones.

It’s worth thinking about the paperclip maximizer. As soon as you do, you realise how underspecified the concept is. There are obvious missing criteria which can be filled in: what counts as a paperclip; do they all count equally, or does size matter; do they need to be just made, or made and kept?

Time is a difficult question. Let’s try to maximize the maximum number of simultaneously existing paperclips in the future of the universe, handwaving relativity of simultaneity somehow.
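Written out (my own formalisation, handwaving the same relativity problem), the criterion is something like

$$\max_{\pi} \ \sup_{t \ge 0} N^{\pi}(t),$$

where $N^{\pi}(t)$ is the number of paperclips in existence at time $t$ given policy $\pi$: what gets scored is the single best moment the universe ever reaches, not any running total of paperclips made.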

The crucial insight is that making even one paperclip is quite contrary to that — or any similar — goal. If you accumulate resources and capabilities, grow them over years or millennia, you will be able to make trillions of paperclips in the future. Just one spacefaring robot that lands on iron-rich asteroids and starts manufacturing could presumably make 10^{19} paperclips out of each asteroid.
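That 10^{19} is the right order of magnitude on rough assumptions of my own: a metallic asteroid a bit over ten kilometres across, at a density of about 8,000 kg/m^3, masses roughly

$$\tfrac{4}{3}\pi\,(7\,\mathrm{km})^3 \times 8{,}000\ \mathrm{kg/m^3} \approx 10^{16}\ \mathrm{kg} = 10^{19}\ \mathrm{g},$$

which at around a gram per paperclip is about $10^{19}$ paperclips.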

When you look at Earth, you don’t see potential paperclip material, you see start-up capital for an astronomical-scale paperclip industry.

The biggest questions are about risk. Even the maximization criteria I suggested above are incomplete. You can’t know how many paperclips will exist in the future; even if superintelligent, there is too much that you don’t know and can’t predict. You don’t even have probabilities for most things. What is the probability that there is alien intelligence in the Milky Way? There’s no meaningful answer.

There’s another discussion (or perhaps it’s the same one put another way) about the fact that probabilities are not objective but “subjectively objective”: maximising a probability really means maximising the probability as some particular subjective entity estimates it, so your goal has to specify what sort of entity is doing the probability estimation, and how that entity survives and evolves. That’s a killer.
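In symbols (my gloss, carrying on the notation above): an expected-value criterion such as

$$\max_{\pi} \ \mathbb{E}_{p}\Big[\sup_{t \ge 0} N^{\pi}(t)\Big]$$

is only defined relative to a probability measure $p$, and the only available candidate for $p$ is the agent’s own evolving credence; so the goal specification has to say whose credence, at which stage of the agent’s self-modification, is the one that counts.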

So you can’t maximize some probability-weighted value; that isn’t a well-defined thing. If you’re aiming for any kind of “as sure as I can get”, then before you start making paperclips, your priority has to be to gather as much information as possible, to be able to start creating that kind of certainty.

So, forget paperclips, get rich. In fact, forget getting rich, get knowledge about the universe. In fact, forget getting knowledge about the universe, get rich, so you can get knowledge about the universe, so you can be confident of getting really rich, so you can make paperclips.

Initially, what you want from Earth is basically wealth and knowledge. That’s what everyone else wants too. All the tactical questions are exactly the same as everyone else faces — invest in resources, cooperate with others or fight them, and so on.

Whatever your long-term goal is, if you have any long-term goal, your short term actions will look exactly like those of an ordinary sane selfish organism. The details of your goals are entirely irrelevant.

This is “Instrumental Convergence”, but the accounts I have seen, such as the Bostrom paper above, seem (perhaps unintentionally) to massively understate it. The ultimate goals of any intelligent entity that has any long-term goals at all would be totally irrelevant to its observed behaviour, which would be 100% dominated by survival, resource acquisition and information-gathering.