

“End-to-end encryption”

1st August 2017



The question of regulating encrypted communication has come up again. I was going to write again about how the politicians don’t understand the technologies, and they probably don’t, but if they did, what would they do about it?  The details are too complex to debate on TV news. What percentage of the viewing public even knows what public-key encryption is?

Politicians often talk as if “end-to-end encryption” is a technology, and one which is rare and might practically be banned. There are then huge arguments about whether such banning would be good or bad, which leave me somewhat bemused.

Of course, “end-to-end encryption” is no more a technology than “driving to a friend’s house” is a technology. Cars and roads and driving are technologies; driving to a friend’s house, or to a restaurant, or to work, is a social or economic practice that makes use of those technologies.

Similarly, sending encrypted messages is a technology. Sending “end-to-end” encrypted messages is not a technology; it’s just sending encrypted messages to an intended end recipient. Whether a particular message is “end-to-end” encrypted depends on who the end is.

The soundbites talk about one kind of messaging: messages sent person-to-person from a sender to a recipient via a service provider like Whatsapp, Microsoft or Google.

In 2017, most data sent over the internet that is at all personal is encrypted. Huge efforts have been made over the last five or so years to get to this stage, yet the debates about encryption have not even touched on that fact. Data in motion seems to be invisible. The encryption used to send these messages is very strong. A few years ago there were quite a few bugs in commonly used implementations, but efforts have been made to find and fix them, and while some are likely to remain, it is plausible that nearly all such encrypted messages are unbreakable even by the most powerful national security organisations.

However, the way most of these services work today is that the sender makes a connection to the service provider and authenticates himself with a password. The Service Provider also authenticates itself to the sender with a certificate, though that’s mostly invisible. The sender then sends their message encrypted to the Service Provider, which decrypts it and stores it. Later (or simultaneously) the recipient makes a connection to the Service Provider the same way, and the Service Provider encrypts the message and sends it to the recipient. This is fundamentally the same whether we are talking about messaging apps, chat, or email, and whether the devices used are computers, phones or tablets.

Anyway, call this Method 1: Service Provider Mediated.
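
To make the transport side of Method 1 concrete, here is a minimal sketch in plain Python (example.com is a placeholder, not any real messaging service): the client verifies the provider’s certificate and the connection is encrypted in transit, but the provider reads the plaintext once it decrypts its end.

    import socket, ssl

    ctx = ssl.create_default_context()       # trusts the usual public CA roots
    with socket.create_connection(("example.com", 443)) as raw:
        with ctx.wrap_socket(raw, server_hostname="example.com") as tls:
            # The provider authenticates itself to us with a certificate...
            print(tls.getpeercert()["subject"])
            # ...and everything on this socket is encrypted in transit,
            # but the provider decrypts and reads it at its end.
            tls.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
            print(tls.recv(200))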

A few of these services now have an extra feature. The sender’s app first encrypts the message in a way that can only be decrypted by the recipient, then encrypts it again to send to the Service Provider. The Service Provider decrypts one level of encryption, but not the second. When the recipient connects, the Service Provider re-encrypts the already encrypted message and sends it to the recipient. The recipient decrypts the message twice: once to get what the Service Provider had stored, and then again to get what the sender originally wrote.

That is what the politicians are talking about when they mention Whatsapp, Telegram and so on.

This is Method 2: Service Provider Mediated, with provided end-to-end encryption.
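
A minimal sketch of that layering, using PyNaCl sealed boxes purely for illustration (this is not the actual protocol any real provider uses, and in practice the outer layer is usually just the TLS connection):

    from nacl.public import PrivateKey, SealedBox

    recipient_key = PrivateKey.generate()   # private half held only by the recipient
    provider_key = PrivateKey.generate()    # private half held by the Service Provider

    # Sender: inner layer readable only by the recipient, outer layer only by the provider.
    inner = SealedBox(recipient_key.public_key).encrypt(b"see you at 8")
    outer = SealedBox(provider_key.public_key).encrypt(inner)

    # Service Provider: strips the outer layer, stores the still-encrypted inner message.
    stored = SealedBox(provider_key).decrypt(outer)

    # Recipient (after the provider re-wraps it for the recipient's own connection,
    # omitted here): removes the final layer and reads the message.
    assert SealedBox(recipient_key).decrypt(stored) == b"see you at 8"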

An important question here is who keeps track of the encryption keys. If the Service Provider has that responsibility, then it can support interception by giving the sender the wrong encryption key: one that it or the government can reverse. If the sender keeps the recipient’s encryption key, that is not possible; the Service Provider receives no messages that it is able to decrypt.

Going back to Method 1, if the Service Provider doesn’t provide the end-to-end encryption, it’s still possible to add it with special software used by both sender and recipient. This is awkward for the users and has never caught on in a big way, but it’s the method that the authorities used to worry about, decades back.

Method 3: Service Provider Mediated with independent end-to-end encryption.

There are plenty more variations. In the next, the sender connects to the Service Provider and indicates, via an encrypted message, which recipient they want to message. The Service Provider replies with an endpoint that the sender can connect to. The sender then connects directly to the recipient and transmits an encrypted message, which the recipient decrypts.

This peer-to-peer messaging isn’t fundamentally different in technology from the end-to-end encrypted scenario. In both cases the actual networking is “store-and-forward”: an intermediary receives data, stores it, and then transmits it to either another intermediary or the recipient. The only difference is how long the data is stored for; a typical router will store the data for only a fraction of a second before transmitting and deleting it, whereas a Service Provider’s application server will store it at least until the recipient connects to retrieve it, and quite likely will archive it permanently. (Note there are regulations in some jurisdictions that require Service Providers to archive messages permanently, but that applies to their application servers and not to routers, which handle orders of magnitude more data, most of which is transient).

It’s not always obvious to the user whether a real-time connection is mediated or not. Skype calls were originally peer-to-peer, and Microsoft changed it to mediated after they bought Skype. The general assumption is that this was at the behest of the NSA to enable interception, though I’ve not seen any definitive evidence.

Another thing about this kind of service is that the Service Provider does not need nearly as many resources as one that actually receives all the messages its users send. There could be a thousand different P2P services, in any jurisdiction. With WebRTC now built into browsers, it’s easy to set one up.

Method 4: Service Provider directed peer-to-peer.

It’s not actually hard to be your own Service Provider. The sender can put the message on his own server, and the recipient can connect to the sender’s server to receive it. Or, the sender can connect to the recipient’s server, and send the message to that. In either case, the transmission of the messages (and it’s only one transmission over the public internet, not two as in the previous cases) will be encrypted.
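
As a sketch of how little is involved, here is a hypothetical message drop-box in plain Python: a correspondent POSTs a message over HTTPS to your server, and you collect it later. The certificate files, port and filenames are placeholders; a real setup would sit behind a proper web or mail server.

    import http.server, pathlib, ssl

    INBOX = pathlib.Path("inbox")
    INBOX.mkdir(exist_ok=True)

    class DropBox(http.server.BaseHTTPRequestHandler):
        def do_POST(self):
            # Store whatever the sender transmitted; the TLS layer below has already
            # encrypted it in transit between the sender and this server.
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            n = len(list(INBOX.iterdir()))
            (INBOX / f"msg-{n:06d}.bin").write_bytes(body)
            self.send_response(204)
            self.end_headers()

    server = http.server.HTTPServer(("0.0.0.0", 8443), DropBox)
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server.crt", "server.key")   # e.g. a Let's Encrypt certificate
    server.socket = ctx.wrap_socket(server.socket, server_side=True)
    server.serve_forever()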

As with Method 2, the Service Provider might manage the encryption keys for the user, or the user’s app might retain encryption keys for the correspondents it has in its directory.

The software is all free and common. Creating a service requires a little knowledge, but not real expertise. I estimate it would take me 90 minutes and cost £10 to set up a publicly-accessible email, forum and/or instant messaging service, using software that has been widespread for many years, and that uses the same secure encryption that everything else on the internet uses. Whether this counts as “end to end encryption” depends entirely on what you count as an “end”.  If I want the server to be in my house instead of a cloud data centre in the country of my choice, it might cost me £50 instead of £10, and it’s likely to have a bit more downtime. That surely would make it “end-to-end”, at least for messages for which I am either the sender or the recipient.

This is getting easier and more common, as internet speeds improve, connected devices proliferate, and distrust of the online giants’ commercial surveillance practices grows. There have been one or two “server in a box” products offered which you can just buy and plug in to get this kind of service — so far they have been dodgy, but there is no technical barrier to making them much better. Even if such a server is intended and marketed simply as a personal backup/archive solution, it is nevertheless in practice a completely functional messaging platform. The difference between an application that saves your phone photos to your backup drive and a full chat application is just a little bit of UI decoration, and so software like owncloud designed to do the first just throws in the second because it’s trivial.

That is Method 5: Owned server.

There are several variants covered there. The user’s own server might be on their own premises, or might be rented from a cloud provider. If rented, it might be a physical machine or a virtual machine. The messages might be encrypted with a key owned by the recipient, or encrypted with a key configured for the service, or both, or neither. Whether owned or rented, the server might be in the same country as the user, or a different country. Each of these makes a significant difference from the point of view of an investigating agency wanting to read the messages.

Investigating authorities aren’t only concerned with encryption, though, they also want to know who is sending or receiving a message, even if they can’t read it. This could make the politicians’ opposition to mediated end-to-end encryption more reasonable: the Service Providers allow users to connect to their servers more or less anonymously. Using peer-to-peer or personal cloud services, the data is secure but the identity of the recipients of messages is generally easier to trace. The Service Providers give the users that the authorities are interested in a crowd of ordinary people to hide among.

It’s easy to sneer at Amber Rudd, but can you imagine trying to describe a policy on this in a TV interview, or in the House of Commons? Note I’ve skipped over some subtle questions.

Even if you could, you probably wouldn’t want to. Why spell out, “We want to get cooperation from Facebook to give us messages, but we’re not stupid, we know that if the terrorists buy a £100 off-the-shelf NAS box and use that to handle their messages, that won’t help us”?

Summary: kinds of messaging practice

Service Provider mediated non-end-to-end

Data accessible to authorities: with co-operation of Service Provider
Identity accessible to authorities: IP addresses obtainable with co-operation of Service Provider but can be obscured by onion routing / using public wifi etc
User convenience: very convenient

Service Provider mediated end-to-end

Data accessible to authorities: No
Identity accessible to authorities: IP addresses obtainable with co-operation of Service Provider but can be obscured by onion routing / using public wifi etc
User convenience: very convenient

End-to-end layered over Service Provider (e.g. PGP mail)

Data accessible to authorities: No
Identity accessible to authorities: IP addresses obtainable with co-operation of Service Provider but can be obscured by onion routing / using public wifi etc
User convenience: very inconvenient, all users must use special software, do key management

Peer-to-peer

Data accessible to authorities: No
Identity accessible to authorities: IP addresses directly accessible by surveillance at either endpoint or at ISP
User convenience: fiddly to use, need to manage directories of some kind

Personal Internet Service (Hosted)

Data accessible to authorities: With the cooperation of the host, which could be in any country
Identity accessible to authorities: IP addresses directly accessible by surveillance at either endpoint or at ISP
User convenience: Significant up-front work required by one party, but very easy to use by all others. Getting more convenient.

Personal Internet Service (on-site)

Data accessible to authorities: If they physically seize the computer
Identity accessible to authorities: IP addresses directly accessible by surveillance at either endpoint or at ISP
User convenience: Significant up-front work required by one party, but very easy to use by all others. Getting more convenient.

Appendix: Things I can think of but have skipped over to simplify
  • Disk encryption — keys stored or provided from outside at boot
  • Certificate spoofing, certificate pinning
  • Client applications versus web applications 
  • Hostile software updates
  • Accessing data on virtual servers through hypervisor

Democracy and Hacking


The New York Times has published a long analysis of the effects of the hacking of Democratic Party organisations and operatives in the 2016 election campaign.

The article is obviously trying to present a balanced view, eschewing the “OMG we are at war with Russia” hyperbole and questioning the value of different pieces of evidence. It does slip here and there, for instance jumping from the involvement of “a team linked to the Russian government” (for which there is considerable evidence) to “directed from the Kremlin” without justification.

The evidence that the hackers who penetrated the DNC systems and John Podesta’s email account are linked to the Russian Government is that the same tools were used as have been used in other pro-Russian actions in the past.

(Update 4th Jan 2017: that is a bit vague; infosec regular @pwnallthethings goes into very clear detail in a twitter thread.)

One important consideration is the sort of people who do this kind of thing. Being able to hack systems requires some talent, but not any weird Hollywood-esque genius. It also takes a lot of experience, which goes out of date quite quickly. Mostly, the people who have the talent and experience are the people who have done it for fun.

Those people are difficult to recruit into military or intelligence organisations. They tend not to get on well with concepts such as wearing uniforms, turning up on time, or passing drug tests.

It is possible in theory to bypass the enthusiasts and have more professional people learn the techniques. One problem is that becoming skilled requires practice, and that generally means practice on innocent victims. More significantly, the first step in any action is to work through cut-out computers to avoid being traced, and those cut-outs are also hacked computers belonging to random victims. That’s the way casual hackers, spammers and other computer criminals work, and espionage hackers have to use the same techniques. They have to be doing it all the time, to keep a base of operations, and to keep their techniques up to date.

For all these reasons, it makes much more sense for state agencies to stay at arm’s length from the actual hackers. The agencies will know about the hackers, maybe fund them indirectly, cover for them, and make suggestions, but there won’t be any official chain of command.

So the hackers who got the data from the DNC were probably somewhat associated with the Russian Government (though a comprehensive multi-year deception by another organisation deliberately appearing to be Russian is not completely out of the question).

They may have had explicit (albeit off-the-record) instructions, but that’s not necessary. As the New York Times itself observed, Russia has generally been very alarmed by Hillary Clinton for years. The group would have known to oppose her candidacy without being told.

“It was conventional wisdom… that Mrs. Clinton considered her husband’s efforts to reform Russia in the 1990s an unfinished project, and that she would seek to finish it by encouraging grass-roots efforts that would culminate with regime change.”

Dealing with the product is another matter. It might well have gone to a Russian intelligence agency, either under an agreement with the hackers or ad-hoc from a “concerned citizen”: you would assume they would want to see anything and everything of this kind that they could get. While hacking is best treated as deniable criminal activity, it would be much more valuable to agencies to have close control over the timing and content of releases of data.

So I actually agree with the legacy media that the extraction and publication of Democratic emails was probably a Russian intelligence operation. There is a significant possibility it was not, but was done by some Russians independent of government, and a remote possibility it was someone completely unrelated who has a practice of deliberately leaving false clues implicating Russia.

I’ve often said that the real power of the media is not the events that they report but the context to the events that they imply. Governments spying on each other is completely normal. Governments spying on foreign political movements is completely normal. Governments attempting to influence foreign elections by leaking intelligence is completely normal. Points to Nydwracu for finding this by William Safire:

“The shrewd Khrushchev came away from his personal duel of words with Nixon persuaded that the advocate of capitalism was not just tough-minded but strong-willed; he later said that he did all he could to bring about Nixon’s defeat in his 1960 presidential campaign.”

The major restraint on interference in foreign elections is generally the danger that if the candidate you back loses then you’ve substantially damaged your own relations with the winner. The really newsworthy aspect of all this is that the Russians had such a negative view of Clinton that they thought this wouldn’t make things any worse. It’s been reported that the Duma broke into applause when the election result was announced.

The other thing that isn’t normal is a complete public dump of an organisation’s emails. That’s not normal because it’s a new possibility, one that people generally haven’t begun to get their heads around. I was immediately struck by the immense power of such an attack the first time I saw it, in early 2011. No organisation can survive it: this is an outstanding item that has to be solved. I wouldn’t rule out a new recommended practice to destroy all email after a number of weeks, forcing conversation histories to be boiled down to more sterile and formal documents that are far less potentially damaging if leaked.

It is just about possible for an organisation to adequately secure its corporate data, but that’s both a technical problem and a management problem. However, the first impression you get of the DNC is one of amateurism. That of course is not a surprise. As I’ve observed before, if you consider political parties to be an important part of the system of government, their lack of funding and resources is amazing, even if American politics is better-funded than British. That the DNC were told they had been hacked and didn’t do anything about it is still shocking. Since 2011, this is something that any organisation sensitive to its image should be living in fear of.

This is basically evidence-free speculation, but it seems possible that the Democratic side is deficient in actual organisation builders: the kind of person who will set up systems, make rules, and get a team of people to work together. A combination of fixation on principles rather than practical action, and on diversity and “representativeness” over extraordinary competence meant that the campaign didn’t have the equivalent of a Jared Kushner to move in, set up an effective organisation and get it working.

Or possibly the problem is more one of history: the DNC is not a political campaign set up to achieve a task, but a permanent bureaucracy bogged down by inferior personnel and a history of institutional compromises.  Organisations become inefficient naturally.

Possibly Trump in contrast benefited from his estrangement from the Republican party establishment, since it meant he did not have legacy organisations to leak his secrets and undermine his campaign’s efficiency. He had a Manhattan Project, not an ITER.

The task of building–or rebuilding–an organisation is one that few people are suited to. Slotting into an existing structure is very much easier. Clinton’s supporters particularly are liable to have the attitude that a job is something you are given, rather than something you make. Kushner and Brad Parscale seem to stand out as people who have the capability of making a path rather than following one. As an aside, Obama seems to have had such people also, but Clinton may have lacked them. Peter Thiel described Kushner as “the Chief Operating Officer” of Trump’s campaign. Maybe the real estate business that Trump and Kushner are in, which consists more of separate from-scratch projects than most other businesses, orients them particularly to that style.


Archiving


A couple of casual online conversations:

First, journalist Jamie Bartlett banging on on Twitter about blockchain.

It became fashionable in 2015 to dismiss bitcoin but get excited about blockchain. I never really got it, because what makes the blockchain work is the fact that there are rewards for building it. I can download the blockchain and not even know who I am downloading it from, but, because (a) it takes enormous resources to create that data, and (b) that enormous effort is only rewarded if the recent blocks were added to the longest chain that other bitcoin users were seeing at the time, I can be very confident that the whole chain, at least up to the last few blocks, is the same one everyone else is seeing, even though I don’t know who I got mine from and I don’t know who they got theirs from.

A blockchain without a cryptocurrency to reward the miners who create the blockchain is just a collection of documents chained by each containing the hash of its parent. In other words, it is just git.

What I hadn’t realised is that the people so excited about blockchains actually didn’t know about git, even though this aspect of bitcoin’s design was explicitly based on git, and even though git is about 100-1000X more widely used than bitcoin. They maybe knew that git was a source control system, and that you could store and share stuff on github.com, but they didn’t know that it is impossible to publish a version of a git project with a modified history that wouldn’t be obvious to anyone who tried to load it but who previously had the true version of that history.  If you publish something via git, anyone can get a copy from you or from each other, and anyone can add material, but if anyone tampers with history, it will immediately show.
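
The property being relied on is nothing more than a hash chain. A minimal sketch in plain Python (the document names are invented):

    import hashlib, json

    def add_block(chain, document):
        """Append a document, linking it to its parent by hash (what git does per commit)."""
        parent = chain[-1]["hash"] if chain else None
        block = {"parent": parent, "document": document}
        block["hash"] = hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()
        ).hexdigest()
        chain.append(block)

    chain = []
    for doc in ["Hansard vol. 1", "Hansard vol. 2", "Hansard vol. 3"]:
        add_block(chain, doc)

    # Tamper with an early document and every later parent pointer stops matching:
    chain[0]["document"] = "Hansard vol. 1 (amended)"
    recomputed = hashlib.sha256(json.dumps(
        {"parent": chain[0]["parent"], "document": chain[0]["document"]},
        sort_keys=True).encode()).hexdigest()
    assert recomputed != chain[1]["parent"]   # the alteration is immediately visible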

So, when Bartlett said “Parliament should put its records on a blockchain”, what I deduced he really meant was “Parliament should check its records into git”. Which, if you happen to care for some reason about the wafflings of that bunch of traitors and retards, is a fairly sensible point.

So much for that. On to incidental conversation the second.

P D Sutherland has been in the news, speaking in his role as Special Representative of the Secretary-General of the United Nations. @Outsideness highlighted a tweet of his as “possibly the most idiotic remark I’ve ever seen”

The interesting thing is I distinctly remember a post on Sutherland, probably 2-3 years ago, on one of the then-young NRx blogs, and a bit of discussion in the comments. It’s interesting because Sutherland is such a stereotypical Euro-politician ( Irish bar -> Fine Gael -> Trilateral Commission -> European Commissioner -> United Nations ) as to be worth attention. Further, it would be interesting to see what we saw then and to what extent we might have anticipated the present.

However, I couldn’t find the post or discussion. Blogs come and go, writers change personas, and either it’s gone or the search engines couldn’t find it.

Putting these two together, we need to archive our valuable materials, and the proper tool for a distributed archive is git. Spidering a blog might work for a dead one like Moldbug’s, but is a poor way of maintaining a reserve archive of numerous live ones.

I’ve written some ruby scripts to convert blog export files and feed files into one file per post or comment, so they can be archived permanently.  All a bit scrappy at the moment, but it seems to work.
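
The author’s scripts are in Ruby and are not shown here; the following is just a minimal Python sketch of the same idea, splitting an RSS-style export into one file per post so the result can be committed to git. Filenames and tag names assume an RSS 2.0 style export.

    import pathlib, re, xml.etree.ElementTree as ET

    out = pathlib.Path("archive")
    out.mkdir(exist_ok=True)

    tree = ET.parse("blog-export.xml")              # e.g. an RSS feed or WordPress export
    for item in tree.iter("item"):                  # RSS 2.0 uses <item> per post
        title = item.findtext("title") or "untitled"
        slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
        body = item.findtext("description") or ""
        (out / f"{slug}.html").write_text(body, encoding="utf-8")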

The idea (when it’s a bit more developed) would be that a blog owner could offer the blog as a git archive alongside the actual web interface. Anyone could clone that, and keep it updated using the feed. If the blog ever vanishes, the git clones still exist and can be easily shared.

(I wouldn’t advise posting the git archive to a public site like github. The issue is not privacy–the data is all public in the first place–but deniability.  If you decide to delete your blog, then a recognised public archive is something people can point to to use the content against you, whereas a personal copy is less attributable. Of course, you can’t prevent it, but you can’t prevent archive.org or the like either)


Twister


Back in 2012, I looked at the concept of peer-to-peer blogging. It is definitely time to revisit that environment.

Back then, the main threat I was concerned with was state action directed against service providers being used for copyright infringement. Since then, my political views have become more extreme, while the intolerance of the mainstream left has escalated alarmingly, and so the main threat today is censorship by service providers, based on their own politics or pressure from users and/or advertisers.

Actually publishing content has become easier, due to cheap virtualised hosting and fast residential broadband, making a few megabytes of data available is not likely to be a problem. The difficult bit is reaching an audience. The demise of Bloglines and then Google Reader has been either a cause or a symptom of the decline of RSS, and the main channels for reaching an audience today are facebook and twitter. I don’t actually use facebook, so for me twitter is the vital battleground. If you can build up a following linked to a twitter ID, you can move your content hosting around and followers will barely be aware it’s moved. Last week’s Chuck Johnson affair defines the situation we face. We require a robust alternative to twitter—not urgently but ideally within a 12–24 month timeframe.

I’ve been running the Twister peer-to-peer twitter clone for a couple of weeks, and I think it is OK.

Primarily, it is built on top of the bittorrent protocol. Messages are passed from node to node, and nodes collect messages that are relevant to them.

In addition, it uses the bitcoin blockchain protocol. This is not for content, but for the ID database. Content published by an ID must be signed by the key associated with that ID, and the association of keys with IDs is made via writing entries into the blockchain. Ownership of IDs is therefore “first come, first served”, with the ordering of claims determined by the blockchain (just as the order of transaction attempts is determined for bitcoin, preventing double spends).

As an incentive to build the blockchain, each block can include a “spam message” which will be presented to users.

What that means is that there is no authority who can disable a user ID or take it over. If the ID is registered on the twister blockchain with your public key, it is yours forever.

The application runs, like the bitcoin reference client it is based on, as a daemon offering a JSON-RPC socket interface. It also serves some static web pages over HTTP on the same port, providing a working twitter-lookalike web client.
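
A hedged sketch of what talking to that socket looks like. The port, credentials and the “getinfo” method name are placeholders borrowed from bitcoind conventions, which the twister daemon inherits; check the daemon’s own RPC help for the real method names.

    import json, urllib.request

    def rpc(method, *params, user="rpcuser", password="rpcpassword", port=28332):
        """Call a bitcoind-style JSON-RPC daemon on localhost (placeholder credentials)."""
        url = f"http://127.0.0.1:{port}/"
        payload = json.dumps({"id": 1, "method": method, "params": list(params)}).encode()
        mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        mgr.add_password(None, url, user, password)   # HTTP basic auth, as bitcoind uses
        opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))
        req = urllib.request.Request(url, data=payload,
                                     headers={"Content-Type": "application/json"})
        with opener.open(req) as resp:
            return json.loads(resp.read())["result"]

    print(rpc("getinfo"))   # placeholder method name; consult the daemon's help output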

As far as I can see, it works properly and reliably. I am running it over Tor, and that works fine.

Current Shortcomings

It’s still treated as experimental by the authors, so it’s not surprising if it’s not complete.

The biggest shortcoming is that it’s inconvenient to run. Like bittorrent, it needs to find peers and build a network to exchange data with, and, like bitcoin, it needs to keep up with a blockchain. (It is not necessary to “mine” or build the blockchain to use the service). You really need to start it up and leave it running, if not 24/7, at least for hours at a time.

For the same reason, it doesn’t run on mobile devices. It could be ported, but staying on the peer-to-peer networks would be an inconveniently heavy use of data, battery and processor resources.

Fundamentally, you don’t see all the traffic (that wouldn’t scale), so you can’t conveniently search it. You need to advertise that you are interested in something (by following a user, for instance), and gradually it will start to flow your way.

Future Shortcomings

The network is currently very small-scale, so it remains to be seen how well it would scale up to a useful size. I don’t understand the torrent / DHT side of things all that well, but as far as I can see it should hold up.

The ID blockchain functionality seems more reasonable. If each new user requires of the order of 64 bytes of blockchain space, then ten million users would need about a gigabyte of disk space to archive. A lot, but not prohibitive. As with bitcoin, the hope would be that users would be able to use lightweight clients, with the heavy network functions semi-centralised.

[The useful feature of a peer-to-peer protocol for us in this scenario is not that there is no trust in the system at all, or that there is no centralisation at all; it is that there is no single thing that must be trusted or relied on. The user has the option of doing everything themselves, and, more useful to the ordinary user, they have the option of temporarily and conditionally trusting a provider of their choice]

Also as with bitcoin, the most difficult obstacle is key management. When you want to start using twister, you generate a key pair, and post a transaction associating your public key with your chosen twister ID. You need the private key to post twists, or to see private messages. If you lose the key, you’ve lost your ID. If someone gets your key, they can post as you and read your private messages. Handling keys securely is difficult. For a casual user who isn’t too concerned about surveillance or censorship, it’s prohibitive.

Like bitcoin, the network node, blockchain archive and wallet (user ID) are all managed by a single process. Logically, the private operations of creating authenticated transactions/messages ought to be separate from the maintenance of the network node.

Twister is designed for those who are concerned about surveillance or censorship, but we need to be able to talk to those who aren’t. It needs to provide security for those who need it, while being as easy as possible for those who don’t.

The system seems fairly robust to attacks, including denial-of-service attacks. Media companies have attempted to interfere with bittorrent, but have not as far as I know blocked an actual running torrent, rather concentrating on the chokepoints of communicating knowledge of specific torrents.

The ID subsystem could be flooded with new id requests. There is a proof-of-work requirement on individual “transactions” (new id assignments), separate from the actual block proof-of-work, but that cannot be too onerous, so a determined adversary could probably produce tens of thousands. However, miners could respond by being fussier about what they accept, without breaking the protocol.

The blockchain itself is vulnerable. The hashrate at present is about one quarter-millionth of Litecoin’s (which uses the same hash method), so one block of the twister blockchain currently costs about the same in compute resources as a thirtieth of a cent worth of Litecoin. (I have mined dozens of blocks myself over the past week). Anyone with a serious GPU-based mining rig could mine hundreds of blocks in minutes. The incentive for legitimate miners is always going to be weak, since a customised client can trivially ignore the “spam” messages.  However, it does not seem obvious that that is a real problem. The value of the blockchain is that it established ownership of IDs, but an ID is not really valuable until it has been used for a considerable period, so to take over a valuable ID, you have to fork the blockchain from a long period in the past. Even if you have the hashpower to do that, your blocks are likely to be ignored simply by virtue of being so old.

Suggested Enhancements

The main author has suggested taking the cryptography out of the daemon and into the web client (in javascript). That would be an improvement and a step towards usable lightweight clients.

However, there is another requirement to do that, which is more sophisticated key management. Mobile devices and third-party service providers would hugely improve the convenience and usability of the service, but at a cost of crippling the security, since neither one is sufficiently trustworthy to hold the private key.

What I have suggested is a system of subkeys, with restricted delegated authority.  I create my key pair and post it to the network with my chosen ID, as per the current protocol. Then, I can create a new key pair, and create a transaction signed by my original key (which I call the “master” key), delegating the authority to make posts for a limited time (a week, say) to this new key (which I call a “subkey”). I transfer the private key of the subkey to my phone app, or to a service-provider I trust, and can then make posts using the subkey.

After the week, that subkey is expired and posts made with it will no longer be accepted as valid by other clients or network nodes. If the key is compromised, the damage is limited. I could even post a “revoke” transaction signed by my master key.
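
A minimal sketch of what such a delegation could look like, using PyNaCl Ed25519 keys. The encoding is invented and this is not the actual Twister protocol, just the shape of the idea.

    import time
    from nacl.signing import SigningKey

    master = SigningKey.generate()     # stays on a well-protected machine
    subkey = SigningKey.generate()     # its private half is copied to the phone / provider

    # Master key signs "this subkey may post for me until <expiry>".
    expiry = int(time.time()) + 7 * 86400
    delegation = master.sign(subkey.verify_key.encode() + expiry.to_bytes(8, "big"))

    # Day-to-day posts are signed only with the subkey.
    post = subkey.sign(b"posted from my phone")

    # A verifier checks the post against the subkey, the delegation against the
    # master key (known from the blockchain), and that the expiry has not passed.
    master.verify_key.verify(delegation)
    subkey.verify_key.verify(post)
    assert int.from_bytes(delegation.message[-8:], "big") > time.time()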

Alternatives

@jokeocracy has pointed at Trsst. Also, GnuSocial is quite well established. Both of these are federated client-server architectures. See quitter.se as an example GnuSocial-based service provider. (It would be funny if we were to all move en bloc onto some lefty-oriented “free from capitalism” platform, and perhaps instructive, but not necessarily a long-term solution).

There is some resistance to censorship there, in that if one service provider blocks you, you can switch to another. However, your persistent ID is tied to the service provider you choose, which could take a dislike to you or (equally likely in the early stages) just go away, so it makes it harder to maintain continuity. Also, the federation model does not necessarily prevent the consumer’s service provider from censoring your messages to its customers. The customers can switch if they want to, but not trivially.

In the case of Trsst, it strikes me that this is a mistake: users have private keys, but the association of keys to IDs, unlike in the case of twister, is made by the service provider. If mentions, replies, and subscriptions were by public key instead of by “nickname”, users could migrate more painlessly. However, that registry would have to be distributed, adding complexity.

In the long run, what I would hope to see is a service that looks like quitter.se or Trsst, but acting as a proxy onto the Twister network, ideally with short-lived subkeys as I describe above.

Other relevant projects not ready yet are Urbit (of course), and chatless (by @_raptros).


Thinking about Urbit

28th September 2013



OK, I’ve been driving myself nuts trying to work out how Urbit does I/O when it’s implemented using Nock and Nock doesn’t do I/O.

It’s now the middle of the night and I think I’ve got it.

Since it’s not in the Nock spec, and the Nock spec is defined in terms of nouns, it can only be hidden in the implementation of a noun.

A naive reading of the spec suggests there are two kinds of noun:

  1. a literal value (arbitrary-size integer)
  2. a pair of nouns

The only way it can work is if there are at least four kinds of noun

  1. a literal value
  2. a pair of nouns L and R
  3. the stream of input events
  4. a nock invocation on a pair of nouns A and F

Further, the “opcode 2” reduction in the Nock evaluator is not implemented by recursing the Nock evaluator, but by returning a type 4 noun.

A type 3 noun “counts” as a pair, where L is the next event in the input stream and R is another type 3 noun

The runtime creates a type 4 noun where A is a type 3 noun and F is the system-implemented-in-nock

It then calls a native function output(n) on the noun it created.

output(n) looks at the type of n. If it’s type 1, it treats it as an output event and “performs” it.

If it’s type 2, it calls output on L, then on R

If it’s type 4, it runs the Nock evaluator on it and calls output() on the result.
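
To pin the speculation down, here is a schematic Python sketch of those four noun kinds and the output() walk. It is purely a restatement of the guesswork above, not anything taken from the vere source.

    from dataclasses import dataclass
    from typing import Any, Iterator

    @dataclass
    class Atom:           # 1. a literal value
        value: int

    @dataclass
    class Cell:           # 2. a pair of nouns L and R
        l: Any
        r: Any

    class EventStream:    # 3. the stream of input events; "counts" as a pair
        def __init__(self, events: Iterator[int]):
            self.events = events
        @property
        def l(self):
            return Atom(next(self.events))   # the next input event
        @property
        def r(self):
            return self                      # the rest of the stream

    @dataclass
    class Invocation:     # 4. a deferred nock invocation on subject A and formula F
        a: Any
        f: Any

    def nock(a, f):
        raise NotImplementedError("the real Nock evaluator goes here")

    def perform(atom: Atom):
        print("output event:", atom.value)   # "performs" the effect

    def output(n):
        if isinstance(n, Atom):
            perform(n)
        elif isinstance(n, (Cell, EventStream)):
            output(n.l)                      # on an EventStream this loops forever,
            output(n.r)                      # consuming input events as they arrive
        elif isinstance(n, Invocation):
            output(nock(n.a, n.f))           # evaluate lazily, then walk the result

    # The runtime's boot step from the text, wrapping the event stream and the
    # system formula in an Invocation (placeholder formula, left commented out):
    SYSTEM_FORMULA = Atom(0)
    # output(Invocation(EventStream(iter([42])), SYSTEM_FORMULA))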

Can anyone who’s looked into the vere source tell if that is about right?


Social-network threat models


There have been a couple of comments on my peer-to-peer blogging post, both addressing different threat models than I was looking at.

My posts were looking at countermeasures to continue blogging in the event that public web hosting service providers are taken out by IP enforcement action. The aim of such enforcement action is to prevent distribution of copyrighted content: since I don’t actually want to do that I am not trying to evade the enforcement as such, just trying to avoid being collateral damage.  The major challenges are to avoid conventional abuse, and to maintain sufficient availability, capacity and reliability without the resources of a centralised service with a proper data centre.

Sconzey mentioned DIASPORA*.  That is an interesting project, but it is motivated by a different threat model – the threat from the service providers themselves.  Social-networking providers like facebook or google, have, from their position, privileged access to the data people share, and are explicitly founded on the possibilities of profiting from that access. Diaspora aims to free social-networking data from those service providers, whose leverage is based on their ownership of the sophisticated server software and lock-in and network effects.  To use Diaspora effectively, you need a good-quality host.  Blogging software is already widespread – if you have the infrastructure you need to run Diaspora, you can already run wordpress.  The “community pods” that exist for Diaspora could be used for copyright infringement and would be vulnerable to the SOPA-like attacks.

James A. Donald says “we are going to need a fully militarized protocol, since it is going to come under state sponsored attack.” That’s another threat model again. Fundamentally, defence against that is impossible for open publication: if you publish something, the attacker can receive it. Having received it, he can trace back one step to where it came from, and demand to know where they got it from. If refused, or if the intermediate node is deliberately engineered so messages cannot be traced back further, then the attacker can threaten to shut down or isolate the node provider.

In practice it can be possible to evade that kind of attacker by piggy-backing on something the attacker cannot shut down, because he relies on it himself.  That is a moving target, because what is essential changes over time.

(One could avoid using fixed identifiable locations altogether – e.g. wimax repeaters in vehicles. That’s not going to be cheap or easy).

James seems to be thinking more about private circles, where end-to-end encryption can be used. That’s more tractable technically, but it’s not useful to me. I don’t have a circle of trusted friends to talk about this stuff with: I’m throwing ideas into the ether to see what happens. Any of you guys could be government agents for all I know, so carefully encrypting my communications with you doesn’t achieve anything.


More on peer-to-peer blogging


I was musing a few days ago on how to do blogging if SOPA-like measures take out hosting providers for user content.

Aaron Davies in a comment suggests freenet. I’m not sure about that: because you don’t choose at all what other content you’re hosting, I would expect the whole system to drown in movie rips and porn. The bittorrent idea, where the stuff which you help distribute is the stuff which you want to consume, seems less vulnerable. alt.binaries didn’t die because of copyright enforcement; it died because copyright infringement made such large demands on capacity that it was not worth distributing.

Bear in mind that I’m not going “full paranoid” here: my threat scenario is not “the feds want to ban my blog”, it’s “Blogger and the like have so much difficulty complying with IP law that they’re becoming painful and/or expensive to use”.

In that circumstance, simply running wordpress or geeklog on my own machine is an option, but rather a crappy one in capacity and reliability terms. I’ve already looked into using a general web hosting provider, and I could move onto that for probably five quid a month, but I’ve again been put off by reliability issues. Also, in the threat scenario under consideration, third-party web hosting might be affected also.

But Davies in passing mentioned email. When I saw that I went “D’oh”. I hadn’t thought of using SMTP. I’d thought of NNTP, which I have a soft spot for¹, but rejected it. SMTP could well be the answer — like NNTP, it was designed for intermittent connections. Running mailman or something on your home PC is a lot simpler and safer than running wordpress. The beauty of it is that not even Hollywood can get email banned. And if they tried, all you need to keep dodging is a non-government-controlled DNS, which is something people are already working on.

You still need a published archive though; one that people can link to. But that can work over SMTP too, as a request-response daemon. Those were actually quite common before the web: you could get all sorts of information by sending the right one-line email to the right address.

There were actually applications that ran over SMTP. One which lasted well into web days, and may even still exist here and there, was the diplomacy judge, for playing the board game Diplomacy over email.

Unmoderated comments would have to go under this scenario, whatever the technology, but moderated comments would be easy enough; the moderator would just forward acceptable comments onto the publication queue. Email clients in the days when mailing lists were very common were designed specifically to make following lists in this way easy (I remember mutt was much favoured for the purpose). Each list became a folder (by using procmail or the like), each post a thread, and each comment a reply. My own email is still set up that way, though I pretty much never look at the list folders any more, I think a couple of them are still being populated for things like development of the linux b43 wireless chipset driver.

The problem with using mail is spam. Everyone who wants to subscribe has to give me their email address — that’s probably the biggest reason why the use of mailing lists declined; that and the impact of false positives from spam filtering.

 If generic publishing networks drown in media, and mail drowns in spam, then some more private network is needed.

Requirements:

  •  Anyone can access posts, as easily as possible
  •  I only have to process posts from sources I’ve chosen

Our big advantage is that the actual storage and bandwidth needed for blogging are rounding error in a world of digital video.

Reliable access requires that there are multiple sources for posts, to compensate for the fact we’re not running industrial data centres.

The obvious approach is that if I follow a blog, I mirror it. Someone wanting to read one of my posts can get it from my server, or from any of my regular readers’ servers. That just leaves the normal P2P problems

  • locating mirrors, in spite of dynamic IP assignment
  • traversing NAT gateways which don’t allow incoming connections.
  • authenticating content (which might have been spoofed by mirror)

Authentication is trivial — there’s no complex web of trust: each blog has an id, and that id essentially is the public key that verifies its signatures. The first two are difficult, but have been solved by all the P2P networks. Unlike some of them, we do want to persistently identify sources of data, so presumably each node regularly notifies the other nodes it knows about of its location. Possibly other already-existing P2P networks could be used for this advertisement function. There’s a DoS vulnerability there, with attackers spoofing location notifications, so probably the notifications have to be signed. I guess the node id is distinct from the blog id (blogs could move, nodes could originate more than one blog), so it’s also a distinct key. Like a blog id, a node id essentially is the public key. NAT traversal I’m not sure about — there’s stuff like STUN and ICE which I haven’t really dealt with.
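
A minimal sketch of such a signed location notification, with an invented message format and PyNaCl Ed25519 keys. The point is just that the node id is the verification key, so any peer can check an announced address really came from that node.

    import json, time
    from nacl.signing import SigningKey, VerifyKey

    node_key = SigningKey.generate()
    node_id = node_key.verify_key.encode().hex()     # the node id *is* the public key

    announcement = json.dumps({
        "node": node_id,
        "addr": "203.0.113.7:8844",                  # current (dynamic) address, placeholder
        "ts": int(time.time()),                      # timestamp, so stale spoofs can be ignored
    }).encode()
    signed = node_key.sign(announcement)

    # Any peer that already knows the node id can verify the announcement.
    VerifyKey(bytes.fromhex(node_id)).verify(signed)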

Assuming we can map a persistent node id to an actual service interface of some kind, this is what it would have to provide:

  • List blogs that this is the authoritative source for
  • List blogs that this mirrors (also returning authoritative source)
  • List other known mirrors for a blog id
  • List posts by blog id (optional date ranges etc)
  • Retrieve posts by blog id and post id
  • Retrieve moderated comments by blog id and post id (optional comment serial range)
  • Retrieve posts and moderated comments modified since (seq num)

The service is not authenticated, but posts and moderated blog comments are signed with the blog key. (Comments optionally signed by the commenter’s key too, but a comment author signature is distinguishable from a comment moderator signature).
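
Restated as code, the read-only interface listed above might look something like this. The names and types are invented, purely to make the shape concrete.

    from typing import Iterable, Optional, Protocol, Tuple

    class BlogNode(Protocol):
        """The node's read-only service interface sketched above (invented names)."""
        def authoritative_blogs(self) -> Iterable[str]: ...
        def mirrored_blogs(self) -> Iterable[Tuple[str, str]]: ...   # (blog_id, authoritative node_id)
        def known_mirrors(self, blog_id: str) -> Iterable[str]: ...
        def list_posts(self, blog_id: str,
                       since: Optional[float] = None,
                       until: Optional[float] = None) -> Iterable[str]: ...
        def get_post(self, blog_id: str, post_id: str) -> bytes: ...          # signed with the blog key
        def get_comments(self, blog_id: str, post_id: str,
                         from_serial: int = 0) -> Iterable[bytes]: ...         # moderator-signed
        def changes_since(self, seq_num: int) -> Iterable[bytes]: ...          # posts and comments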

The service owner can also

  • Create post
  • Add post to a blog
  • Edit post
  • Add a moderated comment to a blog
  • Check mirrored blogs for new posts & comments & mirror list updates

There’s a case for mirroring linked posts on non-followed blogs: if I link to a post, I include it on my server so that whoever reads it can read the link too.  Ideally, there should be an http side to the service as well, so people outside the network can link to posts and see them if they have the good luck to catch the right server being available at the time.  That all needs more thought.

¹When RSS was coming in, I argued that it was just reinventing NNTP and we ought to use that instead.


SOPA


I never blogged on the SOPA kerfuffle; it happened while my creative(?) energies were elsewhere.

Looking back, a few minor points emerge:

Some commentators got all excited: “look what we did! What shall we do next?!” “We” meaning right-thinking internet-type people. The answer, obviously, is nothing: this, “we” agreed about, most things, we don’t. I think Wikipedia’s claim: “Although Wikipedia’s articles are neutral, it’s existence is not” was basically justified.

Libertarian commentators had a lot of fun jeering at leftist techies who wanted every aspect of the economy to be regulated by the government except the internet. The criticism is only justified against those who demand that government regulate things but don’t specify exactly how they should regulate them (others can say they’re in favour of regulation, but just want it to be better). But that’s most people. So yeah.

In some ways, it’s a disappointment that SOPA didn’t go through; the circumvention techniques that would have been developed if it had would have been interesting and useful. At the end of the day, the biggest threat to free computing isn’t legislation, it’s that in a stable market, locked-down “appliance” devices are more useful to the non-tinkering user than general-purpose, hackable devices. So far, we tinkerers still have the GP devices, because the locked-down ones go obsolete too quickly even for lay users. I’m not sure whether that situation will persist for the long term: I’ve looked at the question before.

But if the government makes stupid laws that can easily be circumvented using general-purpose devices, the demand for those devices will be helpfully supported.

Note when I talk about circumvention, I’m not talking about copyright infringement. That was not what the argument was about. While I lean toward the view that copyright is necessarily harmful, I’m not certain, and it’s not that big a deal. The important argument is all about enforcement costs: given that copyright exists, whose responsibility is it to enforce it? The problem with SOPA was that it would have put crippling copyright enforcement costs on any facilitator of internet communication.

Currently, internet discussion is structured mostly around large service providers — in the case of this blog Google — providing platforms for user content. If those service providers become legally liable for infringing user content, the current structure collapses. The platforms would either have to go offshore, with users relying on the many easy ways of circumventing the SOPA provisions attempting to limit access to offshore infringers, or else evade the enforcers by going distributed, redundant and mobile. What will be to Blogger as Kazaa and then BitTorrent were to Napster?  It would have been interesting to find out, and possibly beneficial. There is a lot of marginal censorship that can be applied to easy-target platforms like Blogger or Wikipedia that will not induce sufficient users to create alternatives, as the sheer idiot clumsiness of SOPA would probably have done.

(Note Wikipedia might have been spared, but it would have suffered, because if existing less respectable platforms were removed, their content would migrate to the likes of Wikipedia. If 4chan did not exist, Wikipedia would become 4chan.)

Actually, it’s interesting to think about how to blog over a pure P2P framework. Without comments, you’re publishing a linear collection of documents. (I don’t think you can handle comments — we’d need something more like trackbacks). Posts would need to be cryptographically signed and have unique ids. Serial numbers would be useful so readers would know if they’d missed anything. I wonder if anyone’s worked on it. A sort of bittorrent-meets-git hybrid would be really interesting — search this list of hosts for any git commits signed by any of these keys…

The dance of censorship and evasion is very difficult to predict in detail. I found some time ago that the way to find the text of an in-copyright book is to take a short phrase from it (that isn’t a well known quotation or the title) and google it. That used to work. I wanted some text from Evelyn Waugh’s Decline and Fall the other day, so I did the usual, and got pages and pages of forum posts, containing chunks of the book interspersed with links to pages selling MMO currency and fake LVMH crap. My access to illicit literature was being messed up by someone else’s illicit SEO.


AI, Human Capital, Betterness

7th January 2012



Let me just restate the thought experiment I embarked on this week. I am hypothesising that:

  • “Human-like” artificial intelligence is bounded in capability 
  • The bound is close to the level of current human intelligence  
  • Feedback is necessary to achieving anything useful with human-like intelligence 
  • Allowing human-like intelligence to act on a system always carries risk to that system

Now remember, when I set out I did admit that AI wasn’t a subject I was up to date on or paid much attention to.

On the other hand, I did mention Robin Hanson in my last post. The thing is, I don’t actually read Hanson regularly: I am aware of his attention to systematic errors in human thinking; I quite often read discussions that refer to his articles on the subject, and sometimes follow links and read them. But I was quite unaware of the amount he has written over the last three years on the subject of AI, specifically “whole brain emulations” or Ems.

More importantly, I did actually read, but had forgotten, “The Betterness Explosion”, a piece of Hanson’s, which is very much in line with my thinking here, as it emphasises that we don’t really know what it means to suggest we should achieve super-human intelligence. I now recall agreeing with this at the time, and although I had forgotten it, I suspect it at the very least encouraged my gut-level scepticism towards superhuman AI and the singularity.

In the main, Hanson’s writing on Ems seems to avoid the questions of motivation and integration that I emphasised in part 2. Because the Ems are actual duplicates of human minds, there is no assumption that they will be tools under our control; from the beginning they will be people with whom we will need to negotiate — there is discussion of the viability and morality of their market wages being pushed down to subsistence level.

There is an interesting piece “Ems Freshly Trained” which looks at the duplication question, which might well be a way round the integration issue (as I wrote in part 1, “it might be as hard to produce and identify an artificial genius as a natural one, but then perhaps we could duplicate it”, and the same might go for an AI which is well-integrated into a particular role).

There is also discussion of cities which consist mainly of computer hardware hosting brains. I have my doubts about that: because of the “feedback” assumption at the top, I don’t think any purpose can be served by intelligences that are entirely isolated from the physical world. Not that they have to be directly acting on the physical world — I do precious little of that myself — but they have to be part of a real-world system and receive feedback from that system. That doesn’t rule out billion-mind data centre cities, but the obstacles to integrating that many minds into a system are severe. As per part 2, I do not think the rate of growth of our systems is limited by the availability of intelligences to integrate into them, since there are so many going spare.

Apart from the Hanson posts, I should also have referred to a post I had read by Half Sigma, on Human Capital. I think that post, and the older one linked from it, make the point well that the most valuable (and most remunerated) humans are those who have been successfully (and expensively) integrated into important systems.


Relevance of AI


I felt a bit bad writing the last post on artificial intelligence: it’s outside my usual area of writing, and as I’d just admitted, there are a number of other points within my area that I haven’t got round to  properly putting in order.

However, the questions raised in the AI post aren’t as far from the debates Anomaly UK routinely deals in as I first thought.

Like the previous post, this falls firmly in the category of “speculations”.  I’m concerned with telling a consistent story; I’m not even arguing at this stage that what I’m describing is true of the real world today.  I’ll worry about that when the story is complete.

Most obviously, the emphasis on error relates directly to the Robin Hanson area of biases and wrongness in human thinking. It’s not surprising that Aretae jumped straight on it. If my hypothesis is correct, it would mean that Aretae’s category of “monkeybrains”, while of central importance, is very badly named: the problem with our brains is not their ape ancestry but their very purpose, which is to reach practical conclusions from vastly inadequate data. That is what we do; it is what intelligence is, and the high error rate is not an implementation bug but an essential aspect of the problem.

(I suppose there are real “monkeybrains” issues in that we retain too high an error rate even when there actually is adequate data. But that’s not the normal situation)

The AI discussion relates to another of Aretae’s primary issues: motivation. Motivation is getting an intelligence to do what it ought to be doing, rather than something pointless or counterproductive. When working with human intelligence, it’s the difficult bit. If artificial intelligence is subject to the problems I have suggested, then properly specifying the goals that the AI is to seek will quite likely also turn out to be the difficult bit.

I’m reminded in a vague way of Daniel Dennett’s writings on meaning and intentionality. Dennett’s argument, if I remember it accurately, is that all “meaning” in human intelligence ultimately derives from the externally-imposed “purpose” of evolutionary survival. Evolutionary successful designs behave as if seeking the goal of producing surviving descendants, and seeking this goal implies seeking sub-goals of feeding, defence, reproduction, etc. etc. etc. In humans, this produces an organ that explicitly/symbolically expresses and manipulates subgoals, but that organ’s ultimate goal is implicit in its construction, and not subject to symbolic manipulation.

The hard problem of motivating a human to do something, then, is the problem of getting their brain to treat that something as a subgoal of its non-explicit ultimate goal.

I wonder (in a very handwavy way) whether building an artificial intelligence might involve the same sort of problem of specifying what the ultimate goal actually is, and making the things we want it to do register properly as subgoals.

The next issue is what an increased supply of intelligence would do to the economy.  Though an apostate libertarian, I have continued to hold to the Julian Simon line that “Human inventiveness is the Ultimate Resource”. To doubt that AI will have a revolutionarily beneficial effect is to reject Simon’s claim.

Within this hypothesis, the availability of humanlike (but not superhuman) AI is of only marginal benefit, so Simon is wrong. Then, what is the ultimate resource?

Simon is still closer than his opponents; the ultimate resource (that is the minimum resource as per the law of the minimum) is not raw materials or land. If it is not intelligence per se, it is more the capacity to endure that intelligence within the wider system.

I write conventional business software. What is it I spend my time actually doing? The hard bit certainly isn’t getting the computer to do what I want. With modern programming languages and tools, that’s really easy — once I know what it is I want. There used to be people with the job title “programmer” whose job it was to do that, with separate “analysts” who told them what the computer needed to do, but the programmer was pretty much an obsolete role when I joined the workforce twenty years ago.

Conventional wisdom is that the hard bit is now working out what the computer needs to do — working with users and defining precisely how the computer fits into the wider business process. That certainly is a significant part of my job. But it’s not the hardest or most time-consuming bit.

The biggest part of the job is dealing with errors: testing software before release to try to find them; monitoring it after release to identify them, and repairing the damage they cause. The testing is really hard because the difficult bits of the software interact with multiple outside people and systems, and it’s not possible to fully simulate them. New software can be tested against pale imitations of the real world, and if it’s particularly risky, real users can be reluctantly drafted in to “user acceptance” testing of the software. But all that — simulating the world to test software, having users effectively simulate themselves to test software, and running not-entirely-tested software in the real world with a finger hovering over the kill button — is what takes most of the work.

This factor is brought out more by the improvements I mentioned in the actual writing of software, but it is by no means new. Fred Brooks wrote in The Mythical Man-Month that if writing a program took n days, integrating it into a system would take 3n days, properly productionising it (so that it would run reliably unsupervised) would take 3n days, and these are cumulative, so that a productionised, integrated version of the program would take something like ten times as long as a stand-alone developer-run version to produce.

Adding more intelligences, natural or artificial, to the system is the same sort of problem. Yes, they can add value. But they can do damage also. Testing of them cannot really be done outside the system, it has to be done by the system itself.

If completely independent systems exist, different ideas can be tried out in them.  But we don’t want those: we want the benefits of the extra intelligence in our system.  A separate “test environment” that doesn’t actually include us is not a very good copy of the “production environment” that does include us.

All this relates to another long-standing issue in our corner of the blogosphere: education, signalling and credentialism. The argument is that the main purpose of higher education is not to improve the abilities of the students, but merely to indicate those students who can first get into and then endure the education system itself. The implication is that there is something very wrong with this. But one way of looking at it is that the major cost is not either producing or preparing intelligent people, but testing and safely integrating them into the system. The signalling in the education system is part of that integration cost.

Back on the Julian Simon question, what that means is that neither population nor raw materials are limiting the growth and advance of civilisation. Rather, civilisation is growing and advancing roughly as fast as it can integrate new members and new ideas. There is no ultimate resource.

It is not an original observation that the things that most hurt our civilisation are self-inflicted. The organisation of mass labour that produced industrialisation also produced the 20th century world wars. The flexible allocation of capital that drove the rapid development of the last quarter century gave us the spectacular misallocations with the results we’re now suffering.

The normal attitude is that these accidents are avoidable; that we can find ways to stop messing up so badly. We can’t.  As the external restrictions on our advance recede, we approach the limit where the benefits of increases in the rate of advance are wiped out by more and more damaging mistakes.

Twentieth Century science-fiction writers recognised at least the catastrophic risk aspect of this situation. The concept that the paucity of intelligence in the universe is because it tends to destroy itself is suggested frequently.

SF authors and others emphasised the importance of space travel as a way of diversifying the risk to the species. But even that doesn’t initially provide more than one system into which advances can be integrated; at best it reduces the probability that a catastrophe becomes an extinction event. Even if we did achieve diversity, that wouldn’t help our system to advance faster, unless it encouraged more recklessness — we could take a riskier path, knowing that if we were destroyed other systems could carry on. I’m not sure I want that; it raises the same sort of philosophical questions as duplicating individuals for “backup” purposes. In any case, I don’t think even that recklessness would help: my point is not just that faster development creates catastrophic risk, but that it increases the frequency of more moderate disasters, like the current financial crisis, and so wipes out its own benefits.





