Signing XML

I’d just begun to muse about signing Atom/RSS articles when Johannes Ernst began blogging about the topic. I had assumed there must be some easy standard way to do it. It turns out there is a standard, but (according to Johannes) it’s so far from easy that it’s nearly unusable.

 (The problem in a nutshell: Digital signatures operate on raw data, so to sign something you have to be able to convert it to a sequence of bytes to stream through the signature algorithm. Crucially, to verify the signature you have to be able to convert the something you received into the exact same sequence of bytes. That’s no problem for JPEGs or HTTP bodies. But XML describes an abstract tree of nodes and attributes, with many possible text representations for the same data. If you parse some XML and then turn those data structures back into XML, the text will probably not be exactly the same. Specifying a canonical way to textualize an XML document turns out to be really hard since it has to take into account namespaces, entities, whitespace, character encodings and more. Yeesh!)
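To make the nutshell concrete, here is a minimal Python sketch. The two serializations are made-up examples, and a plain SHA-256 digest stands in for the first step of a signature algorithm (an assumption for illustration, not a real signing scheme). Both texts parse to the same tree, yet their bytes, and therefore their digests, differ.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two hypothetical serializations of the same entry: attribute order,
# quote style, and empty-element syntax differ, nothing else.
doc_a = b'<entry id="1" lang="en"><title>Hello</title><summary/></entry>'
doc_b = b"<entry lang='en' id='1'><title>Hello</title><summary></summary></entry>"

def tree_shape(elem):
    """Reduce an element to a comparable structure, ignoring textual form."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        elem.text or "",
        [tree_shape(child) for child in elem],
    )

# Compare the parsed trees, then compare the raw bytes via their digests.
same_tree = tree_shape(ET.fromstring(doc_a)) == tree_shape(ET.fromstring(doc_b))
digest_a = hashlib.sha256(doc_a).hexdigest()
digest_b = hashlib.sha256(doc_b).hexdigest()

print("same tree:", same_tree)
print("same digest:", digest_a == digest_b)
```

Any signature computed over `doc_a` would fail to verify against `doc_b`, even though an XML processor treats them as the same data.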

The more I think about this the more worried I get. We are increasingly using XML-based formats for communication: Atom, RSS, Jabber. These formats contain multiple messages in the same document. Distributing these messages may involve copying them from one document into another: for example, when news feeds are aggregated or articles forwarded, and whenever a Jabber message is routed through a server. If we care about the integrity and identifiability of these messages (and the lesson from the current death-throes of email is that we damn well should), we need to sign them, and the original signatures of course need to travel with the messages. But when an XML message/article/entry element is copied from one place to another, its physical manifestation as a byte-sequence will typically change … leading to the XML-signatures quandary.

Johannes’s suggested XML-RSig (“Really Simple Signature”) solution is to avoid transforming the message’s byte sequence. The bytes to sign are the encoded characters from the opening “<” to the closing “>”, and the new sub-element containing the signature is spliced in by character insertion. Anyone copying the message to another XML document has to use an X-Acto knife to cut out the exact message text and insert it into the destination, rather than allowing it to be transformed in any way by an XML processor. (In fact, the destination document even needs to have the same character encoding.)
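Here is a rough sketch of that idea in Python. It is not Johannes’s actual format: the `<sig>` element name, the splice position just before the closing tag, and the use of HMAC with a shared key in place of a real public-key signature are all assumptions made to keep the example self-contained.

```python
import hashlib
import hmac

KEY = b"demo-shared-secret"  # assumption: HMAC stands in for a real signature
SIG_OPEN, SIG_CLOSE = b"<sig>", b"</sig>"  # hypothetical element name

def sign(message: bytes) -> bytes:
    """Sign the exact bytes from the opening '<' to the closing '>' and
    splice a signature element in just before the final closing tag."""
    tag = hmac.new(KEY, message, hashlib.sha256).hexdigest().encode("ascii")
    cut = message.rindex(b"</")  # start of the message's own closing tag
    return message[:cut] + SIG_OPEN + tag + SIG_CLOSE + message[cut:]

def verify(signed: bytes) -> bool:
    """Cut the signature element back out by byte offsets and re-check."""
    start = signed.rindex(SIG_OPEN)
    end = signed.index(SIG_CLOSE, start) + len(SIG_CLOSE)
    tag = signed[start + len(SIG_OPEN):end - len(SIG_CLOSE)]
    original = signed[:start] + signed[end:]
    expected = hmac.new(KEY, original, hashlib.sha256).hexdigest().encode("ascii")
    return hmac.compare_digest(tag, expected)

entry = b'<entry id="1"><title>Hello</title></entry>'
signed = sign(entry)

# The "X-Acto knife" copy: the signed bytes go into another document untouched.
verbatim_copy = b"<feed>" + signed + b"</feed>"
extracted = verbatim_copy[len(b"<feed>"):-len(b"</feed>")]
print("verbatim copy verifies:", verify(extracted))

# An XML processor that merely switches quote style changes the bytes.
reserialized = signed.replace(b'id="1"', b"id='1'")
print("re-serialized copy verifies:", verify(reserialized))
```

Signing and verifying never involve an XML parser, which is where the simplicity comes from, and also why any re-serialization on the way is fatal.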

 I have mixed emotions about this. On the one hand, it certainly is clear and simple. (See Johannes’s list of benefits at the bottom of the post.) What bothers me:


  1. It requires a recipient to keep the original source byte-sequence of the message, if it might ever need to forward/aggregate the message, or even re-verify the signature later. That means altering its storage schema for messages to add a potentially-large blob.

  2. Conversely, when it forwards the message, it has to generate the new document by splicing in the original message bytes. If it normally uses an XML-generation API, that might be awkward to do.

  3. Adding any XML sub-elements or attributes to the original message breaks the signature. A specific example is the “atom:source” element that is added to an Atom entry when it’s copied to another feed, to preserve the identity and metadata of the feed it came from.
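The three concerns above can be sketched together in Python. Everything here is hypothetical: the `StoredEntry` record, the un-namespaced `<source>` element standing in for “atom:source”, and a SHA-256 digest standing in for an already-verified signature.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class StoredEntry:
    title: str   # parsed field the application actually uses
    raw: bytes   # point 1: the original bytes have to be stored as well
    digest: str  # assumption: a digest stands in for the signature

def store(raw: bytes) -> StoredEntry:
    """Parse what the application needs, but also keep the raw blob."""
    title = ET.fromstring(raw).findtext("title", default="")
    return StoredEntry(title, raw, hashlib.sha256(raw).hexdigest())

def aggregate(entries: list[StoredEntry]) -> bytes:
    """Point 2: build the feed by byte concatenation, not an XML API."""
    return b"<feed>" + b"".join(e.raw for e in entries) + b"</feed>"

def still_valid(raw: bytes, entry: StoredEntry) -> bool:
    """Re-check candidate bytes against the stored digest."""
    return hashlib.sha256(raw).hexdigest() == entry.digest

entry = store(b"<entry><title>Hello</title></entry>")
feed = aggregate([entry])

# Point 3: annotating the copy with a source element changes the signed bytes.
annotated = entry.raw.replace(b"</entry>", b"<source>feed-a</source></entry>")

print("untouched copy valid:", still_valid(entry.raw, entry))
print("annotated copy valid:", still_valid(annotated, entry))
```

The raw blob roughly doubles what gets stored per message, and the aggregator cannot attach provenance inside the entry without invalidating it.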

I don’t have any good suggestions at this point. I’m writing this as a brain-dump. I’m posting it here because (a) Johannes’s blog doesn’t allow comments, and (b) it seems to be the Blog Way to reply to other people’s posts on your own front page, even if it’ll baffle the rest of your readers…
