It’s a little bit drafty, so any feedback or comments are welcome. I will republish it as an HTML file soon.
Investigating privacy in depth has been one of the most interesting things I’ve done in a long time… though it has left me a little bit creeped out. Hopefully I’ll get around to writing a bit more about what I read, what I’ve learned, and what practical changes I’ve made.
The following was part of my forthcoming position paper for the W3C Workshop on Privacy and advanced APIs. Because my paper focused on the implementation of geo-location, this section had to be cut. However, I think it is relevant to the discussion about privacy and packaged Web applications, which is why I am publishing it here.
When it comes to privacy, it is insufficient for a specification to define an API only in terms of an Interface Definition Language (IDL), such as WebIDL or OMG IDL. IDLs are limited in that they only allow one to express simple inputs, outputs, and data type constraints. Nevertheless, implementations exist that are based on specifications providing nothing more than IDL definitions, and those definitions say nothing about privacy. To overcome this limitation, some implementers use digital signatures as the means of enabling privacy-sensitive APIs in an application. For example: if application “X” is signed by company “Y”, then allow application “X” to access API “Z”.
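The signature rule above can be sketched in a few lines of Python. This is only an illustration and not any real platform's implementation: the `TRUSTED_SIGNERS` table, the `may_access` function, and the API name "geolocation" are all made up for the example, and the sketch assumes the signature itself has already been verified elsewhere.

```python
# Hypothetical policy table (an assumption for illustration):
# maps a signer to the set of APIs that signer is allowed to unlock.
TRUSTED_SIGNERS = {
    "Company Y": {"geolocation"},
}


def may_access(signer, api):
    """Return True if an application signed by `signer` may call `api`.

    Assumes the signature has already been cryptographically verified;
    `signer` is None for unsigned applications.
    """
    return api in TRUSTED_SIGNERS.get(signer, set())


if __name__ == "__main__":
    print(may_access("Company Y", "geolocation"))  # signed by a trusted signer
    print(may_access("Company Q", "geolocation"))  # signed by an unknown signer
    print(may_access(None, "geolocation"))         # unsigned application
```

Note that the end-user does not appear anywhere in this decision: the only inputs are the signer and the API being requested.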
Such an approach to privacy is limited in that it hands control of privacy matters over to a third party (the signer). It implicitly assumes that the end-user trusts the signer, either unquestioningly or via an End User License Agreement (EULA), as the authority to enable an API, without necessarily informing the end-user as to what is going on “under the hood”. Such a model is commonly seen in the Java application space.
Others have extended the model of using digital signatures to enable APIs by having software developers explicitly declare what functionality an application will use (let’s call them “feature requests”). Upon installation, the end-user is presented with a dialog informing them of the capabilities the application will use and asking whether they wish to proceed. An example is Chrome’s browser extensions, seen on the right.
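As a rough sketch of the “feature request” model described above, the following Python shows an install step that presents the declared features to the user and a runtime check that only allows features that were both declared and consented to. The function names (`install`, `may_access`), the feature strings, and the `ask_user` callback are assumptions made for this example; they do not correspond to Chrome’s or any other product’s actual API.

```python
def install(declared_features, ask_user):
    """Present the declared features to the user at install time.

    `ask_user` is a hypothetical callback that shows the list of features
    in a dialog and returns True if the user chooses to proceed.
    Returns the granted feature set, or None if the user cancels.
    """
    features = sorted(set(declared_features))
    if ask_user(features):
        return frozenset(features)
    return None  # installation cancelled, nothing is granted


def may_access(granted, feature):
    """Allow a feature at runtime only if it was declared and consented to."""
    return granted is not None and feature in granted


if __name__ == "__main__":
    # Stand-in for a real dialog: print the features and accept.
    def dialog(features):
        print("This application will use:", ", ".join(features))
        return True

    granted = install(["browsing_history", "all_websites"], dialog)
    print(may_access(granted, "browsing_history"))
    print(may_access(granted, "geolocation"))
```

The dialog in this sketch only lists feature names; it carries no information about what the application will do with the data once access is granted.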
From a privacy perspective, this model is significantly better than simply enabling APIs based on digital signatures. However, it is also problematic in that it often does not provide any meaningful information about, for instance, what “can access your browsing history” coupled with “access your data on all websites” actually means. It can be argued that this model unfairly puts the consequences of consent on the end-user by entering them into an agreement with an application without recourse (i.e., “Yes, website/application X, you can access my history data even though I don’t know what you will do with it.”).
Yesterday I started writing a paper for WWW2008 about widgets (and given the highly competitive nature of the WWW conferences, I doubt it will be accepted). Anyway, the conference mandates that citations conform to ACM’s referencing style (e.g., Smith [1] says, “bla bla”), which is not currently supported by Microsoft Word. My immediate thought was, “Right! Word’s style files are just (OO)XML, so it should be a simple matter of changing some angle brackets to create the ACM style!” My plan was to base the ACM style on the already supported ISO 690 style, which is similar except that it uses parentheses “(1)” instead of brackets “[1]”. So I went into MS Word’s program files directory and located the bibliographic styles. To my shock, the reference style file was an impenetrable XSLT file (7093 lines long and completely uncommented!). I spent about 20 minutes trying to work out what the hell the file was doing… but eventually I gave up :(. I compared the ISO 690 XSLT style file to the ACM BibTeX style file. The BibTeX style file is only around 1700 lines long, and nicely commented, I might add.