Why manifests and permissions don’t mix

I often get requests to add fine-grained API permissioning to the Web Manifest format. Fine-grained permissioning would be something like: “I want to access contacts, but I only need read-only access”. The opposite is coarse-grained permissioning: “I want to access contacts… and do all the things! (i.e., read, write, update, and delete)”.
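
To make the distinction concrete, here is a purely hypothetical sketch – the Web Manifest spec has no “permissions” member, so none of this is real syntax:

    {
      "name": "Contacts Thing",
      "permissions": {
        "contacts": { "access": "readonly" }
      }
    }

A coarse-grained version would simply declare "contacts" and implicitly get the lot (read, write, update, and delete).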

There are quite a few problems with trying to rely on a manifest to grant permissions – and we can easily see what they are by looking at platforms that rely on manifests for permissioning (e.g., Android, iOS, and Firefox OS). As iOS’s permissioning closely mirrors the Web’s API permissions model, I’m not going to discuss it (irrespective of apps being vetted through an App Store, which weeds out most bad actors). The problems with Android’s up-front permissioning model are well known (i.e., few people read or understand the permissions), so we don’t even need to go there. It’s bad. End of story.

That leaves Firefox OS (FxOS). In this post, I’ll critically examine the issues that come up with FxOS’s use of a manifest to enable permissions and the somewhat sad implications of deciding to do permissions in a manifest. FxOS is also a good platform to focus on because, despite being proprietary, it is the platform that most closely resembles the Web Platform: hence, we can learn a lot about what the implications would be if we brought the same permissioning model to the Web.

As a full disclaimer, I work for Mozilla, who make FxOS (but I don’t work directly on FxOS!). I’m also one of the key contributors to the Web Manifest spec, and I’m implementing it in Gecko.

FxOS’s permissioning

FxOS supports both coarse-grained and fine-grained permissioning in the manifest. Beyond the API capabilities afforded by the Web Platform, web apps targeting FxOS can only enable permissions once they are explicitly “installed” by the user. FxOS exposes a proprietary API to enable installing a web site.
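
For reference, the proprietary install flow looks roughly like this (navigator.mozApps is FxOS/Firefox-only, and the app URL is made up):

    // Proprietary FxOS API – not part of any web standard.
    var request = navigator.mozApps.install("https://app.example.com/manifest.webapp");
    request.onsuccess = function () {
      console.log("Installed: " + request.result.manifestURL);
    };
    request.onerror = function () {
      console.log("Install failed: " + request.error.name);
    };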

The implication is that web apps are restricted in functionality until users complete an installation process. However, because users are not able to experience the full application until they install it, developers may be forced to nag users to install their app (e.g., by showing big “Install me!” buttons). This takes users out of context and forces them to do something that they might otherwise not want to do (install an app) in order to complete some task.

To overcome the “install me!” button problem, FxOS relies on having a centralized app store (Firefox marketplace), which is quite similar to Apple’s App Store or Google’s Play Store in how it operates. More on this below.

Security/privacy-sensitive device API access

FxOS restricts the APIs a web app can access to those exposed by the Web Platform plus a tiny handful of other proprietary APIs. Web apps targeting these FxOS-proprietary capabilities are known as “hosted apps”.

To allow access to more powerful APIs, FxOS relies on a proprietary packaged application format, which bundles a digital signature and the resources of an application (CSS, HTML, JS) into a zip file. And, of course, it then relies on the Firefox Marketplace to do the distribution. Packaged apps created by the average developer are called “privileged apps”. An app is “privileged” in that Mozilla checks it, and, if it meets a set of criteria, it gets a stamp of approval in the form of a digital signature (a lightweight form of DRM).
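
A privileged app’s manifest (manifest.webapp) looks something like this – note the proprietary “type” and “permissions” members, which have no equivalent in the standard Web Manifest:

    {
      "name": "My Privileged App",
      "type": "privileged",
      "launch_path": "/index.html",
      "permissions": {
        "contacts": {
          "description": "Needed to sync your address book",
          "access": "readonly"
        }
      }
    }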

Packaged applications rely on a manifest being available within the (zip) package – and because of its immediate availability, FxOS can use it as a means to enable fine-grained permissions. However, this comes with fairly severe scalability issues – and brings all the issues of traditional software distribution with it:

  • Developers must submit their application to a marketplace – leading to centralization.
  • Submitting an application to an App Store triggers a lengthy review process – and access to device/storage APIs is checked by a human reviewer. However, what we are essentially doing here is making the decision to allow apps to use certain APIs – and giving users an assurance that this is OK (i.e., at runtime, we don’t prompt for security permissions – only for privacy ones like geolocation and accessing the camera/microphone, etc.).
  • PKI infrastructure must be established, because APIs are only enabled if Mozilla’s signature is in the package.
  • Packaged apps don’t work well with the way the web is architected – and this has led Mozilla to introduce fairly insecure APIs to overcome these limitations. For example, FxOS allows developers to request a “SystemXHR” API, which essentially allows packaged apps to contact any server, ignoring CORS in the process (sketch below).
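
To illustrate the SystemXHR case (both the manifest key and the constructor flag are FxOS-proprietary):

    // In manifest.webapp: "permissions": { "systemXHR": {} }
    var xhr = new XMLHttpRequest({ mozSystem: true });
    xhr.open("GET", "https://third-party.example/api/stuff");
    xhr.onload = function () {
      // The response comes back even though the server never opted in via CORS.
      console.log(xhr.responseText);
    };
    xhr.send();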

How the Web Manifest differs

FxOS’s approach is in stark contrast with the way we (the community working on standardizing the manifest) are approaching the development and standardization of the Web Manifest format. The manifest is designed to be a non-critical, low-priority resource: it should only be downloaded “when needed” – and it should be treated as a “progressive enhancement”. That is to say, web apps should not depend on it being either present or supported… it’s just the cherry on top of a web app’s ice cream.

Effectively, this means that only when the user attempts to “add to home screen” does the user agent initiate the download of the manifest. This naturally limits the utility of the manifest as a place to put permission policies for APIs – as the APIs could only become available after the manifest loads… leading to some bad race conditions.

Currently, the Web Manifest spec explicitly says not to delay the load event of the document. This means there are limitations in detecting when the manifest actually loads (and this is by design).

Additionally, in Gecko, we intend to support two ways of actually loading a manifest:

  1. link rel="manifest": non-blocking, but could fire an onload event to signal readiness.
  2. HTTP’s Link: header. As it’s outside the Document, it doesn’t interface with it (though it could, but it would be a bit of a layering violation).
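
Concretely, the two look like this. In the document (non-blocking):

    <link rel="manifest" href="/manifest.json">

And out of band, via an HTTP response header (syntax per RFC 5988):

    Link: </manifest.json>; rel="manifest"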

The implication is that, as a dev, one would need to wait for the manifest to be available and ready before being granted API permissions. This is not ideal, from my perspective, because it introduces significant delay and a dependency on a resource that may never become available. A security implication of option 2 is that a proxy can insert a header (though a proxy could also change the payload, so option 1 is affected too).
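
To see why that’s awkward, here is a sketch of what developer code might have to look like if permissions lived in the manifest. Everything here is hypothetical: there is no standard “load” signal for a manifest link today, and navigator.contacts is an invented stand-in:

    var link = document.querySelector("link[rel=manifest]");
    link.addEventListener("load", function () {
      // Only now could a manifest-granted API be called safely…
      navigator.contacts.find({});
    });
    // …and if the manifest never loads, this handler never fires.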

Hopefully this explains why there has been reluctance (from me, at least) to add security-critical permissions to the manifest, and a preference to instead continue relying on the Web APIs themselves to handle permissioning.

So what’s the alternative?

A different approach would be to simply limit the usage of an API when it is initialized – or limit access to an API to the same origin, or to the top-level browsing context, and maybe even require TLS. This is similar to what Service Workers and requestAutocomplete() are doing – they both require TLS. Doing so overcomes many of the issues I raised above: if the security/context conditions are met, APIs can be used immediately, without having to wait on some other external resource that dictates the policy of usage. This also removes fragility, in that policy is tightly bound to the API at the point of usage.
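
In today’s terms, the developer side of that is just feature detection plus an environment check – no external resource required (“powerfulThing” is an illustrative name, not a real API):

    if (window.isSecureContext && window === window.top && "powerfulThing" in navigator) {
      // TLS + top-level browsing context + API actually exposed: go.
      navigator.powerfulThing.doStuff();
    }
    // If the conditions aren't met, the API simply isn't there – no manifest, no race.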

It also removes the need for app stores – but again forces API and UX designers to deal with the permission problems (instead of centralizing everything and having other humans review the app before users can use it).

Bottom line: let’s use the Web’s security model + TLS to do API permissioning.

Web 2024 – A response to Robin Berjon’s post

I read Robin Berjon’s “Web 2024” and was kinda surprised how much our views of the future of the Web differ – though I agree with many things, especially with books turning into “apps” and the TV industry just doing it wrong. I think Robin was probably trying to drum up support for an exciting and somewhat positive vision, while sending a warning to others that if they don’t start to “get it”, they will go the way of the dodo (or Nokia).

This is my take on where we could be in 2024 and a response to Robin’s write-up. My vision is not pretty and isn’t what I want to happen, but what I feel will likely happen unless there is a radical shift in the way we build and standardize the platform.

Be warned, I’m a “the glass is half empty!” kinda guy.

Before presenting my history, some key things I fundamentally disagree with Robin about:

  • The rise of single-page apps just ain’t gonna happen. Single-page apps are unicorns. I proved that statistically already and I don’t see them ever becoming mainstream. If we fix page transitions (to avoid the flash of unstyled content as you navigate from one page to another), then single-page apps are unnecessary. Yeah, it’s that simple, Robin! :)
  • JSON will die way before 2024. It’s a shitty standard, and the lack of support for comments and trailing commas makes it doubly shitty – it’s even worse than XML in that respect. It’s tremendously difficult to maintain and write. Something better will undoubtedly replace it way before 2024 (or browsers will start being more liberal about how they treat common errors, leading to a new standard).
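
To be concrete about that complaint, both of these throw a SyntaxError – exactly the kind of thing that bites people hand-maintaining JSON:

    JSON.parse('{"a": 1,}');            // trailing comma: SyntaxError
    JSON.parse('{"a": 1 /* why? */ }'); // comment: SyntaxError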

A history of the Web from 2014 to 2024

In the run-up to 2024, a few attempts were made at making a browser in JavaScript, but they all failed early on (around 2016 and then again in 2020). Engines like Gecko and Blink tried this (from before 2014) and were not able to implement as many features of the browser as they wanted in JS, because JS can’t access the things C++ code can, and it was not possible to implement APIs using JS for Workers (not making this up! this is a limitation of Gecko today that is not going away). JavaScript, despite its many advances, was still too far behind the curve of other languages to be competitive – it just lacked too many features, speed, and niceties when compared to the likes of Swift, Rust, Go, and even newer modern languages that emerged in the 2018–2022 period. Coupled with Apple’s marketing machine, and Swift’s ease of use over Java, Objective-C, and JavaScript, many developers quickly became iOS converts, leaving the Web out of frustration.

JavaScript

TC-39 felt the threat and tried to adapt (this time for real, having laughed off Dart into total irrelevance in 2013 despite Google’s fake/marketing-driven “standardization” of it in 2014 through ECMA); but the pace at which TC-39 standardized new features, and at which those features became available on the dependent native platforms, was too slow. Unfortunately, in light of advances made in Swift and Rust and even crappy C++, JS just couldn’t hold its own in the app space. There was just too much legacy and browser “magic” baggage there: the inability, for instance, to use Object.observe() on host objects both confused and annoyed developers. And nobody got the whole “proxies” thing. Even the darling Node.js faded in favour of Go and other newly emerging technologies.

JavaScript, of course, didn’t die or anything: it remained the lingua franca of the Web that it was in 2014 – but it was only in 2022 that it gained interesting features like enums, protocols, and generics. Interestingly, JS classes did become available in mid-2017 across all browsers, but, lacking generics and protocols, classes didn’t really take off. Getting a module system did help though, and it became quite widely used by 2020.

Web Components

JavaScript aside, things were looking up for the Web. After 2014’s Google I/O, the Web Components revolution finally began – and Service Workers were coming down the pipe and became usable in apps by 2017. Chrome’s Service Workers implementation landed in late 2015, closely followed by Mozilla’s in 2016. Microsoft came soon after, but Apple held off till 2018, so no one could realistically use SWs in their apps till Apple finally supported them… and yeah, the iPhone 9 is pretty awesome, but I’m not allowed to talk about it.

Having Web Components was great, because it meant that HTML as a language was more or less done and developers were finally free to focus on creating their own elements that best represented the needs of their applications… except when they hit problems: mainly, this was to do with the preload scanner and other predictive magic the browser was trying to perform. Web Components simply couldn’t explain the platform (or HTML elements) in the way their designers had hoped. The RICG had hit this problem early on, in 2012–2013, and warned the Web Components people about it. But there was nothing that could really be done without exposing more of the guts of the browser (which required a lot of reengineering that browser vendors were not willing to do). So, Web Components were fairly successful, but with some limitations. Thankfully, Client Hints started getting added to browsers in 2016, and it helped with many of the blockers around Web Components. Again, Apple held off supporting Web Components till 2017, so realistically they could not be used in production (not without needing to download a ton of polyfill code that just kept on growing).

Demise of the W3C

An interesting side effect of “finishing” HTML in 2014 was the W3C’s slow demise into irrelevance. The writing had been on the wall for a long time, as the W3C continued to pursue an increase in member participation instead of providing technical leadership. It also couldn’t really compete with the WHATWG and other community-driven projects to add new features to browsers. The W3C’s inability to adapt its process to cater for living standards left more and more participants disillusioned and further pushed browser makers to do their standardization work at the WHATWG and newly emerging community-driven efforts. The W3C shifted focus and became a place to “standardize” formats and other mostly irrelevant XML-y and research/academic projects (so sad right now!). The last hold-out was the CSS Working Group, but it too eventually broke up as means of adding CSS features became available to developers (i.e., a form of Web Components, but for CSS). By 2020, the browser vendors had all but abandoned the W3C – those that remained only stayed there for marketing reasons, but didn’t contribute anything technically.

Bit more about Service Workers

As mentioned, the parallel Moz/Google development of Service Workers brought a great deal of innovation to the web platform. We could finally create apps that reliably worked offline – and jQuery 4 and new versions of Angular made this a breeze to set up and use. The missing bit was having the ability to “install” a web app the way one installs a real native application. Another great win was Mozilla and Google forcing Service Workers to be used exclusively over HTTPS. This really helped against pervasive monitoring (much to the annoyance of the NSA, CIA, and advertisers, as they could not spy as freely as before on many new Web apps).

The WebOS killed our last chance at interop

Despite Mozilla’s and Google’s attempts to standardize a manifest format to allow installable web apps, it never really took off. Under everyone’s noses back in 2014, the web was already undergoing major fragmentation. A lot of people knew this, but chose to pretend that it was actually a good thing for the Web… and maybe it was, in hindsight. It really started with Chrome OS, but was rapidly legitimized by the growth and popularity of Firefox OS. By 2018, Firefox OS had taken a foothold in the lower end of the market, capturing around 3% of global market share (the increase could have been greater, but very cheap Android phones put a lot of pressure on Mozilla, which caused its growth to slow). By 2016, it was too late to turn back. Mozilla had invested very heavily in its proprietary platform (“Web” APIs, dev tools, docs, marketplace, etc.) and ditching it all for lousy/half-baked W3C alternatives didn’t make economic sense (even if they were royalty-free). Additionally, it would have been too expensive to deprecate and rewrite the Firefox OS platform to make use of standardized APIs… so they didn’t bother and just kept going with Firefox OS as it was, even if no other platform supported the APIs. But what the hell, it was the “open” alternative.

The fragmentation problems were twofold:

  1. The steady rise of the “embedded web view”.
  2. A large and deliberate attempt to fragment the web into silos: Chrome OS, Firefox OS, and Windows 8 apps.

Rise of Servo and the embedded web view

Embedded web views were something the clever people who started PhoneGap tuned into early on – and whose potential Adobe realized when it bought them. Despite the goal of having PhoneGap become irrelevant (by pushing the Web to provide the functionality it provides), it actually went the other way! PhoneGap/Cordova continued to grow in relevance and popularity. In 2014–2015, Mozilla also jumped on board along with Google, Microsoft, etc., and the Cordova APIs became the de facto standard by 2016 without undergoing W3C standardization.

To get into this war for the embeddable engine, Mozilla ramped up development of Servo. Making Servo into an embeddable platform made it a great drop-in replacement for WebKit (after losing more contributors to Blink, Apple eventually forked WebKit and made it an internal project in 2016). Servo’s embedding API and use of Rust meant it quickly became a serious contender against WebKit. Tools like PhantomJS ditched WebKit. By 2019, Gecko was finally killed off and Servo was put in its place. The combination of Servo’s reliance on Rust and WebIDL proved to be a winner here: it meant that developers could quickly add new custom features to the platform, using a relatively easy-to-use language (Rust) to extend the browser. This made it even more attractive than Blink as an embeddable browser engine.

Lastly, as Firefox OS moved to using Servo instead of Gecko, it started to support native applications written in Rust. This followed the trend set in 2014, when Google started supporting Android (Java) applications in Chrome OS. By 2024, most apps are written in Rust, and a lot make use of Servo as a WebView component. This is great for viewing web content – but most serious app development is now done with either Rust, Java, Foopy (it’s awesome, wait till you play with it!) or Swift. The Web remains mostly a publishing medium.

I got in early on the whole Foopy thing. Made a mint and retired to Portugal and I now have a small fig farm (mostly sell jams and cakes… it’s nice).

Hmm, let’s not “fuck the standards bodies”

At the LXJS conference, Mikeal Rogers made a somewhat outrageous rallying cry regarding the role of standardization in the development of Node.js (and some seemed to generalize it to JavaScript overall)… fast forward to 18:20 and watch from there:

The most cursory amount of brain activity will yield an “O_o” reaction of contradiction to the above. Yet there is a deep-seated frustration that standards organizations need to start taking more seriously, or they risk a huge developer backlash.

Of course, it’s ridiculous to just exclaim “fuck those guys” when Node’s primary use cases are being a web server (including I/O) and an ECMAScript interpreter, both of which rely on such a long list of standards that one could spend hours listing them out (HTTP, ECMAScript, Unicode, TCP/IP… bla bla bla).

The cries of dismissal for standards organizations seem to come from underlying frustrations with the (often misunderstood) standardization processes: it is those processes that go into formalising the technology we rely on for our day-to-day work (as users, devs, or implementers). Elsewhere, this is what Mikeal describes as “roadblocks”, which in many cases is true. For example, the W3C has built-in waiting periods that take months, and forming a new working group can take six months to a year.

There are at least three points that I think Mikeal was trying to make with his provocative exclamation (in the eloquent vernacular of JS developers, minus the My Little Ponies, cats, rainbows, and unicorns):

  1. Node is a proprietary platform – hence, we can build APIs however the fuck we want (i.e., “fuck ’em! we don’t need ’em”).
  2. The community will set its own standards (i.e., “fuck them, we’ll make our own shit – and we will make it awesome; they are too fucking slow anyway”).
  3. The standards bodies are disconnected from the developers on the ground (i.e., “fuck ’em! they don’t listen to us anyway even when we provide feedback”).

But JavaScript developers generally live in two worlds: the Web and Node – so Mikeal’s proclamations need to be carefully considered from both perspectives (the browser, and Node – which is, after all, a proprietary platform).

Proprietary vs. standards-based

Point 1 above certainly holds – but only for Node. Node has its own core team (once the “Node Illuminati”) that “standardizes” the core features to meet the goals of the project. Hence, they don’t really need to care so much about what standards bodies do. Most of the stuff they rely on was done years ago (e.g., HTTP and ECMAScript).

Of course, the Node dudes have to worry a bit about what Google does with V8, and what features they enable or disable by default, but by and large they don’t really seem to mind what Google does… except in cases where a choice made for ECMAScript could actually screw with existing community conventions (e.g., modules in ES6).

But then, point 1 above would not hold with Web browsers, obviously. Browsers need standards for two main reasons:

A) So your apps/pages can be used across browsers without pissing off users. There are those who wish there were only WebKit, but unfortunately, there ain’t – and we (developers, users, browser vendors) gotta deal with that… with standards – and that’s a good thing.

B) To avoid a thermonuclear patent war – agreeing to a patent policy that allows intellectual property to be shared without the fear that your competitor will sue the pants off ya if you copy their stuff.

Breaking away – again, and again… a blessing and a curse

Breaking away out of frustration with the politics and processes at standards organizations has happened a million times before: remember, it was the attempt by a few to disrupt the standards bodies (especially the W3C) that brought us the WHATWG (which is now also considered a standards body).

This renegade group was a blessing, in that they created HTML5 and brought about the death of XHTML – as well as a mass of much-needed and fairly rapid innovation and adoption. And it even brought a lot of changes to the W3C, including the creation of the Community Groups.

But it was also, to some, a curse: the Leviathan/Benevolent Dictator for Life for HTML – and the (mostly FUD) fear that the lack of a patent policy potentially exposed everyone to patent trolls. And the realization that a few browser vendors had installed themselves as the custodians of the Web – and decided that they “knew better” (and often they did!) on all matters Web.

More serious was the sense of exclusion felt by certain communities who had directly participated in the development of HTML in the past (the most vocal of which were the accessibility folks, but also folks that process HTML on the server side… remember the famous “When did you stop beating your wife, Ian?” email?). Good times.

There was even a recent repeat of burnt egos and pissed off developers with the whole responsive images debacle.

(For the record, I think Hixie is one of the best spec editors in the world and, despite a lot of hurt and frustrated egos along the way – including my own from time to time – has done an amazing job with HTML.)

The point is, jumping ship on standards bodies comes with its own set of problems.

The role of standards bodies

Standards bodies are just there to provide neutral ground – and a way of working that allows stuff to interoperate across things (e.g., computers). They also provide a legal framework under which companies can share IPR without being accused of collusion – which then, hopefully, creates larger markets than they could create on their own.

Not having standards slows innovation and progress: this was clearly evident in the monopolistic actions of Microsoft and IE during the early part of the millennium. It took years for the WHATWG to reverse-engineer IE6’s behavior into the glorious HTML5 family of specs we behold today. So who is to blame for the slowness? Everyone – the HTML standards prior to 5 are mostly crap because of the rapidity at which they were produced (hence they lack the excruciating detail of HTML5). And, of course, the legally proven monopolistic actions of Microsoft, which abandoned moving the Web forward in a somewhat lame attempt to kill it by stagnation – 10 years of IE6! (and to further fragment the crap out of it with Silverlight, which thankfully failed spectacularly!).

It’s not all roses at the WHATWG

But even when standards bodies and their renegade counterparts move quickly, they can also f’up royally: consider AppCache, localStorage, the illegibility of the WebIDL spec, and the mutterings of hate for the IndexedDB API.

So WHATWG folks need to also reflect on the fact that they are not so smart (everyone makes mistakes, but Web standards is prolly not the best place to be making them – because the Web is forever, right?).

At LXJS, at least, there was also little love for HTML5 audio, and for parts of canvas (“canvas, y u have fillRect, but no have fillCircle()?”).
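
(For the record, the circle dance goes something like this – four calls where a hypothetical fillCircle() would be one:)

    var ctx = document.querySelector("canvas").getContext("2d");
    ctx.fillRect(10, 10, 50, 50);         // rectangle: one call
    ctx.beginPath();                      // circle: begin a path…
    ctx.arc(100, 35, 25, 0, 2 * Math.PI); // …trace the arc…
    ctx.fill();                           // …then fill it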

But at least it ain’t native apps land

This brings us to the second reason we have standards bodies: the IPR/patent stuff. Just look across the pond at what is happening over in native apps land (thermonuclear war with Apple vs Samsung, Motorola, Google, HTC, etc. over swiping gestures and rounded corners… or Oracle vs Google over Java – stupid, but it’s happening: when did you last hear about someone being sued over HTML/CSS/JS in a serious way?… it’s all love amongst Web browser makers, right?).

Custom stuff

Moving on to point 2 that I distilled from Mikeal’s talk (“we will make our own stuff”): it also holds for Node, for better more than for worse. This is true more or less of any community of tool users (the old “[thing] as She is Spoke”). Both Node and the Web at large have built up their own ways of working with the somewhat crappy underlying stuff that the standards bodies have provided: the canonical example is jQuery, for sanely working with the underlying pile of poo that are the DOM APIs.

In certain cases, lessons learned from the likes of jQuery have made it back into the standards world (e.g., the Selectors API – despite the crappy method names, ‘querySelector’ and ‘querySelectorAll’).
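
For example (the selector itself is just illustrative):

    // What jQuery popularized…
    $(".menu a.active");
    // …is what the platform eventually absorbed, clunky names and all:
    document.querySelectorAll(".menu a.active");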

They don’t listen to us

Point 3, sadly and alarmingly, also holds – there is a huge divide between developers and standards folks. I think this is changing – or at least it is something that standards bodies are trying to change.

One important move has been the shift to putting specs on GitHub. In Mikeal’s talk, he made strong points about the common language and development flow introduced by GitHub. This may allow more developers to participate in the standardization process. This is something that the Responsive Images group is also trying out – though that effort has also just started.

ECMA could really wake up here too. It was only a few months ago that they officially published ECMAScript in HTML! To which Anne van Kesteren mocked, “welcome to the ’90s!”.

Another move, by the W3C at least, has been the creation of a developer conference, hiring respected developers like Lea Verou to help with outreach, and even having accredited courses to teach developers how best to make use of standards*.

(*full disclosure: I teach one of them).

“I’m too scared to look stupid”

But there is also the issue that the development community does not speak up – probably because they don’t want to sound dumb. At LXJS, a lot of speakers I spoke to said that WebIDL was “an illegible pile of shit” (not a reflection on the technical aspects of the spec – and Cameron McCormack, who edits the spec, knows I seriously love that spec). Yet I think I was the only person who said on the mailing list that it was illegible (of course, I too got told to politely go fuck myself – and generally annoyed those in the Working Group with my flood of questions for clarification).

Remember, there are no dumb questions: it’s dumb not to question. If you don’t understand something, say so! 

What can we do about it?

Wow, if you made it this far, you can claim a free beer at the end!

As I already said, the experiment of moving to a GitHub flow is a good thing for the W3C and the WHATWG. However, there may be another bit missing from the standardization process: a way to collect anonymized feedback about the legibility/accessibility of specifications to developers – as well as the usability and quality of APIs. There is a lot of “not invented here” syndrome (and general nasty behavior/dismissal) at standards organizations. The move to GitHub might change that, because it forces the spec to leave the artificial safety and community boundaries of the standards organization (or at least we hope).

The W3C is also trying to address the situation by hiring a “Packaged Application Specialist”, whose responsibilities include trying to make sure we have a coherent/competitive platform – and that stuff gets done in a more organized and timely manner. I’m hopeful that will help, but only time will tell. This echoes Joe Hewitt’s call for such an individual about a year ago.

And his follow-up warning that helps us rethink the web:

…my definition of the Web then is resources loaded over the Internet using HTTP and then displayed in a hyperlink-capable client. This definition is liberating. It helps me see a future beyond HTML which is still the Web. I can say now that when I exclaim my love for the Web, it’s the freedom of driving the open Internet in a browser that I love, not the rendering technology. Hyperlink traversal matters. The Internet being global and decentralized matters. HTML does not matter.

Just something to chew over… here’s your beer: 🍺