Hmm, let’s not “fuck the standards bodies”

At the LXJS conference, Mikeal Rogers made a somewhat outrageous rallying cry regarding the role of standardization in the development of NodeJS (and some seemed to generalize it to JavaScript overall)… fast forward to 18:20 and watch from there:

The most cursory amount of brain activity will yield an “O_o” reaction of contradiction to the above. Yet, there is a deep-seated frustration that standards organizations need to start taking more seriously or they risk a huge developer backlash.

Of course, it’s ridiculous to just exclaim “fuck those guys” when Node’s primary use cases are a Web server (including I/O) and an ECMAScript interpreter, both of which rely on such a long list of standards that one could spend hours listing them out (HTTP, ECMAScript, Unicode, TCP/IP… bla bla bla).

The cries of dismissal for standards organizations seem to come from the underlying frustrations with the (often misunderstood) standardization processes: it is those processes that go into formalising the technology we rely on for our day to day work (as users, devs, or implementers). Elsewhere, Mikeal describes these processes as “road blocks”, and in many cases that is true. For example, the W3C has built-in waiting periods that take months, and forming a new working group can take anywhere from 6 months to a year.

There are at least three points that I think Mikeal was trying to make with his provocative exclamation (in the eloquent vernacular of JS developers, minus the My Little Ponies, cats, rainbows, and unicorns):

  1. Node is a proprietary platform – hence, we can build APIs however the fuck we want (i.e., “fuck ‘em! we don’t need ‘em”).
  2. The community will set its own standards (i.e., “fuck them, we’ll make our own shit – and we will make it awesome; they are too fucking slow anyway”).
  3. The standards bodies are disconnected from the developers on the ground (i.e., “fuck ‘em! they don’t listen to us anyway even when we provide feedback”).

But JavaScript developers generally live in two worlds: the Web and Node – so Mikeal’s proclamations need to be carefully considered from both perspectives (Browser and Node – which is, after all, a proprietary platform).

Proprietary VS standards-based

Point 1 above certainly holds – but only for Node. Node has its own core team (once the “Node Illuminati”) that “standardize” the core features to meet the goals of the project. Hence, they don’t really need to care so much about what standards bodies do. Most of the stuff they rely on was done years ago (e.g., HTTP and ECMAScript).

Of course, the Node dudes have to worry a bit about what Google does with V8, and what features they enable or disable by default, but by and large they don’t really seem to mind what Google does… except in cases where a choice made for ECMAScript could actually screw with existing community conventions (e.g., modules in ES6).

But then, point 1 above would not hold with Web browsers, obviously. Browsers need standards for two main reasons:

A) So your apps/pages can be used across browsers without pissing off users. There are those that wish there was only WebKit, but unfortunately, there ain’t – and we (developers, users, browser vendors) gotta deal with that… with standards – and that’s a good thing.

B) To avoid a thermonuclear patent war – agreeing to a patent policy allows intellectual property to be shared without the fear that your competitor will sue the pants off ya if you copy their stuff.

Breaking away – again, and again…blessing and a curse

The move out of frustration with politics and processes at standards organizations has happened a million times before: Remember, it was the attempt by a few to disrupt the standards bodies (especially the W3C) that brought us the WHATWG (which is now also considered a standards body).

This renegade group was a blessing, in that they created HTML5 and brought about the death of XHTML – as well as a mass of much needed and fairly rapid innovation and adoption. And it even brought a lot of changes to the W3C, including the creation of the Community Groups.

But also, to some, a curse: the Leviathan/Benevolent Dictator for Life for HTML – and the mostly FUD that was a lack of a patent policy potentially exposing everyone to patent trolls. And the realization that a few browser vendors had installed themselves as the custodians of the Web – and decided that they “knew better” (and often they did!) on all matters Web.

More serious was the sense of exclusion of certain communities who had directly participated in the development of HTML in the past (most vocal of which were the accessibility folks, but also folks that process HTML on the server-side… remember the famous “When did you stop beating your wife, Ian?” email?). Good times.

There was even a recent repeat of burnt egos and pissed off developers with the whole responsive images debacle.

(For the record, I think Hixie is one of the best spec editors in the world and, despite a lot of hurt and frustrated egos along the way – including my own from time to time – has done an amazing job with HTML.)

The point is, jumping ship on standards bodies comes with its own set of problems.

The role of standards bodies

Standards bodies are just there to provide neutral ground – and a process of working that allows stuff to work across things (e.g., computers). They also provide a legal framework under which companies can share IPR without being accused of collusion – this then hopefully creates larger markets than they could create on their own.

Not having standards slows innovation and progress: this was clearly evident in the monopolistic actions of Microsoft and IE during the earlier part of the millennium. It took years for the WHATWG to reverse engineer IE6 into the glorious HTML5 family of specs we behold today. So who is to blame for things being slow? Everyone – the HTML standards prior to 5 are mostly crap because of the rapidity at which they were produced (hence they lack the excruciating detail of HTML5). And of course, there were the legally proven monopolistic actions of Microsoft, who abandoned moving the Web forward in a somewhat lame attempt to kill it by stagnation – 10 years of IE6! (and to further fragment the crap out of it with Silverlight, which thankfully failed spectacularly!).

It’s not all roses at the WHATWG

But even when standards bodies and their renegade counterparts move quickly, they can also f’up royally: consider AppCache, localStorage, the illegibility of the WebIDL spec, and the mutterings of hate for the IndexedDB API.

So WHATWG folks need to also reflect on the fact that they are not so smart (everyone makes mistakes, but Web standards is prolly not the best place to be making them – because the Web is forever, right?).

At LXJS, at least, there was also little love for HTML5 audio, and parts of canvas (“canvas, y u have fillRect, but no have fillCircle()?”).

But at least it ain’t native apps land

This brings us to the second reason we have standards bodies: the IPR/patent stuff. Just look across the pond to what is happening over in native apps land (thermonuclear war with Apple vs Samsung, Motorola, Google, HTC, etc. over swiping gestures and rounded corners… or Oracle vs Google over Java – stupid, but it’s happening: when did you hear about someone being sued over HTML/CSS/JS in a serious way?… it’s all love amongst Web browser makers, right?).

Custom stuff

Moving on to point 2 that I distilled from Mikeal’s talk (“we will make our own stuff”): it also holds for Node, for better more than for worse. This is true more or less of any community of tool users (the old “[thing] as She is Spoke”). Both Node and the Web at large have built up their own ways of working with the somewhat crappy underlying crap that the standards bodies have provided: the canonical example is jQuery, for sanely working with the underlying pile of poo that is the DOM APIs.

In certain cases, lessons learned from the likes of jQuery have made it back into the standards world (e.g., the Selectors API – despite the crappy method names, ‘querySelector’ and ‘querySelectorAll’).

They don’t listen to us

Point 3, sadly and alarmingly, also holds – there is a huge divide between developers and standards folks. I think this is changing – or at least it is something that standards bodies are trying to change.

One important move has been the shift to moving specs to GitHub. In Mikeal’s talk, he made strong points about the common language and development flow introduced by GitHub. This may allow more developers to participate in the standardization process. This is something that the Responsive Images working group is also trying out – though that effort has also just started.

ECMA could really wake up here too. It was only a few months ago that they officially published ECMAScript in HTML! To which Anne van Kesteren mocked, “welcome to the 90’s!”.

Another move, by the W3C at least, has been the creation of a developer conference and hiring respected developers like Lea Verou to help with outreach, and even having accredited courses to teach developers how to best make use of standards*.

(*full disclosure: I teach one of them).

“I’m too scared to look stupid”

But there is also an issue that the development community does not speak up – this is probably because they don’t want to sound dumb. At LXJS, a lot of speakers I spoke to said that WebIDL was “an illegible pile of shit” (not a reflection on the technical aspects of the spec, and Cameron McCormack who edits the spec knows I seriously love that spec). Yet, I think I was the only person that said on the mailing list that it was illegible (of course, I too got politely told to go fuck myself – and generally annoyed those in the Working Group with my flood of questions for clarifications).

Remember, there are no dumb questions: it’s dumb not to question. If you don’t understand something, say so!

What can we do about it?

Wow, if you made it this far, you can claim a free beer at the end!

As I already said, the experiment of moving to a GitHub flow is a good thing for the W3C and WHATWG. However, there may be another bit missing in the standardization process: a way to collect anonymized feedback about the legibility/accessibility of specifications to developers – as well as the usability and quality of APIs. There is a lot of “not invented here” syndrome (and general nasty behavior/dismissal) at standards organizations. The move to GitHub might change that because it forces the spec to leave the artificial safety and community boundaries of the standards organization (or at least we hope).

The W3C is also trying to address the situation by hiring a “Packaged Application Specialist”, whose responsibilities include trying to make sure we have a coherent/competitive platform – and that stuff gets done in a more organized and timely manner. I’m hopeful that will help, but only time will tell. This echoes Joe Hewitt’s call for such an individual about a year ago.

And his follow up warning that helps us rethink the web:

…my definition of the Web then is resources loaded over the Internet using HTTP and then displayed in a hyperlink-capable client. This definition is liberating. It helps me see a future beyond HTML which is still the Web. I can say now that when I exclaim my love for the Web, it’s the freedom of driving the open Internet in a browser that I love, not the rendering technology. Hyperlink traversal matters. The Internet being global and decentralized matters. HTML does not matter.

Just something to chew over… here’s your beer: 🍺(Unicode Character ‘BEER MUG’ (U+1F37A))

Zip files and Encoding – I hate you.

I’ve written about some of the issues with depending on zip as a packaging format in the past. As people know, Web Apps is depending on Zip as the packaging format for Widgets.

Zip the good

Zip has a lot going for it. It is ubiquitous and dependable… so long as you don’t want to share files across cultures.

Zip the bad

The Zip spec does not seem to know that there are normalization forms for Unicode, when there are actually 4 (or more, because there are some non-standard ones too!). The Zip spec gives no guidance as to how file names inside zip files are to be normalized.

Consider, when a zip file is created on Linux, it just writes the bytes for the file name in the encoding of the underlying file system. So, if the file system is in ISO-8859-1, the bytes are written in ISO-8859-1. This may seem ok, but when you decompress the zip file on Windows, which uses the Windows-1252 encoding, the file names get all mangled. If the underlying encoding of the file system on Linux is something else, you won’t be able to share files with other systems at all. So in this case, it is not Windows’ fault.

The Zip spec says that the only supported encodings are CP437 and UTF-8, but everyone has ignored that. Implementers just encode file names however they want (usually byte for byte as they are in the OS… see table below).
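To make “mangled” concrete, here is a minimal Python sketch (my own illustration, not taken from any zip tool) of what happens when a reader guesses the wrong encoding for the same file name bytes. The file name “ñ.txt” is just an example; the codec names are the ones Python ships with.

```python
# The bytes a UTF-8 system would write for the (example) file name "ñ.txt".
raw_name = "\u00f1.txt".encode("utf-8")

# A reader has to guess the encoding; each guess yields a different name.
as_utf8 = raw_name.decode("utf-8")      # the right guess
as_cp1252 = raw_name.decode("cp1252")   # a Windows-1252 guess
as_cp437 = raw_name.decode("cp437")     # the Zip spec's legacy code page

# ascii() keeps the output printable on any console.
for label, text in [("utf-8", as_utf8), ("cp1252", as_cp1252), ("cp437", as_cp437)]:
    print(label, ascii(text))
```

Only the first guess gives back the original name; the other two decode without any error, which is exactly why the problem goes unnoticed until a user sees garbage on screen.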

It gets worse! Because Mac OS runs on some weird non-standard decomposed Unicode mode, you can only share zip files with other Mac OS users. According to this email, the LimeWire guys also ran into a similar problem with regards to encodings in Mac OS:

“for example a French, German or Spanish Windows user cannot exchange files that contain [file names with] French, German or Spanish accents with a French, German or Spanish Macintosh users”

The following table illustrates the problem:

Bytes that represent ñ in a Zip file (in hex)

File name | Zip in Windows               | Zip in Linux      | Zip in Mac OS
ñ         | A4 (Extended US-ASCII/CP437) | C3 B1 (UTF-8 NFC) | 6E CC 83 (UTF-8 NFD)

Yes! Holy crap! Three different byte sequences, corresponding to different character encodings and normalization forms.
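If you want to reproduce those three byte sequences yourself, the following Python sketch does it using only the standard library. The variable names are mine, and the mapping of each sequence to an operating system simply follows the table above.

```python
import unicodedata

name = "\u00f1"  # ñ as a single precomposed code point

# Windows column: the legacy CP437 code page.
windows_bytes = name.encode("cp437")
# Linux column: UTF-8 of the composed form (NFC).
linux_bytes = unicodedata.normalize("NFC", name).encode("utf-8")
# Mac OS column: UTF-8 of the decomposed form (NFD), i.e. "n" + combining tilde.
mac_bytes = unicodedata.normalize("NFD", name).encode("utf-8")

for label, data in [("Windows", windows_bytes), ("Linux", linux_bytes), ("Mac OS", mac_bytes)]:
    print(label, " ".join(f"{byte:02X}" for byte in data))
```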

The only way around this would be a *special* custom-built widget zipping tool that normalizes file name strings to NFC. If the widget engine needs to decompress the widget to disk, then it would take the NFC file names and convert them to the operating system’s native encoding (or store the files in memory, and reference them that way). This affects the URI scheme and DOM normalization of Widgets, so Web Apps will have to deal with it eventually… but I’m not sure exactly how.
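As a rough idea of what such a tool might look like, here is a hedged Python sketch. The function names `build_nfc_zip` and `native_name` are hypothetical (they come from no widget spec or engine), and the sketch leans on the fact that Python’s `zipfile` module writes non-ASCII names as UTF-8 and sets the UTF-8 flag in the entry header.

```python
import io
import unicodedata
import zipfile


def build_nfc_zip(files):
    """Return zip bytes whose entry names are all NFC-normalized.

    `files` maps file names (in any normalization form) to their bytes.
    Hypothetical helper: two inputs that normalize to the same name
    would collide, which a real tool would have to reject.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            nfc_name = unicodedata.normalize("NFC", name)
            # zipfile encodes non-ASCII names as UTF-8 and sets flag bit 0x800.
            archive.writestr(nfc_name, data)
    return buffer.getvalue()


def native_name(nfc_name, native_form="NFD"):
    """Convert an NFC name to the form the local file system prefers.

    Assumption: the engine knows the platform's form (e.g. NFD on Mac OS).
    """
    return unicodedata.normalize(native_form, nfc_name)


# Usage: a name arriving in decomposed form is stored in composed form.
zip_bytes = build_nfc_zip({"n\u0303.txt": b"hola"})
with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
    stored_names = archive.namelist()
    stored_flags = archive.infolist()[0].flag_bits
```

The zipping side does the normalizing once, so the widget engine only has to worry about converting to whatever its own file system wants at extraction time.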