Thundering Herd: 2011

Tuesday 20 December 2011

Changes to DOM full-screen API in Firefox 11

We've made some changes to how the HTML full-screen API exits full-screen mode in Firefox 11, which is scheduled to ship in March 2012. Previously Document.mozCancelFullScreen() would fully-exit full-screen and return the browser to "normal" mode. Starting in Firefox 11, Document.mozCancelFullScreen() will restore full-screen state to the element that was previously full-screen. If there is no previous full-screen element in either the document or a parent document (full-screen mode isn't restored to former full-screen elements in child documents), then the browser will "fully-exit full-screen", and return the browser to normal mode.

To see how this is useful, consider the case of a PowerPoint clone or presentation web app that wants to run full-screen. One way to implement such a web app would be to have a full-screen <div> element where the slides are shown. The developer may want to be able to switch full-screen mode seamlessly between the slide deck <div> and (say) a <video>, and then return to having the slide deck <div> as the full-screen element so that the user can carry on with the presentation. Before this change, if the <video> was in a cross-origin subdocument (like a YouTube embedded player in an <iframe>) returning full-screen mode to the slide deck <div> from the <video> was a two-step process; users would have to fully-exit full-screen, and re-request full-screen mode on the slide deck element. Now developers can simply call Document.mozCancelFullScreen() and seamlessly switch back. The browser won't drop out of full-screen mode during the transition.

Note that if users press the escape key they will always fully-exit full-screen, i.e. Firefox won't restore the previous full-screen element to full-screen state on escape key press. So to seamlessly restore full-screen to the previous full-screen element, developers must explicitly call Document.mozCancelFullScreen(); they can't rely on the user pressing the escape key.
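Here's a rough sketch of how the slide deck scenario might be wired up. The helper names, the "back to slides" button, and the element arguments are hypothetical and only for illustration; the only API calls used are mozRequestFullScreen() and mozCancelFullScreen() as described in this post.

```javascript
// Hypothetical helpers for a presentation page; the function names and
// arguments are illustrative, not part of any Firefox API.

// Call from a user-generated event handler (e.g. a click) to move
// full-screen from the slide deck to the video.
function showVideoFullScreen(video) {
  video.mozRequestFullScreen();
}

// Call when the video is finished. In Firefox 11 and later this restores
// full-screen to the previous full-screen element (the slide deck) if
// there is one, and otherwise fully exits full-screen.
function returnToSlides(doc) {
  doc.mozCancelFullScreen();
}

// Wire up an explicit "back to slides" button, since pressing the escape
// key always fully exits full-screen.
function wireBackButton(button, doc) {
  button.addEventListener('click', function () {
    returnToSlides(doc);
  }, false);
}
```

In a page you would pass the real document and button elements; the functions take them as parameters so the sketch makes no assumptions about the page's markup.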

We've also added webconsole logging upon full-screen request failures to Firefox 11, to make debugging denied full-screen requests easier.

Another change coming in Firefox 11 is that we'll no longer deny full-screen requests in web pages which contain windowed plugins. Instead, we'll exit full-screen when a windowed plugin is focused (on Windows and Linux; MacOSX is unaffected).

Thursday 10 November 2011

Firefox's HTML full-screen API enabled in Nightly builds

A few days ago I enabled the HTML full-screen API in Firefox nightly builds. This enables developers to make an arbitrary HTML element "full-screen", hiding the browser's UI and stretching the element to encompass the entire screen. This will be particularly useful for HTML5 video and games.

If all goes well, this feature will ship in Firefox 10 at the end of January.

The API has changed slightly since I last blogged about it. The current API is Mozilla-specific, but is similar to the W3C's Fullscreen draft specification.

To enter full-screen mode, call the following method on the HTML Element you'd like to enter full-screen:

void mozRequestFullScreen() : posts an asynchronous request to make the HTML element the full-screen element. If the request is granted, some time later a bubbling "mozfullscreenchange" event is dispatched to the element which requested full-screen. If the request is denied, a "mozfullscreenerror" event is dispatched to the element's owning document. We only grant requests for full-screen when:

mozRequestFullScreen() is called in a user-generated event handler, e.g. a mouse click handler, and
the requesting element is in its document, and
there are no windowed plugins present in any document/iframe in the current page, and
all iframes containing the requesting element (if any) have the mozallowfullscreen attribute.
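As a minimal sketch of the conditions above, the request can be made from a click handler, with a listener on the owning document to catch denied requests. The helper name and its parameters (button, target, onDenied) are hypothetical.

```javascript
// Hypothetical wiring helper: `button`, `target` and `onDenied` are
// illustrative names supplied by the page.
function wireFullScreenButton(button, target, onDenied) {
  // Denied requests dispatch "mozfullscreenerror" to the element's
  // owning document, so listen there.
  target.ownerDocument.addEventListener('mozfullscreenerror', onDenied, false);

  // The request is only granted when made from a user-generated event
  // handler, such as this click handler.
  button.addEventListener('click', function () {
    target.mozRequestFullScreen();
  }, false);
}
```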

We added the following method and attributes to HTML Document:

void mozCancelFullScreen() : exits the document from full-screen mode. This dispatches a "mozfullscreenchange" event to the document containing the (now former) full-screen element. Note that the "mozfullscreenchange" event which is dispatched when you enter full-screen is targeted at the full-screen element, so if you want to receive the "mozfullscreenchange" on both entering and exiting full-screen in the same listener you should add your listener to the document, rather than the full-screen element.
readonly attribute boolean mozFullScreen : true when the document is in full-screen mode.
readonly attribute Element mozFullScreenElement : reference to the current full-screen element.
readonly attribute boolean mozFullScreenEnabled : returns true if calls to mozRequestFullScreen() would be granted in the current document. This returns false if there are any windowed plugins present in any document/iframe in the current page, or if any iframes containing this document don't have the mozallowfullscreen attribute present, or if the user has disabled the API by preference. If this returns false you may want to not show the user your enter-full-screen button in your page, since you know it won't work!
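The attributes above might be used together roughly like this. It's only a sketch: the helper name, the button, and the onChange callback are hypothetical, and hiding the button via style.display is just one way a page could do it.

```javascript
// Hypothetical helper: `button` and `onChange` are supplied by the page.
function trackFullScreen(doc, button, onChange) {
  // If requests would be denied, don't offer the button at all.
  if (!doc.mozFullScreenEnabled) {
    button.style.display = 'none';
  }

  // Listening on the document (rather than the full-screen element)
  // receives the event on both entering and exiting full-screen.
  doc.addEventListener('mozfullscreenchange', function () {
    onChange(doc.mozFullScreen, doc.mozFullScreenElement);
  }, false);
}
```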

We also added the :-moz-full-screen css pseudo class, which applies to the full-screen element while in full-screen mode.

We added the mozallowfullscreen attribute to iframe elements. Without this, full-screen requests made by script in the iframe's content (e.g. embedded ads, or a YouTube player in an iframe for that matter) will be denied.

While in full-screen mode, the user can press the ESC key (or F11) to exit. Alpha-numeric keyboard input while in full-screen mode causes a warning message to pop up to guard against phishing attacks. The only keys which don't cause the warning message to pop up are: left, right, up, down, space, shift, control, alt, page up, page down, end, home, tab, and meta.

Navigating, changing tab, changing app (ALT+TAB) while in full-screen mode will cause full-screen mode to exit.

Here's a simple example, which will work in current Firefox nightly builds:

(Press ESC to exit full-screen)

The code for that button's onclick handler is simply:
document.getElementById('bruce_video').mozRequestFullScreen();

How is Firefox's full-screen API different from Webkit/Chrome/Safari's full-screen API? Firefox's API adds a "width: 100%; height: 100%;" CSS rule to the element which requests full-screen, so that it's stretched to occupy the entire screen. Chrome's API does not do this, but instead it centers the full-screen element in the window and blacks-out the underlying webpage. So the full-screen element won't occupy the entire screen with Chrome's API unless you specify a "width: 100%; height: 100%;" rule yourself. Conversely if you want to vertically and horizontally center something while in full-screen with Firefox's API, you need to make the containing element of your desired centered element full-screen instead, and apply CSS rules to vertically and horizontally center the contained element.

For a cross-browser full-screen API example, see html5-demos.appspot.com's full-screen demo.
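A very rough sketch of what a cross-browser request might look like is below. The helper name is hypothetical. Only mozRequestFullScreen comes from this post; the unprefixed requestFullscreen and the WebKit-prefixed webkitRequestFullScreen names are assumptions about other browsers, so check them against the browser you're targeting. Remember that the two APIs size the element differently, as described above.

```javascript
// Hypothetical helper: tries each known method name in turn and calls the
// first one the element supports. Only mozRequestFullScreen is taken from
// this post; the other two names are assumptions about other browsers.
function requestFullScreenCompat(element) {
  var names = [
    'requestFullscreen',       // unprefixed name (assumed)
    'mozRequestFullScreen',    // Firefox
    'webkitRequestFullScreen'  // WebKit-based browsers (assumed)
  ];
  for (var i = 0; i < names.length; i++) {
    if (typeof element[names[i]] === 'function') {
      element[names[i]]();
      return names[i]; // report which method was used
    }
  }
  return null; // no full-screen support found
}
```

Like the single-browser version, this needs to be called from a user-generated event handler.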

Edit: 11 Nov 2011, clarified Document.mozCancelFullScreen() and Document.mozFullScreenEnabled, fixed typos.

Thursday 22 September 2011

Mozilla full-screen API progress update

Update 10 November 2011: the full-screen API has been changed slightly and enabled in Firefox Nightly builds, see http://blog.pearce.org.nz/2011/11/firefoxs-html-full-screen-api-enabled.html for details.

I've been working on implementing Robert O'Callahan's HTML full-screen API proposal in Firefox (bug 545812). Support for the base API has landed, disabled by default, in Firefox nightly builds. To enable the full-screen API, set the pref full-screen-api.enabled to true.

We have implemented a general purpose full-screen API which can make any HTML element the full-screen element (it seems WebKit based browsers' full-screen API allows only making <video> elements full-screen).

This feature makes the following API changes to HTML Element:

void mozRequestFullScreen() : makes an HTML element the full-screen element. Causes browser chrome to hide, and expands the element to encompass the entire screen. Upon success, this dispatches a "mozfullscreenchange" event to the requesting full-screen element, or the element's owner document if the element is not in a document. We only grant requests for full-screen when running in user-generated event handlers, e.g. a mouse click handler.

This feature makes the following API changes to HTML Document:

void mozCancelFullScreen() : exits the document from full-screen mode.
readonly attribute mozFullScreen : true when the document is in full-screen mode.
readonly attribute mozFullScreenElement : reference to the current full-screen element, if it's in the current document.

This feature adds the :-moz-full-screen css pseudo class, which applies to the full-screen element while in full-screen mode.

For a request for full-screen to be granted in content inside an iframe, the containing iframe needs to have the mozallowfullscreen attribute present. This is a boolean attribute, so the attribute only needs to be present, it doesn't matter what value it's set to.

Keyboard input is restricted in full-screen mode. When alpha-numeric key input occurs in full-screen mode, full-screen mode immediately exits. This is to help protect against phishing attacks.

We also plan to deny requests for full-screen mode when windowed plugins are present (since we can't easily monitor key events to windowed plugins on non-MacOSX platforms). We will exit full-screen mode when a windowed plugin is added to a document as well. I have a patch for this, but its dependencies haven't landed yet.

Work remaining to be done before this can be enabled:

Adding a warning message when we enter DOM full-screen mode (on desktop Firefox, and on Fennec too).
Making the full-screen API work in multi-process Firefox/Fennec (bug 684620). This requires a way of getting the PBrowserParent from C++ in the chrome process to be implemented; unfortunately there's not a way to do that yet.
Make change/open tab cause full-screen mode to exit (bug 685402).
A security review must be completed, and concerns raised there must be addressed. This could involve changing the API.

We also want a clearer transition effect when entering full-screen, to somehow show the full-screen element "stretching out" to encompass the screen.

You can test out our work-in-progress full-screen implementation, by grabbing the latest Firefox nightly build, setting the pref full-screen-api.enabled to true, and pointing your browser at my not-very-exciting full-screen API demo page.

Thursday 25 August 2011

New media element APIs and better media seeking resolution

French intern Paul Adenot has recently implemented the seekable and played attributes on the HTML5 video and audio elements in Firefox. The seekable attribute enables script to see what regions of the media can be seeked into (particularly handy with live streams), and the played attribute enables script to see what regions of the media have already been played. Paul has also done some work improving the built in controls on media elements. Thanks for your hard work Paul! These should be available in release builds in November (Firefox 8).
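Both attributes return TimeRanges objects, which expose a length plus start(i) and end(i) methods giving times in seconds. As a small sketch of how script might use them (the helper names here are hypothetical):

```javascript
// Returns true if `time` (in seconds) lies inside any range of a
// TimeRanges-like object, i.e. one with length, start(i) and end(i).
function isInRanges(ranges, time) {
  for (var i = 0; i < ranges.length; i++) {
    if (time >= ranges.start(i) && time <= ranges.end(i)) {
      return true;
    }
  }
  return false;
}

// Can we seek the media element to `time`?
function canSeekTo(media, time) {
  return isInRanges(media.seekable, time);
}

// Total number of seconds of the media that have been played so far.
function totalPlayedSeconds(media) {
  var total = 0;
  for (var i = 0; i < media.played.length; i++) {
    total += media.played.end(i) - media.played.start(i);
  }
  return total;
}
```

For example, a custom seek bar could call canSeekTo(video, t) before setting video.currentTime on a live stream.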

Also in Firefox 8 are my changes to media seeking resolution. Now media seeking should be accurate to the nearest microsecond. It's been reported elsewhere how important accurate seeking for video is. We were previously accurate to the nearest video frame, but we could still be up to one audio packet off (often between 4 and 8 ms out). Now we prune audio samples when seeking so we're down to microsecond resolution.

Wednesday 3 August 2011

Simple rate limited HTTP server for testing HTML5 media/streaming

While working on the Firefox HTML5 video and audio support, I've found it extremely useful to have an HTTP server on which the transfer rate is reliably limited. Existing servers are either too heavy weight, like apache, or have inconsistent rate-limiting, like lighttpd which I found to have very "bursty" rate limiting.

I ended up taking the educational route, and implementing a simple HTTP server in C++. It supports the following features:

Support for HTTP1.1 Byte Range Requests. This means you can seek into unbuffered data when watching HTML5 video.
Rate limiting, configurable on a per request basis by passing the "rate=x" HTTP query parameter, where x is the transfer rate of the connection in kilobytes per second. The server will send x/10 KB ten times per second to maintain this rate smoothly.
Simulated live streaming, configurable on a request basis by passing the "live" query parameter. When in "live" mode, no Content-Length header is sent, and the server doesn't advertise or perform byte range requests - so you can't seek into unbuffered video/audio, just like in a live stream.
Cross platform; tested on Windows (runs on port 80) and Linux (runs on port 8080). I haven't tested it on MacOS yet.
Simply serves all files in the program's working directory, making it easy to use (and abuse).
Open source! Get the code at https://github.com/cpearce/HttpMediaServer, or download a pre-built win32 binary.

For example, if you wanted to simulate a live stream being served at 100KB/s, your test URL might look something like http://localhost:80/video.ogg?rate=100&live.
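To make the pacing arithmetic concrete, here's a small sketch of the numbers involved and of building a test URL. This is not the server's code (that's C++); the helper names are hypothetical, and I'm assuming 1KB means 1024 bytes here, which may not match exactly what the server does.

```javascript
// Assumption: 1 KB = 1024 bytes.
var BYTES_PER_KB = 1024;
// The server sends x/10 KB ten times per second.
var SENDS_PER_SECOND = 10;

// Bytes written in each send for a given rate in KB/s.
function bytesPerSend(rateKBps) {
  return Math.floor(rateKBps * BYTES_PER_KB / SENDS_PER_SECOND);
}

// Approximate time in seconds to transfer a whole file at that pace.
function secondsToSend(fileBytes, rateKBps) {
  var sends = Math.ceil(fileBytes / bytesPerSend(rateKBps));
  return sends / SENDS_PER_SECOND;
}

// Builds a test URL using the "rate" and "live" query parameters.
function testUrl(base, file, rateKBps, live) {
  var url = base + '/' + file + '?rate=' + rateKBps;
  return live ? url + '&live' : url;
}
```

With these assumptions, a rate of 100 means 10240 bytes per send, and testUrl('http://localhost:80', 'video.ogg', 100, true) produces the URL shown above.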

I've been using it for quite a while, and over the weekend I finally cleaned it up and put it up on GitHub. Check it out.

Thursday 28 July 2011

Reducing the memory overhead of thread stacks

I recently landed bug 664341 into mozilla-central, which adds an API to specify the amount of virtual address space reserved for the thread stacks of nsIThreads. The new API looks like this:

extern NS_COM_GLUE NS_METHOD
NS_NewThread(nsIThread **result,
             nsIRunnable *initialEvent = nsnull,
             PRUint32 stackSize = nsIThreadManager::DEFAULT_STACK_SIZE);

The default stack size is the default for whatever platform you're running on, so behaviour is unchanged for existing uses of NS_NewThread.

This new API is important, as x86 Linux by default reserves 8MB of virtual address space per thread stack. Windows and OSX use 1MB and 64KB respectively. If you have a lot of threads, their stacks can hog the virtual address space, and malloc will fail; we had at least one media mochitest that could fail in this way.

If you have code that creates threads, you should consider using this API. It's an easy way to reduce perceived memory usage.

I've also recently concluded a major refactoring of the media playback engine in Firefox. This reduces the number of threads required to play <audio> and <video> elements by roughly one third. We now only require two threads per playing media element (plus one extra thread for sound playback on Linux at least until bug 623444 lands and we can refactor to take advantage of that). Media elements which are paused now shut down their threads where possible, resulting in lower overall memory usage. If you have a page with 100 <audio> elements on it, you no longer have 300 threads lying around using up virtual address space!
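To put rough numbers on why this matters, here's a back-of-the-envelope sketch using the figures quoted in this post. The helper name is hypothetical, and it only counts address space reserved for stacks, nothing else.

```javascript
var MB = 1024 * 1024;

// Virtual address space (in MB) reserved for thread stacks alone.
function stackReservationMB(threadCount, stackBytesPerThread) {
  return threadCount * stackBytesPerThread / MB;
}

// 100 <audio> elements at three threads each, with 8MB stacks (x86 Linux):
//   stackReservationMB(300, 8 * MB)  is 2400
// The same 300 threads with 1MB stacks (the Windows default):
//   stackReservationMB(300, 1 * MB)  is 300
```

A 32-bit process has at most 4GB of address space in total, so 2400MB of stack reservations leaves little room for everything else.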

Wednesday 8 June 2011

Impressions of China 2011

I have just returned from traveling with my wife and her parents and sister for two weeks in China and Hong Kong.

It was an interesting experience. It's easy to see why so many people say that this century will be China's century.

My impressions of China are below. No doubt some people will disagree with them. Constructive comments welcome.

The Chinese plan long term. They're building infrastructure that they'll need in 20 years. The leadership doesn't need to worry that long-term projects will make it appear to their electorate that they're not achieving results. They don't need to borrow money in order to pay for the overly generous election promises required to get them elected. This seems to me to be one of the primary strengths of the Chinese communist system, and one of the failings of democracy. I am not implying that either system is necessarily superior.

All housing is leasehold. When you buy a house, you buy a 30 year lease for a residential city title, and longer leases are available for rural and commercial building (IIRC). If the government wants the land to build a road or whatever, they take the land back, the road gets built, and you go somewhere else.
As a corollary of my previous point, the Chinese get things done. They don't go through rounds of resource consent spanning years when they want to build something. Some engineer draws a line on a map, and it happens.
Communism works in China. Gone are the bad old days of the revolution and the madness contained therein. The Chinese have embraced a form of capitalism and made it work with their system. Numerous Chinese ex-pats have told me that people who don't rock the boat can live pretty free and happy lives. The people who rock the boat may not...
The top echelons of Chinese Government are engineers, and it shows. They build physical things and encourage the manufacture of physical products. They don't wrangle over IP laws designed to plug the leaks in dying business models. They build stuff. Then they sell it, and everyone benefits.

Intellectual property is not respected. They plagiarise and copy blatantly. I suspect this is part of their recent rapid rise; they didn't have to invent or start from scratch the same way the west did when it developed, the Chinese just copied what the west had done. Once they've caught up across the board, it will be interesting to see how lax intellectual property law/enforcement affects their economy and how they do business in future.
Everything is cheap. Food is cheap (and may be subject to government price controls now or in future to ensure it remains cheap). This means the base cost of living is low, so wages can also be low, and manufactured (exported) goods are cheap. The price of good quality food in China was easily one fifth of what you pay in New Zealand.
Perhaps as a corollary of my previous point, the quality of workmanship in China is in general very low, and they don't seem to put very much emphasis on stream-lining many of their processes (store check outs, even in department stores, are slow; why make it hard for customers to give you their money?).
There are [fun] police everywhere (at least in the tourist traps and the popular areas I visited). Plenty of stern faced young men in uniform patrol the streets with often-used whistles to keep people off the grass, to keep bikes off designated areas, to keep people from sitting on walls, and in general to keep people in line. Though one of their main duties seems to be giving directions to people.
They have electric bicycles where the battery charges while you pedal. They're everywhere, and very quiet - so they can sneak up on you. Seems a great and environmentally friendly way to get around flat cities.

The Chinese can be "a bit rough around the edges". Spitting on the ground is common practice (and may be a consequence of the bad air, and the prevalence of smoking). If they were kiwi, I'd describe them as "unashamedly blokey".
The Chinese do things at scale. When we crossed from Hong Kong to Shenzhen by bus, we crossed a long bridge over Shenzhen bay. There were rafts supporting an oyster farm spanning Shenzhen bay there, which stretched as far as the eye could see in either direction (the air was hazy/polluted, which reduced visibility to about 3km or so, but still). Impressive.
The "Maorish Village" in the "Wonders of the World" theme park in Shenzhen was hilariously inaccurate. The "Maori" people were not Maori (probably of south-western Chinese descent), and they performed a ramshackle show which was a fusion of a dozen different Pacific cultures. They repeatedly shouted "Aloha" (which is Hawaiian, they should at least say "kia ora" for a Maori greeting), and danced around in Cook Island costumes, claiming it was Maori. As a Pākehā, I'm offended on behalf of my Maori countrymen.

You couldn't see the sun most of the time in the big cities due to the pollution. The air tastes vile.
Traffic in Shanghai borders on being civilized. Other parts of China, less so.
White people are treated well, but prone to being overcharged. If you're a struggling dancer, move to China! White people are in demand in this area.
Mandatory kit for all white people in China should be a t-shirt that says "No buy DVD. No buy T-Shirt. No buy Bag." Bonus points if it's written in Chinese.
There are plenty of white people in Shanghai. Elsewhere, less so.
Shanghai is cool. There's lots of sci-fi-esque buildings all over the place. Star fleet head quarters should be built there.

They never miss an opportunity to ruin a perfectly good event/tour/attraction by trying to sell you stuff. We went on a day-trip guided tour, and after lunch we were taken into a fish oil factory and subjected to a 30 minute PowerPoint presentation trying to scare us into buying their products. I could barely believe it was happening!
We took the MagLev train in Shanghai to the airport. It went 434km/h. It was seriously cool. We should totally get one of those.
My wife's family is involved with an English school in Chongqing, China. If you're interested in teaching English in China, let me know, they're hiring. They're looking to hire English-speaking white people, English doesn't need to be your native language. Yes, I know many people of other ethnicities with excellent English, but the locals feel it's more prestigious to learn English from white skinned people.

Thursday 31 March 2011

HTML5 Video painting performance statistics in Firefox 5

I've landed video frame paint performance counters for HTML5 video onto mozilla-central. This should ship in Firefox 5, barring any disasters. This work was a combined effort by Chris Double and me. These are Mozilla specific fields which will only be available in Firefox.

The new statistics enable us to measure the performance of the video decoding and frame painting pipeline.

This adds the following fields to the HTMLVideoElement:

mozParsedFrames - A count of the number of video frames that have been demuxed/parsed from the media resource. If we were playing perfectly, we'd be able to paint this many frames.
mozDecodedFrames - A count of the number of demuxed/parsed video frames that have been decoded into Images. We skip decoding of parsed/demuxed frames if the decode is falling behind the playback position (this can happen if it takes a long time to decode a keyframe for example).
mozPresentedFrames - A count of the number of decoded frames that have been presented to the rendering pipeline for painting (set as the current Image on the video element's ImageContainer). We may not present decoded frames if the frame arrives for presentation late.
mozPaintedFrames - A count of the number of presented frames which were painted on screen. We may end up not painting presented frames if another frame is presented before the graphics pipeline has time to paint the previously presented frame, or if the video is off screen.
mozFrameDelay - The time (as a floating point number in seconds) which the last painted video frame was rendered late by. This is the time duration between the decoder saying "paint frame X now", and the graphics pipeline physically getting frame X displayed on the screen. The value is accurate on desktop Firefox, but not on mobile. Improvements in the graphics pipeline, and the integration with the graphics pipeline, will show up as a decrease in this number.
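As a sketch of how these counters might be combined, the helper below works out how many frames were lost at each stage of the pipeline. The function name and the shape of the returned report are hypothetical; only the five moz* fields come from the list above.

```javascript
// Hypothetical helper: summarises where frames are being lost, given an
// object exposing the moz* statistics fields (e.g. a <video> element).
function framePipelineReport(video) {
  var parsed = video.mozParsedFrames;
  return {
    // Parsed but never decoded (decode was falling behind).
    skippedDecode: parsed - video.mozDecodedFrames,
    // Decoded but never presented (frame arrived late).
    notPresented: video.mozDecodedFrames - video.mozPresentedFrames,
    // Presented but never painted on screen.
    notPainted: video.mozPresentedFrames - video.mozPaintedFrames,
    // Share of parsed frames that made it to the screen.
    paintedPercent: parsed > 0 ? 100 * video.mozPaintedFrames / parsed : 0,
    // mozFrameDelay is in seconds; convert to milliseconds.
    lastFrameDelayMs: video.mozFrameDelay * 1000
  };
}
```

Polling this once a second with setInterval and displaying the result is one simple way to watch playback performance while a video plays.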

Here's a demo of the video paint statistics in Firefox 5. You'll need a recent Firefox trunk nightly build for the demo to work.

Thursday 17 February 2011

Firefox 4 video decoder architecture

To assist others coming up to speed on the architecture of the video decoder, I've put together a diagram of Firefox 4's video playback engine. We rewrote our video architecture for Firefox 4 in order to give us better control over the complete stack.

Click on the image for a larger diagram.

The key classes in our architecture are:

nsHTMLMediaElement - This manages the JavaScript/HTML accessible HTMLMediaElement interface, and implements the resource selection, load, and preload logic.
nsBuiltinDecoder - Manages a main thread accessible snapshot of the state of the underlying decoder. The decoders run on non-main threads, and we don't want to block the main thread to dig into the decoders when JS queries playback state, so we maintain a snapshot of the playback engine's state in this class. This inherits from nsMediaDecoder. You can also implement playback support for a new format by inheriting and implementing nsMediaDecoder. nsWaveDecoder is currently implemented this way, but we're in the process of reimplementing that as a subclass of nsBuiltinDecoderReader.
nsBuiltinDecoderStateMachine - Manages the decode, state machine, audio-push threads, frame queueing, A/V sync, and buffering logic. This ensures that all the HTML5 events get dispatched at the appropriate time, and that behaviour is consistent and sane across different media types. Demuxing is handled abstractly by subclasses of nsBuiltinDecoderReader. This way all media types can share as much playback logic as possible, reducing our maintenance overhead.
nsOggReader/nsWebMReader - Demuxing and codec specific functionality is implemented by subclassing nsBuiltinDecoderReader. This reduces the amount of work required to implement and maintain support for new codecs. When a new codec is implemented as a nsBuiltinDecoderReader subclass, support for HTML events, buffering, and playback logic does not need to be reimplemented, since it already exists in nsBuiltinDecoderStateMachine. To add support for a new codec, it's easiest to implement support as a new nsBuiltinDecoderReader subclass.
nsAudioStream - Our cross platform audio API wrapper. It is based on libsydneyaudio, which operates on a push model rather than a (more commonly used) callback-based model, which has brought in a whole raft of headaches. Matthew Gregan is in the process of rewriting our audio layer to a more sane callback based model. We also provide a cross-process nsAudioStreamRemote, which proxies audio commands to an audio stream in another process. This is required on mobile.
ImageContainer - When it comes time to present a video frame, nsBuiltinDecoderStateMachine sets it as the "current image" of the video element's ImageContainer object. This then propagates through the Layers/2D scene rendering system, and it eventually gets rendered on the screen. The Layers compositing runs on the main thread, and ImageContainer provides a thread-safe wrapper. The images contained in the ImageContainer can be in OpenGL/D3D surfaces, so we can take advantage of hardware accelerated scaling, rendering, and YCbCr to RGB conversion.
nsVideoFrame - This resides in layout, and manages the dimensions/reflow of the video, as well as its poster image.
nsMediaStream - Our network code runs on the main thread, but the underlying libraries we use for media decoding (libvpx, libtheora, etc) assume synchronous reads. We can't afford to do blocking reads on the main thread, so we cache the media data downloaded into the nsMediaCache, and provide a thread-safe synchronous wrapper for reading in the nsMediaStream class. We use Necko for our networking, so we can take advantage of all the existing security and load-group functionality it implements.

The advantages of controlling the entire playback engine are many. We can easily control frame dropping, memory allocation, the threading model, what, when, and how we decode, and we can integrate more tightly with our network stack.

Wednesday 12 January 2011

Google to remove H.264 support from Chrome

Google is removing H.264 support from Chrome in the coming months. This is good news, as it means we're closer to having a high quality patent unencumbered video format for use in HTML5 video that works across all browsers. Now we're just left with Safari, Apple's iDevices, and IE9 as natively supporting patent encumbered video formats. Microsoft has said that IE9 will play "VP8 video when the user has installed a VP8 codec on Windows"; it will be interesting to see how that pans out.