Thursday, 22 September 2011

Mozilla full-screen API progress update

Update 10 November 2011: the full-screen API has been changed slightly and enabled in Firefox Nightly builds, see http://blog.pearce.org.nz/2011/11/firefoxs-html-full-screen-api-enabled.html for details.

I've been working on implementing Robert O'Callahan's HTML full-screen API proposal in Firefox (bug 545812). Support for the base API has landed, disabled by default, in Firefox nightly builds. To enable the full-screen API, set the pref full-screen-api.enabled to true.

We have implemented a general-purpose full-screen API which can make any HTML element the full-screen element (it seems that the full-screen API in WebKit-based browsers only allows <video> elements to be made full-screen).

This feature makes the following API changes to HTML Element:
  1. void mozRequestFullScreen() : makes an HTML element the full-screen element. This hides the browser chrome and expands the element to encompass the entire screen. Upon success, this dispatches a "mozfullscreenchange" event to the requesting full-screen element, or to the element's owner document if the element is not in a document. We only grant requests for full-screen when running in user-generated event handlers, e.g. a mouse click handler.
This feature makes the following API changes to HTML Document:
  1. void mozCancelFullScreen() : exits the document from full-screen mode.
  2. readonly attribute mozFullScreen : true when the document is in full-screen mode.
  3. readonly attribute mozFullScreenElement : reference to the current full-screen element, if it's in the current document.
This feature adds the :-moz-full-screen CSS pseudo-class, which applies to the full-screen element while in full-screen mode.

For a request for full-screen to be granted in content inside an iframe, the containing iframe needs to have the mozallowfullscreen attribute present. This is a boolean attribute, so it only needs to be present; it doesn't matter what value it's set to.
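To make the two conditions for granting a request concrete (running in a user-generated event handler, and the mozallowfullscreen attribute on the containing iframe), here is a small C++ model of the decision. It is a sketch, not Gecko's code: FrameInfo, FullScreenRequest and MayGrantFullScreen() are hypothetical, and checking every enclosing iframe for nested frames (rather than only the innermost one) is an assumption of the sketch.

```cpp
#include <vector>

// Hypothetical description of one enclosing iframe; not a Gecko class.
struct FrameInfo {
  bool hasMozAllowFullScreen;  // is the mozallowfullscreen attribute present?
};

// Hypothetical description of a pending mozRequestFullScreen() call.
struct FullScreenRequest {
  bool inUserGeneratedEventHandler;         // e.g. inside a click handler
  std::vector<FrameInfo> containingFrames;  // enclosing iframes, innermost first
};

// Returns true if the request would be granted under the rules described above.
bool MayGrantFullScreen(const FullScreenRequest& req) {
  if (!req.inUserGeneratedEventHandler) {
    return false;  // requests made outside user-generated handlers are denied
  }
  for (const FrameInfo& frame : req.containingFrames) {
    if (!frame.hasMozAllowFullScreen) {
      return false;  // assumption: every enclosing iframe must opt in
    }
  }
  return true;  // top-level documents have no enclosing frames to check
}
```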

Keyboard input is restricted in full-screen mode. When alphanumeric key input occurs in full-screen mode, full-screen mode immediately exits. This helps protect against phishing attacks.
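Here is a minimal sketch of the keyboard rule. It assumes a hypothetical FullScreenState type that sees each key press as a char. Only the "alphanumeric keys exit full-screen" rule comes from the text. Letting other keys, such as space or the arrow keys, through is an assumption made for illustration.

```cpp
#include <cctype>

// Hypothetical per-document full-screen state; not a Gecko class.
struct FullScreenState {
  bool inFullScreen = false;

  // Called for each key press delivered to content. Any alphanumeric key
  // immediately exits full-screen mode; in this sketch, all other keys are
  // passed through unchanged.
  void OnKeyPress(char key) {
    // std::isalnum requires a value representable as unsigned char.
    if (inFullScreen && std::isalnum(static_cast<unsigned char>(key))) {
      inFullScreen = false;
    }
  }
};
```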

We also plan to deny requests for full-screen mode when windowed plugins are present, since we can't easily monitor key events sent to windowed plugins on platforms other than Mac OS X. We will also exit full-screen mode when a windowed plugin is added to a document. I have a patch for this, but its dependencies haven't landed yet.

Work remaining to be done before this can be enabled:
  1. Adding a warning message when we enter DOM full-screen mode (on desktop Firefox, and on Fennec too).
  2. Making the full-screen API work in multi-process Firefox/Fennec (bug 684620). This requires implementing a way of getting the PBrowserParent from C++ in the chrome process; unfortunately there's no way to do that yet.
  3. Making changing or opening a tab cause full-screen mode to exit (bug 685402).
  4. A security review must be completed, and concerns raised there must be addressed. This could involve changing the API.
We also want a clearer transition effect when entering full-screen, to somehow show the full-screen element "stretching out" to encompass the screen.

You can test out our work-in-progress full-screen implementation by grabbing the latest Firefox nightly build, setting the pref full-screen-api.enabled to true, and pointing your browser at my not-very-exciting full-screen API demo page.

Thursday, 25 August 2011

New media element APIs and better media seeking resolution

French intern Paul Adenot has recently implemented the seekable and played attributes on the HTML5 video and audio elements in Firefox. The seekable attribute lets script see which regions of the media can be seeked into (particularly handy with live streams), and the played attribute lets script see which regions of the media have already been played. Paul has also done some work improving the built-in controls on media elements. Thanks for your hard work, Paul! These should be available in release builds in November (Firefox 8).
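Both attributes return a TimeRanges object. This is a normalized list of disjoint time intervals, exposed to script through length, start(i) and end(i). The C++ class below is a hypothetical stand-in, not Firefox's implementation. It shows how played ranges might accumulate and merge as a user plays, seeks, and plays again.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a TimeRanges object: sorted, disjoint
// [start, end] intervals in seconds.
class TimeRangesSketch {
 public:
  // Record that [start, end] was played, merging overlapping or touching
  // intervals so the list stays normalized.
  void Add(double start, double end) {
    mRanges.emplace_back(start, end);
    std::sort(mRanges.begin(), mRanges.end());
    std::vector<std::pair<double, double>> merged;
    for (const auto& r : mRanges) {
      if (!merged.empty() && r.first <= merged.back().second) {
        merged.back().second = std::max(merged.back().second, r.second);
      } else {
        merged.push_back(r);
      }
    }
    mRanges = std::move(merged);
  }

  // Accessors mirroring TimeRanges.length, start(i) and end(i).
  std::size_t Length() const { return mRanges.size(); }
  double Start(std::size_t i) const { return mRanges[i].first; }
  double End(std::size_t i) const { return mRanges[i].second; }

 private:
  std::vector<std::pair<double, double>> mRanges;
};
```

Consider this sequence: play from 0 to 10 s, seek to 30 s and play to 40 s, then seek back to 8 s and play to 12 s. It leaves two played ranges: 0 to 12 s and 30 to 40 s.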

Also in Firefox 8 are my changes to media seeking resolution. Media seeking should now be accurate to the nearest microsecond. It's been reported elsewhere how important accurate seeking is for video. We were previously accurate to the nearest video frame, but audio could still be up to one audio packet off (often between 4 and 8 ms out). Now we prune audio samples when seeking, so we're down to microsecond resolution.
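The pruning step can be sketched as follows. The sketch assumes a hypothetical AudioPacket that records its start time in microseconds, its sample rate and its frame count; this is not the Firefox data structure. To land on the seek target, it converts the gap between the packet's start time and the target into a whole number of frames, rounding up. It then drops that many frames from the front of the packet.

```cpp
#include <cstdint>

// Hypothetical decoded audio packet; not the Firefox structure.
struct AudioPacket {
  std::int64_t startUsecs;   // presentation time of the first frame, microseconds
  std::uint32_t rate;        // sample rate in Hz (frames per second)
  std::uint32_t frameCount;  // number of audio frames in the packet
};

// Number of leading frames to discard so that the first remaining frame
// starts at, or just after, seekTargetUsecs.
std::uint32_t FramesToPrune(const AudioPacket& p, std::int64_t seekTargetUsecs) {
  if (seekTargetUsecs <= p.startUsecs) {
    return 0;  // target at or before the packet start: keep every frame
  }
  const std::int64_t offsetUsecs = seekTargetUsecs - p.startUsecs;
  // frames = ceil(offsetUsecs * rate / 1e6), using integer arithmetic.
  std::int64_t frames = (offsetUsecs * p.rate + 999999) / 1000000;
  if (frames > static_cast<std::int64_t>(p.frameCount)) {
    frames = p.frameCount;  // target lies beyond this packet: drop all of it
  }
  return static_cast<std::uint32_t>(frames);
}
```

Take a 256-frame packet at 44.1 kHz (about 5.8 ms of audio) starting at 1,000,000 µs. Seeking to 1,005,000 µs drops 221 frames, because 5000 µs at 44100 Hz is 220.5 frames, rounded up.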

Wednesday, 3 August 2011

Simple rate limited HTTP server for testing HTML5 media/streaming

While working on the Firefox HTML5 video and audio support, I've found it extremely useful to have an HTTP server on which the transfer rate is reliably limited. Existing servers are either too heavyweight, like Apache, or have inconsistent rate limiting, like lighttpd, which I found to have very "bursty" rate limiting.

I ended up taking the educational route, and implementing a simple HTTP server in C++. It supports the following features:

  1. Support for HTTP/1.1 byte range requests. This means you can seek into unbuffered data when watching HTML5 video.
  2. Rate limiting, configurable on a per-request basis by passing the "rate=x" HTTP query parameter, where x is the transfer rate of the connection in kilobytes per second. The server sends x/10 KB ten times per second to maintain this rate smoothly.
  3. Simulated live streaming, configurable on a per-request basis by passing the "live" query parameter. In "live" mode, no Content-Length header is sent, and the server neither advertises nor honours byte range requests, so you can't seek into unbuffered video/audio, just like in a live stream.
  4. Cross-platform; tested on Windows (runs on port 80) and Linux (runs on port 8080). I haven't tested it on Mac OS X yet.
  5. Simply serves all files in the program's working directory, making it easy to use (and abuse).
  6. Open source! Get the code at https://github.com/cpearce/HttpMediaServer, or download a pre-built win32 binary.
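Byte range support comes down to parsing the Range request header and replying with 206 Partial Content and a matching Content-Range header. The helper below is a hypothetical sketch of the parsing step, not code from the HttpMediaServer repository. It handles only the single-range "bytes=start-end" and "bytes=start-" forms and rejects everything else.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical parsed range; not taken from the HttpMediaServer code.
struct ByteRange {
  std::uint64_t start;
  std::uint64_t end;  // inclusive, as in the HTTP Range syntax
};

// Parses a non-empty run of decimal digits. No overflow check: sketch only.
static bool ParseDigits(const std::string& s, std::uint64_t& out) {
  if (s.empty()) return false;
  std::uint64_t v = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
    v = v * 10 + static_cast<std::uint64_t>(c - '0');
  }
  out = v;
  return true;
}

// Parses "bytes=start-end" or "bytes=start-" against a file of fileLength
// bytes. Suffix ranges ("bytes=-500") and multiple ranges are rejected.
bool ParseRange(const std::string& value, std::uint64_t fileLength,
                ByteRange& out) {
  const std::string prefix = "bytes=";
  if (value.compare(0, prefix.size(), prefix) != 0) return false;
  const std::string spec = value.substr(prefix.size());
  const std::size_t dash = spec.find('-');
  if (dash == std::string::npos) return false;

  std::uint64_t start = 0;
  if (!ParseDigits(spec.substr(0, dash), start) || start >= fileLength) {
    return false;  // missing/invalid start, or start past the end of the file
  }
  std::uint64_t end = fileLength - 1;  // "bytes=N-" means "to end of file"
  const std::string last = spec.substr(dash + 1);
  if (!last.empty()) {
    if (!ParseDigits(last, end) || end < start) return false;
    if (end >= fileLength) end = fileLength - 1;  // clamp to the file size
  }
  out = ByteRange{start, end};
  return true;
}
```

A request for "bytes=500-" on a 10,000-byte file yields bytes 500 to 9999. The server would answer with "Content-Range: bytes 500-9999/10000". In live mode the server would skip this step entirely.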
For example, if you wanted to simulate a live stream being served at 100KB/s, your test URL might look something like http://localhost:80/video.ogg?rate=100&live.
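The query parameters in that URL map onto per-request options roughly as shown below. This is a sketch under stated assumptions rather than the server's actual code. RequestOptions, ParseQuery(), BytesPerTick() and ResponseHeaders() are hypothetical names, and 1 KB is taken to be 1024 bytes. Sending "Connection: close" in live mode is also an assumption of the sketch. Without a Content-Length, closing the connection is how an HTTP/1.1 server that isn't using chunked encoding marks the end of the body.

```cpp
#include <cstddef>
#include <string>

// Hypothetical per-request options; names are illustrative only.
struct RequestOptions {
  unsigned rateKBps = 0;  // 0 means "no rate limit"
  bool live = false;
};

// Parses a query string such as "rate=100&live". std::stoul throws on a
// malformed rate value; a real server should catch that and reply 400.
RequestOptions ParseQuery(const std::string& query) {
  RequestOptions opts;
  std::size_t pos = 0;
  while (pos <= query.size()) {
    std::size_t amp = query.find('&', pos);
    if (amp == std::string::npos) amp = query.size();
    const std::string param = query.substr(pos, amp - pos);
    if (param == "live") {
      opts.live = true;
    } else if (param.compare(0, 5, "rate=") == 0) {
      opts.rateKBps = static_cast<unsigned>(std::stoul(param.substr(5)));
    }
    pos = amp + 1;
  }
  return opts;
}

// Ten sends per second of x/10 KB each gives x KB/s (1 KB = 1024 bytes assumed).
std::size_t BytesPerTick(unsigned rateKBps) {
  return static_cast<std::size_t>(rateKBps) * 1024 / 10;
}

// Headers for a full (non-range) response. Live mode omits Content-Length
// and Accept-Ranges, so the body ends when the connection closes.
std::string ResponseHeaders(const RequestOptions& opts,
                            unsigned long long contentLength) {
  std::string h = "HTTP/1.1 200 OK\r\n";
  if (opts.live) {
    h += "Connection: close\r\n";
  } else {
    h += "Content-Length: " + std::to_string(contentLength) + "\r\n";
    h += "Accept-Ranges: bytes\r\n";
  }
  h += "\r\n";
  return h;
}
```

With rate=100, each of the ten ticks per second sends 10240 bytes.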

I've been using it for quite a while, and over the weekend I finally cleaned it up and put it up on GitHub. Check it out.

Thursday, 28 July 2011

Reducing the memory overhead of thread stacks

I recently landed bug 664341 into mozilla-central, which adds an API to specify the amount of virtual address space reserved for the thread stacks of nsIThreads. The new API looks like this:

  extern NS_COM_GLUE NS_METHOD
  NS_NewThread(nsIThread **result,
               nsIRunnable *initialEvent = nsnull,
               PRUint32 stackSize = nsIThreadManager::DEFAULT_STACK_SIZE);

The default stack size is the default for whatever platform you're running on, so behaviour is unchanged for existing uses of NS_NewThread.

This new API is important, as x86 Linux by default reserves 8MB of virtual address space per thread stack. Windows and OSX use 1MB and 64KB respectively. If you have a lot of threads, their stacks can hog the virtual address space, and malloc will fail; we had at least one media mochitest that could fail in this way.
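Some back-of-the-envelope arithmetic shows why the reservation size matters on 32-bit systems. The sketch assumes 3 GiB of user address space, which is the default 3G/1G split on 32-bit x86 Linux, and divides that same budget by each of the stack sizes above. It ignores everything else that lives in the address space (heap, code, mapped libraries), so real limits are lower.

```cpp
#include <cstdint>

// Illustrative arithmetic only, not a measurement.
constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;
// Assumption: default 3G/1G user/kernel split on 32-bit x86 Linux.
constexpr std::uint64_t kUserAddressSpace = 3 * GiB;

// How many stacks of the given reservation fit if nothing else were mapped.
constexpr std::uint64_t MaxStacks(std::uint64_t stackReservation) {
  return kUserAddressSpace / stackReservation;
}

static_assert(MaxStacks(8 * MiB) == 384, "8 MB per stack (x86 Linux default)");
static_assert(MaxStacks(1 * MiB) == 3072, "1 MB per stack (Windows default)");
static_assert(MaxStacks(64 * KiB) == 49152, "64 KB per stack");
```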

If you have code that creates threads, you should consider using this API. It's an easy way to reduce perceived memory usage. 

I've also recently concluded a major refactoring of the media playback engine in Firefox. This reduces the number of threads required to play <audio> and <video> elements by roughly one third. We now only require two threads per playing media element (plus one extra thread for sound playback on Linux, at least until bug 623444 lands and we can refactor to take advantage of it). Media elements which are paused now shut down their threads where possible, resulting in lower overall memory usage. If you have a page with 100 <audio> elements on it, you no longer have 300 threads lying around using up virtual address space!
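To put the 300-thread figure in perspective, here is the same kind of arithmetic for a page with 100 <audio> elements. It assumes every thread reserves the 8 MB x86 Linux default stack and ignores the extra Linux sound thread. The numbers are illustrative, not measurements. Note that 2400 MiB is most of a 3 GiB user address space.

```cpp
#include <cstdint>

// Illustrative arithmetic only: each thread is assumed to reserve the 8 MB
// x86 Linux default stack; the extra Linux sound thread is ignored.
constexpr std::uint64_t kStackMiB = 8;

constexpr std::uint64_t ReservedStackMiB(std::uint64_t elements,
                                         std::uint64_t threadsPerElement) {
  return elements * threadsPerElement * kStackMiB;
}

// 100 playing elements at three threads each: 300 threads.
static_assert(ReservedStackMiB(100, 3) == 2400, "before the refactoring");
// 100 playing elements at two threads each: 200 threads.
static_assert(ReservedStackMiB(100, 2) == 1600, "after the refactoring");
// Paused elements that have shut down their threads reserve nothing.
static_assert(ReservedStackMiB(100, 0) == 0, "paused, threads shut down");
```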

Wednesday, 8 June 2011

Impressions of China 2011

I have just returned from traveling with my wife and her parents and sister for two weeks in China and Hong Kong.

It was an interesting experience. It's easy to see why so many people say that this century will be China's century.

My impressions of China are below. No doubt some people will disagree with them. Constructive comments welcome.
  • The Chinese plan long term. They're building infrastructure that they'll need in 20 years. Unlike elected leaders, the leadership doesn't need to worry that long-term projects will look to voters as though they're not achieving results. They don't need to borrow money to pay for the overly generous election promises required to get elected. This seems to me to be one of the primary strengths of the Chinese communist system, and one of the failings of democracy. I am not implying that either system is necessarily superior.

  • All housing is leasehold. When you buy a house, you buy a 30-year lease on a residential city title, and longer leases are available for rural and commercial buildings (IIRC). If the government wants the land to build a road or whatever, they take the land back, the road gets built, and you go somewhere else.
  • As a corollary of my previous point, the Chinese get things done. They don't go through rounds of resource consent spanning years when they want to build something. Some engineer draws a line on a map, and it happens.
  • Communism works in China. Gone are the bad old days of the revolution and the madness contained therein. The Chinese have embraced a form of capitalism and made it work with their system. Numerous Chinese ex-pats have told me that people who don't rock the boat can live pretty free and happy lives. The people who rock the boat may not...
  • The top echelons of Chinese Government are engineers, and it shows. They build physical things and encourage the manufacture of physical products. They don't wrangle over IP laws designed to plug the leaks in dying business models. They build stuff. Then they sell it, and everyone benefits.

  • Intellectual property is not respected. They plagiarise and copy blatantly. I suspect this is part of their recent rapid rise; they didn't have to invent or start from scratch the way the West did when it developed, because the Chinese could just copy what the West had done. Once they've caught up across the board, it will be interesting to see how lax intellectual property law and enforcement affect their economy and how they do business in future.
  • Everything is cheap. Food is cheap (and may be subject to government price controls now or in future to ensure it remains cheap). This means the base cost of living is low, so wages can also be low, and manufactured (exported) goods are cheap. The price of good quality food in China was easily one fifth of what you pay in New Zealand.
  • Perhaps as a corollary of my previous point, the quality of workmanship in China is in general very low, and they don't seem to put much emphasis on streamlining many of their processes (store checkouts, even in department stores, are slow; why make it hard for customers to give you their money?).
  • There are [fun] police everywhere (at least in the tourist traps and the popular areas I visited). Plenty of stern-faced young men in uniform patrol the streets with often-used whistles to keep people off the grass, to keep bikes out of designated areas, to keep people from sitting on walls, and in general to keep people in line. Though one of their main duties seems to be giving directions to people.
  • They have electric bicycles where the battery charges while you pedal. They're everywhere, and very quiet - so they can sneak up on you. Seems a great and environmentally friendly way to get around flat cities.

  • The Chinese can be "a bit rough around the edges". Spitting on the ground is common practice (and may be a consequence of the bad air, and the prevalence of smoking). If they were Kiwi, I'd describe them as "unashamedly blokey".
  • The Chinese do things at scale. When we crossed from Hong Kong to Shenzhen by bus, we crossed a long bridge over Shenzhen Bay. Rafts supporting an oyster farm spanned the bay there, stretching as far as the eye could see in either direction (the air was hazy/polluted, which reduced visibility to about 3km or so, but still). Impressive.
  • The "Maorish Village" in the "Wonders of the World" theme park in Shenzhen was hilariously inaccurate. The "Maori" people were not Maori (probably of south-western Chinese descent), and they performed a ramshackle show which was a fusion of a dozen different Pacific cultures. They repeatedly shouted "Aloha" (which is Hawaiian; they should at least say "kia ora" for a Maori greeting), and danced around in Cook Island costumes, claiming it was Maori. As a Pākehā, I'm offended on behalf of my Maori countrymen.

  • You couldn't see the sun most of the time in the big cities due to the pollution. The air tastes vile.
  • Traffic in Shanghai borders on being civilized. Other parts of China, less so.
  • White people are treated well, but prone to being overcharged. If you're a struggling dancer, move to China! White people are in demand in this area.
  • Mandatory kit for all white people in China should be a t-shirt that says "No buy DVD. No buy T-Shirt. No buy Bag." Bonus points if it's written in Chinese.
  • There are plenty of white people in Shanghai. Elsewhere, less so.
  • Shanghai is cool. There are lots of sci-fi-esque buildings all over the place. Starfleet Headquarters should be built there.

  • They never miss an opportunity to ruin a perfectly good event/tour/attraction by trying to sell you stuff. We went on a day-trip guided tour, and after lunch we were taken into a fish oil factory and subjected to a 30-minute PowerPoint presentation trying to scare us into buying their products. I could barely believe it was happening!
  • We took the MagLev train in Shanghai to the airport. It went 434km/h. It was seriously cool. We should totally get one of those.
  • My wife's family is involved with an English school in Chongqing, China. If you're interested in teaching English in China, let me know; they're hiring. They're looking to hire English-speaking white people, and English doesn't need to be your native language. Yes, I know many people of other ethnicities with excellent English, but the locals feel it's more prestigious to learn English from white-skinned people.

Thursday, 31 March 2011

HTML5 Video painting performance statistics in Firefox 5

I've landed video frame paint performance counters for HTML5 video onto mozilla-central. This should ship in Firefox 5, barring any disasters. This work was a combined effort by Chris Double and me. These are Mozilla-specific fields which will only be available in Firefox.

The new statistics enable us to measure the performance of the video decoding and frame painting pipeline.

This adds the following fields to the HTMLVideoElement:
  • mozParsedFrames - A count of the number of video frames that have been demuxed/parsed from the media resource. If we were playing perfectly, we'd be able to paint this many frames.
  • mozDecodedFrames - A count of the number of demuxed/parsed video frames that have been decoded into Images. We skip decoding of parsed/demuxed frames if the decode is falling behind the playback position (this can happen if it takes a long time to decode a keyframe, for example).
  • mozPresentedFrames - A count of the number of decoded frames that have been presented to the rendering pipeline for painting (set as the current Image on the video element's ImageContainer). We may not present a decoded frame if it arrives too late for presentation.
  • mozPaintedFrames - A count of the number of presented frames which were painted on screen. We may end up not painting presented frames if another frame is presented before the graphics pipeline has time to paint the previously presented frame, or if the video is off screen. 
  • mozFrameDelay - The time (as a floating point number in seconds) by which the last painted video frame was rendered late. This is the time between the decoder saying "paint frame X now" and the graphics pipeline physically getting frame X displayed on the screen. The value is accurate on desktop Firefox, but not on mobile. Improvements in the graphics pipeline, and in our integration with it, will show up as a decrease in this number.
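The counters describe a pipeline in which each stage can only drop frames, so they should never increase from one stage to the next. The sketch below mirrors them in a hypothetical C++ struct; in a page you would read the moz* fields from the video element in script. It shows how per-stage losses and an overall drop fraction fall out of the counters.

```cpp
#include <cstdint>

// Hypothetical C++ mirror of the counters; not a Firefox type.
struct VideoFrameStats {
  std::uint32_t parsed;     // mozParsedFrames
  std::uint32_t decoded;    // mozDecodedFrames
  std::uint32_t presented;  // mozPresentedFrames
  std::uint32_t painted;    // mozPaintedFrames
};

// Each stage counts a subset of the previous stage's frames, so the counts
// should never increase along the pipeline.
bool IsConsistent(const VideoFrameStats& s) {
  return s.parsed >= s.decoded && s.decoded >= s.presented &&
         s.presented >= s.painted;
}

// Frames lost at each stage; only meaningful when IsConsistent() holds.
struct StageLosses {
  std::uint32_t skippedDecode;  // parsed but not decoded (decoder behind)
  std::uint32_t latePresent;    // decoded but not presented (arrived late)
  std::uint32_t unpainted;      // presented but never painted
};

StageLosses Losses(const VideoFrameStats& s) {
  return StageLosses{s.parsed - s.decoded, s.decoded - s.presented,
                     s.presented - s.painted};
}

// Fraction of parsed frames that never reached the screen.
double DroppedFraction(const VideoFrameStats& s) {
  if (s.parsed == 0) return 0.0;
  return 1.0 - static_cast<double>(s.painted) / s.parsed;
}
```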
Here's a demo of the video paint statistics in Firefox 5. You'll need a recent Firefox trunk nightly build for the demo to work.

Thursday, 17 February 2011

Firefox 4 video decoder architecture

To assist others coming up to speed on the architecture of the video decoder, I've put together a diagram of Firefox 4's video playback engine. We rewrote our video architecture for Firefox 4 in order to give us better control over the complete stack.


The key classes in our architecture are:
  • nsHTMLMediaElement - This manages the JavaScript/HTML accessible HTMLMediaElement interface, and implements the resource selection, load, and preload logic.
  • nsBuiltinDecoder - Manages a main-thread-accessible snapshot of the state of the underlying decoder. The decoders run on non-main threads, and we don't want to block the main thread to dig into the decoders when JS queries playback state, so we maintain a snapshot of the playback engine's state in this class. This inherits from nsMediaDecoder. You can also implement playback support for a new format by inheriting and implementing nsMediaDecoder. nsWaveDecoder is currently implemented this way, but we're in the process of reimplementing it as a subclass of nsBuiltinDecoderReader.
  • nsBuiltinDecoderStateMachine - Manages the decode, state machine, audio-push threads, frame queueing, A/V sync, and buffering logic. This ensures that all the HTML5 events get dispatched at the appropriate time, and that behaviour is consistent and sane across different media types. Demuxing is handled abstractly by subclasses of nsBuiltinDecoderReader. This way all media types can share as much playback logic as possible, reducing our maintenance overhead.
  • nsOggReader/nsWebMReader - Demuxing and codec-specific functionality is implemented by subclassing nsBuiltinDecoderReader. This reduces the amount of work required to implement and maintain support for new codecs. When a new codec is implemented as an nsBuiltinDecoderReader subclass, support for HTML events, buffering, and playback logic does not need to be reimplemented, since it already exists in nsBuiltinDecoderStateMachine. To add support for a new codec, it's easiest to implement support as a new nsBuiltinDecoderReader subclass.
  • nsAudioStream - Our cross platform audio API wrapper. It is based on libsydneyaudio, which operates on a push model rather than a (more commonly used) callback-based model, which has brought in a whole raft of headaches. Matthew Gregan is in the process of rewriting our audio layer to a more sane callback based model. We also provide a cross-process nsAudioStreamRemote, which proxies audio commands to an audio stream in another process. This is required on mobile.
  • ImageContainer - When it comes time to present a video frame, nsBuiltinDecoderStateMachine sets it as the "current image" of the video element's ImageContainer object. This then propagates through the Layers/2D scene rendering system, and it eventually gets rendered on the screen. The Layers compositing runs on the main thread, and ImageContainer provides a thread-safe wrapper. The images contained in the ImageContainer can be in OpenGL/D3D surfaces, so we can take advantage of hardware accelerated scaling, rendering, and YCbCr to RGB conversion.
  • nsVideoFrame - This resides in layout, and manages the dimensions/reflow of the video, as well as its poster image.
  • nsMediaStream - Our network code runs on the main thread, but the underlying libraries we use for media decoding (libvpx, libtheora, etc) assume synchronous reads. We can't afford to do blocking reads on the main thread, so we cache the downloaded media data in the nsMediaCache, and provide a thread-safe synchronous wrapper for reading it in the nsMediaStream class. We use Necko for our networking, so we can take advantage of all the existing security and load-group functionality it implements.
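The division of labour between nsBuiltinDecoderStateMachine and the nsBuiltinDecoderReader subclasses is a classic abstract-interface design. The following heavily simplified sketch uses hypothetical names (DecoderReader, FakeReader, StateMachine) and a made-up DecodeVideoFrame() signature. It only illustrates how shared playback logic can drive any format-specific reader; it does not show the real Gecko interfaces.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

struct VideoFrame {
  std::int64_t timeUsecs;  // presentation time
};

// Hypothetical abstract reader: each format implements only demux/decode.
class DecoderReader {
 public:
  virtual ~DecoderReader() = default;
  // Decodes the next video frame; returns false at end of stream.
  virtual bool DecodeVideoFrame(VideoFrame& out) = 0;
};

// Stand-in for a format-specific reader (think Ogg or WebM); it just
// produces a fixed number of frames at a fixed interval.
class FakeReader : public DecoderReader {
 public:
  FakeReader(int frames, std::int64_t intervalUsecs)
      : mRemaining(frames), mInterval(intervalUsecs) {}
  bool DecodeVideoFrame(VideoFrame& out) override {
    if (mRemaining == 0) return false;
    out.timeUsecs = mNext;
    mNext += mInterval;
    --mRemaining;
    return true;
  }

 private:
  int mRemaining;
  std::int64_t mInterval;
  std::int64_t mNext = 0;
};

// Shared, format-agnostic logic: works with any DecoderReader subclass.
class StateMachine {
 public:
  explicit StateMachine(std::unique_ptr<DecoderReader> reader)
      : mReader(std::move(reader)) {}
  // Decode ahead until maxQueued frames are queued or the stream ends.
  void FillQueue(std::size_t maxQueued) {
    VideoFrame frame;
    while (mQueue.size() < maxQueued && mReader->DecodeVideoFrame(frame)) {
      mQueue.push_back(frame);
    }
  }
  const std::vector<VideoFrame>& Queue() const { return mQueue; }

 private:
  std::unique_ptr<DecoderReader> mReader;
  std::vector<VideoFrame> mQueue;
};
```

Adding a format in this shape means writing one more DecoderReader subclass; the queueing logic in StateMachine is reused unchanged.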
The advantages of controlling the entire playback engine are many. We can easily control frame dropping, memory allocation, the threading model, and what, when, and how we decode, and we can integrate more tightly with our network stack.
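The nsMediaStream entry above describes a common pattern: the network code fills a cache, and decoder threads make blocking, synchronous reads against it. A minimal sketch using the standard library's mutex and condition variable follows. BlockingMediaCache and its methods are hypothetical, and the real nsMediaCache also handles eviction, seeking and more. ReadAt() blocks, so, as the text says, it must only be called off the main thread.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical cache: one thread appends downloaded bytes, decoder threads
// perform blocking reads. Not the real nsMediaCache/nsMediaStream API.
class BlockingMediaCache {
 public:
  // Called as network data arrives.
  void Append(const std::uint8_t* data, std::size_t len) {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mData.insert(mData.end(), data, data + len);
    }
    mCond.notify_all();  // wake any reader waiting for this data
  }

  // Called when the download finishes, so readers stop waiting for more.
  void MarkEnded() {
    {
      std::lock_guard<std::mutex> lock(mMutex);
      mEnded = true;
    }
    mCond.notify_all();
  }

  // Blocks until len bytes at offset are cached or the stream has ended,
  // then copies what is available. Returns the number of bytes copied.
  std::size_t ReadAt(std::size_t offset, std::uint8_t* out, std::size_t len) {
    std::unique_lock<std::mutex> lock(mMutex);
    mCond.wait(lock, [&] { return mEnded || mData.size() >= offset + len; });
    if (offset >= mData.size()) return 0;
    const std::size_t n = std::min(len, mData.size() - offset);
    std::copy(mData.data() + offset, mData.data() + offset + n, out);
    return n;
  }

 private:
  std::mutex mMutex;
  std::condition_variable mCond;
  std::vector<std::uint8_t> mData;
  bool mEnded = false;
};
```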