Friday 18 September 2009

Ogg video seek performance improvements

I've recently landed bug 501031 on mozilla-central and on 1.9.2 which roughly cuts Ogg seeking time in half. In Firefox 3.5, seeking Ogg Internet video was very slow, often taking 20 seconds or more to seek. This patch will make seeking in Ogg media in Firefox 3.6 much faster! This patch should also reduce the likelihood of encountering visual artefacts after a seek.

Currently the Ogg format doesn't contain any kind of keyframe index, so when you want to seek to a given time you typically do a bisection search over the entire file, reading little chunks as you go to figure out where your bisection has landed. This works fine for files on disk, but when seeking in files served over the Internet this can be slow, especially when viewing media which is hosted half a world away.

To play Ogg files, Firefox uses liboggplay, which in turn uses liboggz to seek. Unfortunately liboggz's seek bisection is in need of some maintenance (it's currently being rewritten by the maintainer). Its bisection is erratic and it fails to terminate its search appropriately, so it makes many more bisections than required. Most bisections or non-sequential reads result in a new HTTP connection, which is what makes this process slow for Internet video.

My patch fixes liboggz's seek to bisect sensibly, and to end the bisection search when it lands within 500ms of the seek target. This means that once we land close enough to the seek target, we'll just keep downloading from there. This is typically faster than continuing the bisection search due to the latency in setting up HTTP requests. We subtract 500ms from the seek target before we begin the bisection search, so that we don't finish the seek after the actual seek target.

Theora stores its video data as keyframe and interframes. In order to seek to and display a frame at a given time, we need to seek to the previous keyframe and decode forward to the target frame. The suggested approach is to extract the keyframe's position from the frame's granulepos field, and then seek again to the keyframe. This second seek needs to be exactly right, and that's hard due to some nasty edge cases with regards to stream muxing. Doing another bisection search in our case is also slow due to the latency in setting up HTTP requests. So now we just calculate the maximum possible time-offset that a frame can be from its keyframe, and subtract that from our seek target. This means we will often download more data than necessary, but for us that's typically faster than doing another bisection search.

The moral of the story is that if you want your video to seek quickly, include regular keyframes!