Friday 22 May 2009

Video seeking improvements

I've recently been working on improving seeking in the video element in Firefox. Two important bugs have been fixed: speeding up seeking, and removing artifacts while seeking. Combined, these make seeking in the video element a vastly improved user experience!

Speeding up seeking - we use liboggplay, which in turn uses liboggz for seeking. Ogg doesn't have any kind of index mapping byte offsets to times for the media it contains, so liboggz basically does something similar to a binary search over the media to implement seeking. This is fine for a file stored on your local disk, but for a file served over the internet we must do a new HTTP byte-range request for every bisection, which is slow. To speed up seeking, we now ask Roc's media cache which byte ranges of the media are already downloaded, and we try to seek inside those regions first, before falling back to the slower seek over the entire resource. A bisection that stays inside buffered ranges needs no network requests, so seeking to parts of the video which are already downloaded is now practically instantaneous.
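
To make the idea concrete, here's a minimal sketch of "try the buffered ranges first, then fall back to the whole resource". This is not the liboggz or Firefox code: the names bisect_seek, seek and time_at are made up for illustration, and the sketch assumes timestamps never decrease as the byte offset grows.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: returns the timestamp (seconds) of the Ogg page found
# at a byte offset. On unbuffered data each call would cost an HTTP range request.
TimeAt = Callable[[int], float]


def bisect_seek(time_at: TimeAt, lo: int, hi: int, target: float) -> int:
    """Bisect [lo, hi) for the last offset whose timestamp is <= target.

    Assumes timestamps never decrease with offset and time_at(lo) <= target.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if time_at(mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


def seek(target: float, buffered: List[Tuple[int, int]], length: int,
         time_at: TimeAt) -> int:
    """Try each buffered [start, end) byte range first, then the whole resource."""
    for start, end in buffered:
        # Only bisect inside a range if the target time actually falls within it.
        if time_at(start) <= target <= time_at(end - 1):
            return bisect_seek(time_at, start, end, target)
    # Slow path: bisect over everything, touching undownloaded data.
    return bisect_seek(time_at, 0, length, target)
```

When the target falls inside a buffered range, every probe lands on data we already have; only the fallback path has to touch the network.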

Removing artifacts while seeking - when we seek to a time position, liboggplay returns the next video frame after that time position. But if that frame is an inter frame, which only encodes what's different from the previous frame, liboggplay returns a garbage frame. The problem is that when liboggplay decodes the inter frame, it doesn't apply it to the frame that actually precedes it; it applies it to some other frame (maybe the key frame from the previously playing segment?). For example, if you seek Bruce Lee vs. Iron Man to 5 seconds, the result is something like this:


But that frame should actually look like this:


So the fix is conceptually simple: we need to seek to the key frame before the seek position, and then decode forward to the frame we're looking for. Ogg encodes its media data in pages, and each page contains a granulepos, which encodes the time of the key frame on which the frames in that page are based. So during the seek bisection, once we find the page containing the inter frame we want, we know the time of the key frame we need to decode forward from. We can then seek again to that time to get the key frame, and decode forward to the desired seek time without visual artifacts! There was one minor complication with a/v sync, but apart from that it works pretty well. There's still a bug somewhere, as sometimes we don't seek back to the key frame correctly, but in the majority of cases it works perfectly, and is a vast improvement!
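
As a rough illustration of how the key frame can be recovered from a granulepos, here's a sketch that assumes a Theora-style layout, where the high bits hold the number of the last key frame and the low bits hold the count of frames since that key frame. The function names, the shift of 6 bits and the 30 fps frame rate are made-up example values (the real shift comes from the stream header), and the sketch ignores the small differences in frame numbering between Theora versions.

```python
from typing import Tuple


def split_granulepos(granulepos: int, shift: int) -> Tuple[int, int]:
    """Split a Theora-style granulepos into (key frame number, frames since it)."""
    keyframe = granulepos >> shift
    offset = granulepos & ((1 << shift) - 1)
    return keyframe, offset


def plan_seek(granulepos: int, shift: int, fps: float) -> Tuple[float, int]:
    """Return (time to re-seek to, number of frames to decode forward)."""
    keyframe, offset = split_granulepos(granulepos, shift)
    return keyframe / fps, offset


# Hypothetical values: 6-bit shift, 30 fps, key frame 120, target 30 frames later.
granulepos = (120 << 6) | 30
keyframe_time, decode_forward = plan_seek(granulepos, shift=6, fps=30.0)
```

With these example numbers the key frame sits at 4 seconds, and 30 frames have to be decoded forward from it to reach the frame at 5 seconds.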

One issue with this approach is that if a video doesn't have regular key frames, we'll still seek back to the previous key frame and decode forward. If the key frame is several minutes back, we have to decode every frame in between, and performance can be pretty bad. The moral of the story is that for good seeking performance, you want your encoder to insert regular key frames into your video!