Thundering Herd: Video seeking improvements

Friday, 22 May 2009


I've recently been working on improving seeking in the video element in Firefox. Two important bugs have been fixed: speeding up seeking, and removing artifacts while seeking. Combined, these make seeking in the video element a vastly improved user experience!

Speeding up seeking: we use liboggplay, which in turn uses liboggz for seeking. Ogg doesn't have any kind of index mapping byte offsets to times for the media it contains, so liboggz does something similar to a binary search over the media to implement seeking. This is fine for a file stored on your local disk, but for a file served over the internet we must make a new HTTP byte-range request for every bisection, which is slow. To speed up seeking, we now ask Roc's media cache which byte ranges of the media have already been downloaded, and we try to seek inside those ranges first, before falling back to the slower seek over the entire resource. Seeking inside buffered ranges needs no network requests, so seeking to parts of the video that have already been downloaded is now practically instantaneous.
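To make the buffered-ranges-first idea concrete, here is a toy model in Python. Everything in it (`Page`, `Resource`, `read_page`, `seek`, the page sizes and timings) is hypothetical and not liboggz's API; real liboggz bisects over byte offsets and resynchronises to the next page boundary, whereas this sketch bisects over page indices for simplicity and counts every read outside a cached range as one HTTP byte-range request.

```python
from bisect import bisect_left
from dataclasses import dataclass

@dataclass(frozen=True)
class Page:
    offset: int   # byte offset of the page within the resource
    time_ms: int  # presentation time of the page, in milliseconds

class Resource:
    """Hypothetical stand-in for a remote Ogg file plus its media cache."""

    def __init__(self, pages, buffered):
        self.pages = pages        # sorted by offset, and therefore by time
        self.buffered = buffered  # cached byte ranges as (start, end), end exclusive
        self.network_requests = 0

    def read_page(self, i):
        page = self.pages[i]
        if not any(start <= page.offset < end for start, end in self.buffered):
            self.network_requests += 1  # stands in for an HTTP byte-range request
        return page

def bisect_pages(res, lo, hi, target_ms):
    """Binary search pages[lo:hi] for the last page with time <= target_ms."""
    best = None
    while lo < hi:
        mid = (lo + hi) // 2
        if res.read_page(mid).time_ms <= target_ms:
            best, lo = mid, mid + 1
        else:
            hi = mid
    return best

def seek(res, target_ms):
    """Try each buffered range first, then fall back to the whole resource."""
    offsets = [p.offset for p in res.pages]
    for start, end in res.buffered:
        lo, hi = bisect_left(offsets, start), bisect_left(offsets, end)
        # Peeking at the range's first and last pages is free (both are cached).
        # The answer lies in this range only if the target falls strictly before
        # the range's last page, since pages are in time order.
        if lo < hi and res.pages[lo].time_ms <= target_ms < res.pages[hi - 1].time_ms:
            return bisect_pages(res, lo, hi, target_ms)
    return bisect_pages(res, 0, len(res.pages), target_ms)

pages = [Page(offset=i * 4096, time_ms=i * 100) for i in range(1000)]
res = Resource(pages, buffered=[(0, 200 * 4096)])
print(seek(res, 5_000), res.network_requests)   # 50 0
print(seek(res, 80_000), res.network_requests)  # 800 10
```

The first seek lands inside the cached range and costs nothing; the second falls back to a full bisection and pays one simulated request per step.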

Removing artifacts while seeking: when we seek to a time position, liboggplay returns the next video frame after that time position. But if that frame is an inter frame, which only encodes what has changed since the previous frame, liboggplay returns a garbage frame. The problem is that when liboggplay decodes the inter frame, it doesn't apply it to the frame that actually precedes it; it applies it to some other frame (maybe the key frame from the previously playing segment?). For example, if you seek Bruce Lee vs. Iron Man to 5 seconds, the result is something like this:

But that frame should actually look like this:

So the fix is conceptually simple: we need to seek to the key frame before the seek position, and then decode forward to the frame we're looking for. Ogg encodes its media data in pages, and each page contains a granulepos, which encodes the time of the key frame on which the frames in that page are based. So during the seek bisection, once we find the page containing the inter frame we want, we know the time of the key frame we need to decode forward from. We can then seek again to that time to get the key frame, and decode forward to the desired seek time without visual artifacts! There was one minor complication with A/V sync, but apart from that it works pretty well. There's still a bug somewhere, as sometimes we don't seek back to the key frame correctly, but in the majority of cases it works perfectly, and it's a vast improvement!
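The seek-back-and-decode-forward step can be sketched in Python as follows. It assumes a Theora-style granulepos, whose upper bits hold the key frame's frame number and whose low `granule_shift` bits hold the number of frames since that key frame; it treats both as 0-based frame indices (as Theora 3.2.0 streams do; later bitstream versions count from 1, shifting results by one). The "codec" is a deliberately hypothetical toy in which an inter frame is just per-pixel deltas from the previous frame, and the function names are illustrative, not liboggplay's.

```python
from fractions import Fraction

def split_granulepos(granulepos: int, granule_shift: int) -> tuple[int, int]:
    """Split a Theora-style granulepos into (key frame index, frames since key frame)."""
    keyframe = granulepos >> granule_shift
    offset = granulepos & ((1 << granule_shift) - 1)
    return keyframe, offset

def encode(frames, keyframe_interval):
    """Toy encoder: a key frame stores every pixel, an inter frame stores
    per-pixel deltas from the previous frame (real codecs are far richer)."""
    packets = []
    for i, frame in enumerate(frames):
        if i % keyframe_interval == 0:
            packets.append(("key", list(frame)))
        else:
            packets.append(("inter", [c - p for c, p in zip(frame, frames[i - 1])]))
    return packets

def decode(packet, reference):
    """Decode one packet; an inter frame is only valid against the frame before it."""
    kind, data = packet
    if kind == "key":
        return list(data)
    return [r + d for r, d in zip(reference, data)]

def seek_to_frame(packets, granulepos, granule_shift, fps, target_frame):
    """Use the granulepos of the page found by bisection to locate the key frame,
    then decode forward to target_frame. Returns (key frame time, target frame)."""
    keyframe, _ = split_granulepos(granulepos, granule_shift)
    keyframe_time = Fraction(keyframe) / fps  # where the second seek would land
    frame = None
    for i in range(keyframe, target_frame + 1):
        frame = decode(packets[i], frame)  # only the last frame would be displayed
    return keyframe_time, frame

frames = [[10 * i + p for p in range(3)] for i in range(12)]
packets = encode(frames, keyframe_interval=6)
gp = (6 << 6) | 3  # key frame 6, three frames after it, granule_shift = 6
t, frame = seek_to_frame(packets, gp, 6, 24, 9)
print(t, frame)                        # 1/4 [90, 91, 92]
print(decode(packets[9], frames[0]))   # [10, 11, 12]: deltas on the wrong reference
```

The last line mimics the artifact described earlier: the same inter frame applied to an unrelated reference frame decodes without error but produces the wrong picture.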

One issue with this approach is that if a video doesn't have regular key frames, we'll still seek back to the previous key frame and decode forward. If that key frame is several minutes back, performance can be pretty bad. The moral of the story is that for good seeking performance, you want your encoder to inject regular key frames into your video!
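To put a rough number on "pretty bad", the Python sketch below counts how many frames a seek has to decode, under the simplifying and hypothetical assumption that key frames occur at a fixed interval starting at frame 0 (real encoders also place key frames at scene cuts). The 30 fps rate and the two intervals are illustrative only.

```python
def frames_decoded(target_frame: int, keyframe_interval: int) -> int:
    """Frames decoded to display target_frame, assuming (hypothetically) a key
    frame exactly every keyframe_interval frames starting at frame 0."""
    last_keyframe = (target_frame // keyframe_interval) * keyframe_interval
    return target_frame - last_keyframe + 1  # key frame through target, inclusive

FPS = 30
ten_minutes = range(10 * 60 * FPS)
# Key frame every 2 seconds: no seek decodes more than 60 frames.
print(max(frames_decoded(t, 2 * FPS) for t in ten_minutes))       # 60
# Key frame every 5 minutes: one seek can decode up to 9000 frames.
print(max(frames_decoded(t, 5 * 60 * FPS) for t in ten_minutes))  # 9000
```

The worst case equals the key frame interval in frames, so seek cost grows linearly with the gap between key frames.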

6 comments:

Colin Barrett said...: "The problem is that when liboggplay decodes the inter frame, it doesn't apply the inter frame to the frame that was actually prior to the inter frame, it applies the inter frame to some other frame (maybe the key frame from the previously playing segment?)."

That sounds like a bug in liboggplay, no?; 22 May 2009 at 16:21
Unknown said...: Colin: yes, it's arguably a bug in liboggplay. The 'more correct' behaviour would be to not output any frames until it sees the next keyframe.

However, the desired behaviour is to output from the seeked-to frame, which requires seeking back to the previous keyframe - so Chris's approach is what you want to do regardless of what liboggplay does if you seek to a non-keyframe.; 23 May 2009 at 05:15
Unknown said...: Would it be a hack to add some sort of "x-time-range" header (with an "x-but-I-already-got-ranges" modifier) so that the server could hand the client the pieces it needs to do the seek?

For Flash video you can augment the video with the time code to byte position correspondence for each key frame (see http://www.buraks.com/flvmdi/). Is there really no such option for Ogg? And are there no metadata or comment fields that could be hijacked for the purpose?; 23 May 2009 at 07:01
Alfred said...: Fantastic! This makes video seeking really smooth!; 24 May 2009 at 19:30
Anonymous said...: Thanks for all the work you are doing. I’m looking to solve the HTML5 video adoption issue, and have come up with this: http://camendesign.com/code/files/video_for_everybody/test.html I would really like to open a discussion with Mozilla regarding this.; 27 May 2009 at 00:09
Gerv said...: Kroc: you could have made that a lot quicker to read by just posting the text on the web page :-)

I believe Wikipedia has some technology like this, which also has the option of the Cortado Java applet. You should check in with them.; 27 May 2009 at 01:09