Ever since my last escapades with Replay Debugging in VMWare Workstation 6.5, I've been looking forward to improvements in this awesome technology. Thankfully the guys at VMWare have been hard at work, and now VMWare Workstation 7 now boasts improved Replay Debugging. I've found it much more robust and reliable, and Roc and I have already used it to debug some random orange bugs.
I've documented how to produce a Replay Debugging setup for debugging intermittent test failures in Mozilla mochitests, and put it up on MDC:
https://developer.mozilla.org/En/Debugging/Record_and_Replay_Debugging_Firefox
Now anyone can setup a machine to record and replay debug intermittent mochitests! A word of warning: you need a modern CPU in order to get good performance. I had poor performance when running on my two-year-old Core2Duo laptop, but replay performance is almost at real-time speeds on my shiny new Intel i7 950 box.
I still have two patches that need to be refined and then checked in, to facilitate replay debugging. The first enables the mochitest harness to loop forever on a test directory. The second enables you to set break points on specific JavaScript dump() calls, so you can break during replay close to where the action is.
We're far from having a fully automated record and replay setup, but we've made a start!
Friday, 20 November 2009
Monday, 5 October 2009
Problems building Firefox with Visual Studio 2005 and Windows 7 SDK
When I tried to update my 1.9.2 tree this morning on my Windows box I quickly discovered that Ben Hearsum had recently updated our build system to require the Windows 7 SDK. However once I installed the new SDK and tried to build (using Visual Studio 2005's compiler), I got the following error:
shell32.lib(shguid.obj) : fatal error LNK1103: debugging information corrupt; recompile module.
This is a compatibility issue between Visual Studio 2008 and Visual Studio 2005 which only occurs when you compile in debug mode. There's a hotifx to resolve this issue. Once I installed this, I could magically build again. Thanks to biesi for pointing out the hotfix!
shell32.lib(shguid.obj) : fatal error LNK1103: debugging information corrupt; recompile module.
This is a compatibility issue between Visual Studio 2008 and Visual Studio 2005 which only occurs when you compile in debug mode. There's a hotifx to resolve this issue. Once I installed this, I could magically build again. Thanks to biesi for pointing out the hotfix!
Friday, 18 September 2009
Ogg video seek performance improvements
I've recently landed bug 501031 on mozilla-central and on 1.9.2 which roughly cuts Ogg seeking time in half. In Firefox 3.5, seeking Ogg Internet video was very slow, often taking 20 seconds or more to seek. This patch will make seeking in Ogg media in Firefox 3.6 much faster! This patch should also reduce the likelihood of encountering visual artefacts after a seek.
Currently the Ogg format doesn't contain any kind of keyframe index, so when you want to seek to a given time you typically do a bisection search over the entire file, reading little chunks as you go to figure out where your bisection has landed. This works fine for files on disk, but when seeking in files served over the Internet this can be slow, especially when viewing media which is hosted half a world away.
To play Ogg files, Firefox uses liboggplay, which in turn uses liboggz to seek. Unfortunately liboggz's seek bisection is in need of some maintenance (it's currently being rewritten by the maintainer). Its bisection is erratic and it fails to terminate its search appropriately, so it makes many more bisections than required. Most bisections or non-sequential reads result in a new HTTP connection, which is what makes this process slow for Internet video.
My patch fixes liboggz's seek to bisect sensibly, and to end the bisection search when it lands within 500ms of the seek target. This means that once we land close enough to the seek target, we'll just keep downloading from there. This is typically faster than continuing the bisection search due to the latency in setting up HTTP requests. We subtract 500ms from the seek target before we begin the bisection search, so that we don't finish the seek after the actual seek target.
Theora stores its video data as keyframe and interframes. In order to seek to and display a frame at a given time, we need to seek to the previous keyframe and decode forward to the target frame. The suggested approach is to extract the keyframe's position from the frame's granulepos field, and then seek again to the keyframe. This second seek needs to be exactly right, and that's hard due to some nasty edge cases with regards to stream muxing. Doing another bisection search in our case is also slow due to the latency in setting up HTTP requests. So now we just calculate the maximum possible time-offset that a frame can be from its keyframe, and subtract that from our seek target. This means we will often download more data than necessary, but for us that's typically faster than doing another bisection search.
The moral of the story is that if you want your video to seek quickly, include regular keyframes!
Currently the Ogg format doesn't contain any kind of keyframe index, so when you want to seek to a given time you typically do a bisection search over the entire file, reading little chunks as you go to figure out where your bisection has landed. This works fine for files on disk, but when seeking in files served over the Internet this can be slow, especially when viewing media which is hosted half a world away.
To play Ogg files, Firefox uses liboggplay, which in turn uses liboggz to seek. Unfortunately liboggz's seek bisection is in need of some maintenance (it's currently being rewritten by the maintainer). Its bisection is erratic and it fails to terminate its search appropriately, so it makes many more bisections than required. Most bisections or non-sequential reads result in a new HTTP connection, which is what makes this process slow for Internet video.
My patch fixes liboggz's seek to bisect sensibly, and to end the bisection search when it lands within 500ms of the seek target. This means that once we land close enough to the seek target, we'll just keep downloading from there. This is typically faster than continuing the bisection search due to the latency in setting up HTTP requests. We subtract 500ms from the seek target before we begin the bisection search, so that we don't finish the seek after the actual seek target.
Theora stores its video data as keyframe and interframes. In order to seek to and display a frame at a given time, we need to seek to the previous keyframe and decode forward to the target frame. The suggested approach is to extract the keyframe's position from the frame's granulepos field, and then seek again to the keyframe. This second seek needs to be exactly right, and that's hard due to some nasty edge cases with regards to stream muxing. Doing another bisection search in our case is also slow due to the latency in setting up HTTP requests. So now we just calculate the maximum possible time-offset that a frame can be from its keyframe, and subtract that from our seek target. This means we will often download more data than necessary, but for us that's typically faster than doing another bisection search.
The moral of the story is that if you want your video to seek quickly, include regular keyframes!
Wednesday, 5 August 2009
Configuring web servers for HTML5 Ogg video and audio
When serving HTML5 Ogg <video> or <audio> from your web server, there's a number of things you can do to make videos load faster. This post outlines how to configure your web server to improve HTML5 video and audio playback performance.
1. Serve X-Content-Duration headers
The Ogg format doesn't encapsulate the duration of the media. So for the progress bar on the video controls to display the duration of the video, we need to somehow determine the duration. You can either support HTTP1.1 Byte-Range requests (see 2. below) or better yet, serve an X-Content-Duration header for your Ogg videos. This provides the duration of the video in seconds (not HH:MM:SS format), as a floating point value. For example, a video which is 1 minute and 32.6 seconds would you'd serve the extra header: "X-Content-Duration: 92.6".
When an Firefox requests an Ogg media, if you should serve up the X-Content-Duration header with the duration of the media. This means Firefox doesn't need to do any extra HTTP requests to seek to the end of the file to calculate the duration so it can display the progress bar.
You can get the duration using oggz-info, which comes with oggz-tools. oggz-info gives output like this:
Note that you can't just serve up the Content-Duration line that oggz-info outputs, it's in HH:MM:SS.ss format, you need to convert it to seconds only, and serve it as X-Content-Duration.
Be warned, it looks like oggz-info makes one read pass of the media in order to calculate the duration, so it would be wise to store the duration value, and not to calculate it for every HTTP request of every Ogg video.
Also be aware that oggz-info does not calculate the duration of videos that start at a non-zero time correctly. oggz-info reports the duration as the time of the last frame, not the time of the last frame, minus the time of the first frame. Edit - 6 Aug 2009: Looks like this was only true for old versions of oggz-info, current versions use the presentation time from the skeleton track to calculate the duration correctly.
2. Handle HTTP1.1 byte range requests correctly
In order to seek to and play back regions of the media which aren't yet downloaded, Firefox uses HTTP1.1 Byte Range requests to retrieve the media from the seeek target position. Also if you don't serve X-Content-Duration, we use byte-range requests to seek to the end of the media (provided you're serving Content-Length) to determine the duration of the media.
Your server should serve the "Accept-Ranges: bytes" HTTP header if it can accept byte-range requests. It must return "206: Partial content" to all byte range requests, else Firefox can't be sure you actually support byte range requests. Remember you must return "206: Partial Content" for requests for "Range: bytes=0-" as well.
If you're curious, see bug 502894 comment 1 for more details of the HTTP requests Firefox can make and why.
3. Include regular key frames
When we seek, we have to seek to the keyframe before the seek target, and then download and decode from there until we reach the actual target time. The further your keyframes are apart, the longer this takes, so include regular keyframes. ffmpeg2theora's default of one keyframe every 64 frames (or about every 2 seconds) seems to work ok, but be aware that the more keyframes you have, the larger your video file will be, so your mileage may vary.
4. Serve the correct mime-type
For *.ogg and *.ogv files containing video (possibly with an audio track as well), serve the video/ogg mime type. For *.oga and *.ogg files which contain only audio, serve the audio/ogg. For *.ogg files with unknown contents, you can serve application/ogg, and we'll treat it as a video file. Most servers don't yet serve the correct mime-type for *.ogv and *.oga files.
5. Consider using autobuffer
If you have the autobuffer attribute set to true for your video or audio element, Firefox will attempt to download the entire media when the page loads. Otherwise, Firefox only downloads enough of the media to display the first video frame, and to determine the duration. autobuffer is off by default, so for a YouTube style video hosting site, your users may appreciate you setting autobuffer="true" for some video elements.
1. Serve X-Content-Duration headers
The Ogg format doesn't encapsulate the duration of the media. So for the progress bar on the video controls to display the duration of the video, we need to somehow determine the duration. You can either support HTTP1.1 Byte-Range requests (see 2. below) or better yet, serve an X-Content-Duration header for your Ogg videos. This provides the duration of the video in seconds (not HH:MM:SS format), as a floating point value. For example, a video which is 1 minute and 32.6 seconds would you'd serve the extra header: "X-Content-Duration: 92.6".
When an Firefox requests an Ogg media, if you should serve up the X-Content-Duration header with the duration of the media. This means Firefox doesn't need to do any extra HTTP requests to seek to the end of the file to calculate the duration so it can display the progress bar.
You can get the duration using oggz-info, which comes with oggz-tools. oggz-info gives output like this:
$ oggz-info /g/media/bruce_vs_ironman.ogv
Content-Duration: 00:01:00.046
Skeleton: serialno 1976223438
4 packets in 3 pages, 1.3 packets/page, 27.508% Ogg overhead
Presentation-Time: 0.000
Basetime: 0.000
Theora: serialno 0170995062
1790 packets in 1068 pages, 1.7 packets/page, 1.049% Ogg overhead
Video-Framerate: 29.983 fps
Video-Width: 640
Video-Height: 360
Vorbis: serialno 0708996688
4531 packets in 167 pages, 27.1 packets/page, 1.408% Ogg overhead
Audio-Samplerate: 44100 Hz
Audio-Channels: 2
Note that you can't just serve up the Content-Duration line that oggz-info outputs, it's in HH:MM:SS.ss format, you need to convert it to seconds only, and serve it as X-Content-Duration.
Be warned, it looks like oggz-info makes one read pass of the media in order to calculate the duration, so it would be wise to store the duration value, and not to calculate it for every HTTP request of every Ogg video.
2. Handle HTTP1.1 byte range requests correctly
In order to seek to and play back regions of the media which aren't yet downloaded, Firefox uses HTTP1.1 Byte Range requests to retrieve the media from the seeek target position. Also if you don't serve X-Content-Duration, we use byte-range requests to seek to the end of the media (provided you're serving Content-Length) to determine the duration of the media.
Your server should serve the "Accept-Ranges: bytes" HTTP header if it can accept byte-range requests. It must return "206: Partial content" to all byte range requests, else Firefox can't be sure you actually support byte range requests. Remember you must return "206: Partial Content" for requests for "Range: bytes=0-" as well.
If you're curious, see bug 502894 comment 1 for more details of the HTTP requests Firefox can make and why.
3. Include regular key frames
When we seek, we have to seek to the keyframe before the seek target, and then download and decode from there until we reach the actual target time. The further your keyframes are apart, the longer this takes, so include regular keyframes. ffmpeg2theora's default of one keyframe every 64 frames (or about every 2 seconds) seems to work ok, but be aware that the more keyframes you have, the larger your video file will be, so your mileage may vary.
4. Serve the correct mime-type
For *.ogg and *.ogv files containing video (possibly with an audio track as well), serve the video/ogg mime type. For *.oga and *.ogg files which contain only audio, serve the audio/ogg. For *.ogg files with unknown contents, you can serve application/ogg, and we'll treat it as a video file. Most servers don't yet serve the correct mime-type for *.ogv and *.oga files.
5. Consider using autobuffer
If you have the autobuffer attribute set to true for your video or audio element, Firefox will attempt to download the entire media when the page loads. Otherwise, Firefox only downloads enough of the media to display the first video frame, and to determine the duration. autobuffer is off by default, so for a YouTube style video hosting site, your users may appreciate you setting autobuffer="true" for some video elements.
Friday, 22 May 2009
Video seeking improvements
I've recently been working on improving seeking in the video element in Firefox. Two important bugs have been fixed: Speeding up seeking and removing artifacts while seeking. Combined these make seeking in the video element a vastly improved user experience!
Speeding up seeking - we use liboggplay, which in turn uses liboggz for seeking. Ogg doesn't have any kind of byte-offset to time index for the media it contains, so liboggz basically does something similar to a binary search over the media to implement seeking. This is fine for a file stored on your local disk, but for a file served over the internet we must do a new HTTP byte-range request for every bisection, which is slow. To speed up seeking, we now ask Roc's media cache what byte-ranges of the media are already downloaded, and we try to seek inside those regions first, before falling back to the slower seek over the entire resource. Seeking inside buffered ranges is practically instantaneous, and so seeking to parts of the video which are downloaded is now instantaneous.
Removing artifacts while seeking - when we seek to a time position, liboggplay returns the next video frame after that time position. But if the frame after the seek time position is an inter frame, which only encodes what's different from the previous frame, liboggplay returns a garbage frame. The problem is that when liboggplay decodes the inter frame, it doesn't apply the inter frame to the frame that was actually prior to the inter frame, it applies the inter frame to some other frame (maybe the key frame from the previously playing segment?). For example, if you seek Bruce Lee vs. Iron Man to 5 seconds, the result is something like this:
But that frame should actually look like this:
So the fix is conceptually simple; we need to seek to the key frame before the seek position, and then decode forward to the frame we're looking for. Ogg encodes its media data in pages, and each page contains a granulepos, which encodes the time of the key frame from which the frames in that page are based upon. So during the seek bisection, once we find the page containing the inter frame we want, we then know the time of the key frame we need to decode forward from. We can then seek again to that time to get the key frame and can then decode forward to the desired seek time without visual artifacts! There was one minor complication with a/v sync, but apart from that it works pretty well. There's still a bug somewhere, as sometimes we don't seek back to the keyframe correctly, but for the majority of cases it works perfectly, and is a vast improvement!
One issue with this approach is that if a video doesn't have regular key frames, we'll still seek back to the previous key frame and decode forward. If the key frame is several minutes back, performance can be pretty bad. The moral of the story, is that for good seeking performance, you want your encoder to inject regular key frames into your video!
Speeding up seeking - we use liboggplay, which in turn uses liboggz for seeking. Ogg doesn't have any kind of byte-offset to time index for the media it contains, so liboggz basically does something similar to a binary search over the media to implement seeking. This is fine for a file stored on your local disk, but for a file served over the internet we must do a new HTTP byte-range request for every bisection, which is slow. To speed up seeking, we now ask Roc's media cache what byte-ranges of the media are already downloaded, and we try to seek inside those regions first, before falling back to the slower seek over the entire resource. Seeking inside buffered ranges is practically instantaneous, and so seeking to parts of the video which are downloaded is now instantaneous.
Removing artifacts while seeking - when we seek to a time position, liboggplay returns the next video frame after that time position. But if the frame after the seek time position is an inter frame, which only encodes what's different from the previous frame, liboggplay returns a garbage frame. The problem is that when liboggplay decodes the inter frame, it doesn't apply the inter frame to the frame that was actually prior to the inter frame, it applies the inter frame to some other frame (maybe the key frame from the previously playing segment?). For example, if you seek Bruce Lee vs. Iron Man to 5 seconds, the result is something like this:
But that frame should actually look like this:
So the fix is conceptually simple; we need to seek to the key frame before the seek position, and then decode forward to the frame we're looking for. Ogg encodes its media data in pages, and each page contains a granulepos, which encodes the time of the key frame from which the frames in that page are based upon. So during the seek bisection, once we find the page containing the inter frame we want, we then know the time of the key frame we need to decode forward from. We can then seek again to that time to get the key frame and can then decode forward to the desired seek time without visual artifacts! There was one minor complication with a/v sync, but apart from that it works pretty well. There's still a bug somewhere, as sometimes we don't seek back to the keyframe correctly, but for the majority of cases it works perfectly, and is a vast improvement!
One issue with this approach is that if a video doesn't have regular key frames, we'll still seek back to the previous key frame and decode forward. If the key frame is several minutes back, performance can be pretty bad. The moral of the story, is that for good seeking performance, you want your encoder to inject regular key frames into your video!
Friday, 13 March 2009
Setting up VMware to record, replay and debug intermittent Mochitest failures
Edit 16 March 2010: This blog post is now out of date, and very likely wrong. The official documentation for setting up Replay Debugging for Firefox is now on the Mozilla Developer Center wiki: Record and Replay Debugging Firefox
Over the past few days I've been working to get VMware Workstation's Replay Debugging to work on Mozilla’s Mochitest suite. It's been a long process, but I've finally got something that records and can be replay-debugged! Replay debugging allows us to record everything that happens in a virtual machine, and then replay it back and step through the execution in a debugger. Often when we have an intermittent test failure, it's hard to reproduce (hence it's intermittent-ness). Now I can record a VM running Mochitests and if I record a test failure, I can replay the execution and step-through and see exactly what code paths where followed and hopefully figure out why. This is powerful, as it means we can deterministically and repeatedly reproduce an intermittent test failure in a debugger, making them a lot easier to debug.
Using replay debugging to debug intermittent test failures was originally Robert O'Callahan's idea. He had trouble setting this up on Linux, so he suggested I try it on Windows. It took a lot of messing around, but finally it works. The key lessons learned are:
Over the past few days I've been working to get VMware Workstation's Replay Debugging to work on Mozilla’s Mochitest suite. It's been a long process, but I've finally got something that records and can be replay-debugged! Replay debugging allows us to record everything that happens in a virtual machine, and then replay it back and step through the execution in a debugger. Often when we have an intermittent test failure, it's hard to reproduce (hence it's intermittent-ness). Now I can record a VM running Mochitests and if I record a test failure, I can replay the execution and step-through and see exactly what code paths where followed and hopefully figure out why. This is powerful, as it means we can deterministically and repeatedly reproduce an intermittent test failure in a debugger, making them a lot easier to debug.
Using replay debugging to debug intermittent test failures was originally Robert O'Callahan's idea. He had trouble setting this up on Linux, so he suggested I try it on Windows. It took a lot of messing around, but finally it works. The key lessons learned are:
- Create the record-and-replay build on a network drive mapped to the same path on both your host and guest systems. This means that the debug symbols have the same path to source files embedded in them on both systems. Also paths compiled into the executable (e.g.: assertion __FILE__:__LINE__ messages) are valid on both systems.
- When creating a recording of Firefox, start the recording before you start Firefox. I suspect that the replay-debugger must observe the DLLs being loaded at program startup in order to load debug symbols and thus allow the debugger to function.
- The settings for replay debugging and for remote debugging are totally unrelated.
- Project > Properties > Debugging > Command is the path to the executable which ran in the VM recording which the debugger will try to connect to when replaying.
- Your build needs to be --enable-libxul --disable-static.
- Get a Windows machine with a supported CPU. Originally I had tried to set this up on a boot-camped Mac Mini, but that had a Core Duo processor, which is unsupported. My Vista laptop has a Core 2 Duo processor which is supported, so I've been working on that.
- Install Visual Studio Professional on your host system. Microsoft has a free 90 day Trial of Visual Studio 2008 Professional available for download.
VMware recommend you use 2005 Professional, butI've successfully used both 2005 and 2008. You'll need the build prerequisites for this installed of course. - Install VMware Workstation 6.5 on your host system. You must install this after Visual Studio, else its debugger plugin won't show up in Visual Studio.
- Install a Windows OS in your VM. This is the "guest" system. I installed Windows XP SP3.
Install Visual Studio Professional on your guest system. You need this because it installs the Remote Debug Monitor. Visual Studio Express versions don't have this.Edit: Only required for remote debugging, not replay.In your guest Windows OS, disable Windows Firewall. You can do this by running "firewall.cpl" at the command prompt.Edit: Only required for remote debugging, not replay.In your guest Windows OS, set the security policy for "Network access: Sharing and security model for local accounts" to be "Classic - local users authenticate as themselves". You can access this from Control Panel > Administrative Tools > Local Security Policy > Local Policies. This setting allows the remote debugger to log into the VM system.Edit: Only required for remote debugging, not replay.- Create a network drive, and map it in both your host and guest system to the same path. This will store the builds you test, and ensure that the builds have symbols which have valid paths for both the host and guest machines. I created a new drive Z: on my host system. It was stored on an external hard disk, as my laptop's always running very low on space. You'll need lots of disk space.
- It's a good idea to create a VM snapshot after setting up everything, so that these settings can't be lost. Every time you replay, the state of the VM is reset to the start of the recording. The state is also reset to the "initial snapshot" if you try to create a recording from inside Visual Studio. The state is saved if you shutdown the VM normally. This can wipe settings if you're not careful.
- Check out the appropriate Mozilla source tree to the network drive.
- Build your tree on the network drive. Ensure your build is an enable-libxul disable-static build, i.e. add to your .mozconfig: "ac_add_options --enable-libxul --disable-static". Without this I found that some the symbols for some DLLs weren't loaded (gklayout.gll in particular), so I couldn't set breakpoints where I wanted. I found building on a network drive took about 2.5 times longer than a normal build.
- Create a new project in Visual Studio. You can't just create a project by opening an EXE file, the VMware menu is greyed out if you do this. You must create a new project file using the File > New > Project > Win32 > Win32 Console Application. I opted to create an empty project, and that works fine for our purposes.
- Configure Project > Properties > Debugging and enter the Command as the path to firefox.exe on your network drive.
- Boot up your guest operating system in your virtual machine. Start a new recording in your VM. We're going to create a recording from inside the VM, rather than initiating the recording from Visual Studio. This is important, because we can't (at least not easily) launch a Mochitest run in an MSYS shell in the guest operating system from inside the Visual Studio debugger. It's much simpler to just record the virtual machine while it's doing a Mochitest run. You must start the recording before firefox.exe starts up however, else the debugger may not connect to it when you replay.
- In the guest operating system, run Mochitests until you reproduce a failure, timeout etc. Stop the recording.
- In Visual Studio on the host system, configure VMWare. Open menu item VMWare > Options > Replay Debugging in VM. Set "Virtual Machine" to point to your VMX file for your guest operating system. Set "Recording to Replay" to the name of the recording you just recorded.
- In Visual Studio on the host system, open the source files you want to put break points in from your network drive. Set breakpoints in them.
- Press the "Debug an application running inside of a recording" button on the toolbar, or VMware > Start Replay Debugging.
- The VM will start replaying the recording. It will be slow, and will take a few minutes to start up, but assuming you're configured correctly, it should replay, and execution should break on your break points. If the recording fails to start, check for error messages in the VMware output window in Visual Studio.