Friday, 13 March 2009

Setting up VMware to record, replay and debug intermittent Mochitest failures

Edit 16 March 2010: This blog post is now out of date, and very likely wrong. The official documentation for setting up Replay Debugging for Firefox is now on the Mozilla Developer Center wiki: Record and Replay Debugging Firefox

Over the past few days I've been working to get VMware Workstation's Replay Debugging to work on Mozilla’s Mochitest suite. It's been a long process, but I've finally got something that records and can be replay-debugged! Replay debugging allows us to record everything that happens in a virtual machine, and then replay it and step through the execution in a debugger. Often when we have an intermittent test failure, it's hard to reproduce (hence its intermittent-ness). Now I can record a VM running Mochitests, and if I record a test failure, I can replay the execution, step through it, see exactly what code paths were followed, and hopefully figure out why. This is powerful, as it means we can deterministically and repeatedly reproduce an intermittent test failure in a debugger, making such failures a lot easier to debug.

Using replay debugging to debug intermittent test failures was originally Robert O'Callahan's idea. He had trouble setting this up on Linux, so he suggested I try it on Windows. It took a lot of messing around, but finally it works. The key lessons learned are:
  • Create the record-and-replay build on a network drive mapped to the same path on both your host and guest systems. This means that the debug symbols have the same path to source files embedded in them on both systems. Also, paths compiled into the executable (e.g. assertion __FILE__:__LINE__ messages) are valid on both systems.
  • When creating a recording of Firefox, start the recording before you start Firefox. I suspect that the replay-debugger must observe the DLLs being loaded at program startup in order to load debug symbols and thus allow the debugger to function.
  • The settings for replay debugging and for remote debugging are totally unrelated.
  • Project > Properties > Debugging > Command is the path to the executable which ran in the VM recording which the debugger will try to connect to when replaying.
  • Your build needs to be --enable-libxul --disable-static.
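The build-option lesson above is easy to get wrong without noticing until symbols fail to load, so here is a minimal Python sketch of a pre-build check. It only assumes that the options appear on `ac_add_options` lines in a .mozconfig, as described in this post; the function name `missing_build_options` is made up for illustration and is not part of any Mozilla tooling.

```python
# Hypothetical helper: checks .mozconfig text for the two build options
# this post says are needed for replay debugging.
REQUIRED_OPTIONS = ("--enable-libxul", "--disable-static")


def missing_build_options(mozconfig_text, required=REQUIRED_OPTIONS):
    """Return the required options that no ac_add_options line mentions."""
    found = set()
    for line in mozconfig_text.splitlines():
        # Ignore anything after a '#' so commented-out options don't count.
        tokens = line.split("#", 1)[0].split()
        if tokens and tokens[0] == "ac_add_options":
            found.update(tokens[1:])
    return [option for option in required if option not in found]


if __name__ == "__main__":
    sample = "ac_add_options --enable-libxul --disable-static\n"
    print(missing_build_options(sample))  # prints []
```

To check a real file, read it and pass the text in, for example `missing_build_options(open(".mozconfig").read())`. An empty list means both options are present.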
It took a lot of messing around, so for my own record, and for the use of anyone who also wants to set up recording and replay-debugging of a Mochitest run, the exact steps I went through are:
  1. Get a Windows machine with a supported CPU. Originally I had tried to set this up on a boot-camped Mac Mini, but that had a Core Duo processor, which is unsupported. My Vista laptop has a Core 2 Duo processor which is supported, so I've been working on that.
  2. Install Visual Studio Professional on your host system. Microsoft has a free 90-day trial of Visual Studio 2008 Professional available for download. VMware recommend you use 2005 Professional, but I've successfully used both 2005 and 2008. You'll also need the build prerequisites installed, of course.
  3. Install VMware Workstation 6.5 on your host system. You must install this after Visual Studio, else its debugger plugin won't show up in Visual Studio.
  4. Install a Windows OS in your VM. This is the "guest" system. I installed Windows XP SP3.
  5. Install Visual Studio Professional on your guest system. You need this because it installs the Remote Debug Monitor. Visual Studio Express versions don't have this. Edit: Only required for remote debugging, not replay.
  6. In your guest Windows OS, disable Windows Firewall. You can do this by running "firewall.cpl" at the command prompt. Edit: Only required for remote debugging, not replay.
  7. In your guest Windows OS, set the security policy for "Network access: Sharing and security model for local accounts" to be "Classic - local users authenticate as themselves". You can access this from Control Panel > Administrative Tools > Local Security Policy > Local Policies. This setting allows the remote debugger to log into the VM system. Edit: Only required for remote debugging, not replay.
  8. Create a network drive, and map it in both your host and guest system to the same path. This will store the builds you test, and ensure that the builds have symbols which have valid paths for both the host and guest machines. I created a new drive Z: on my host system. It was stored on an external hard disk, as my laptop's always running very low on space. You'll need lots of disk space.
  9. It's a good idea to create a VM snapshot after setting up everything, so that these settings can't be lost. Every time you replay, the state of the VM is reset to the start of the recording. The state is also reset to the "initial snapshot" if you try to create a recording from inside Visual Studio. The state is only saved if you shut down the VM normally, so a reset can wipe your settings if you're not careful.
  10. Check out the appropriate Mozilla source tree to the network drive.
  11. Build your tree on the network drive. Ensure your build is an --enable-libxul --disable-static build, i.e. add to your .mozconfig: "ac_add_options --enable-libxul --disable-static". Without this I found that the symbols for some DLLs weren't loaded (gklayout.dll in particular), so I couldn't set breakpoints where I wanted. I found building on a network drive took about 2.5 times longer than a normal build.
  12. Create a new project in Visual Studio. You can't just create a project by opening an EXE file; the VMware menu is greyed out if you do this. You must create a new project file using File > New > Project > Win32 > Win32 Console Application. I opted to create an empty project, and that works fine for our purposes.
  13. Configure Project > Properties > Debugging and enter the Command as the path to firefox.exe on your network drive.
  14. Boot up your guest operating system in your virtual machine. Start a new recording in your VM. We're going to create a recording from inside the VM, rather than initiating the recording from Visual Studio. This is important, because we can't (at least not easily) launch a Mochitest run in an MSYS shell in the guest operating system from inside the Visual Studio debugger. It's much simpler to just record the virtual machine while it's doing a Mochitest run. You must start the recording before firefox.exe starts up however, else the debugger may not connect to it when you replay.
  15. In the guest operating system, run Mochitests until you reproduce a failure, timeout etc. Stop the recording.
  16. In Visual Studio on the host system, configure VMware. Open menu item VMware > Options > Replay Debugging in VM. Set "Virtual Machine" to point to your VMX file for your guest operating system. Set "Recording to Replay" to the name of the recording you just recorded.
  17. In Visual Studio on the host system, open the source files you want to put breakpoints in from your network drive. Set breakpoints in them.
  18. Press the "Debug an application running inside of a recording" button on the toolbar, or VMware > Start Replay Debugging.
  19. The VM will start replaying the recording. It will be slow, and will take a few minutes to start up, but assuming you're configured correctly, it should replay, and execution should break on your breakpoints. If the recording fails to start, check for error messages in the VMware output window in Visual Studio.
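Step 15 ("run Mochitests until you reproduce a failure") can mean a lot of re-running by hand, so here is a rough Python sketch of a loop that repeats a command inside the guest until it fails. It assumes that the command you pass exits with a non-zero status when a test fails, which may not hold for every harness or version; if it doesn't, you would need to scan the log output instead. The name `run_until_failure` and the script name in the usage message are made up for illustration.

```python
import subprocess
import sys


def run_until_failure(command, max_runs=50):
    """Run command repeatedly and stop at the first failing run.

    Returns the 1-based number of the first run that exited non-zero,
    or None if all max_runs runs exited with status 0.
    Assumption: a non-zero exit status signals a test failure.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(command)
        if result.returncode != 0:
            return attempt
    return None


if __name__ == "__main__":
    # The command to repeat is taken from the command line, so nothing
    # about the Mochitest invocation is hard-coded here.
    command = sys.argv[1:]
    if command:
        failed_on = run_until_failure(command)
        if failed_on is None:
            print("No failure reproduced")
        else:
            print("Run %d failed; stop the recording now" % failed_on)
    else:
        print("usage: python rerun.py <command> [args...]")
```

Start the VM recording before launching the loop, since the recording has to begin before firefox.exe starts. Bear in mind that a long loop makes for a long recording, and replay is slow.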
That's it! All the black magic required should be outlined above. Now, to fix some intermittent test failures...