Monday, 30 August 2010

Steps to reduce random failures of media mochitests

Matthew Gregan and I have taken some steps to help reduce random orange in the media mochitests. Sayrer pointed out that during some random failures in test_play_events we were timing out with mmap() faliures and almost 130 threads running. It turns out that on 32bit Linux each thread stack is assigned 10MB of virtual address space, and when you're running a lot of threads you can run out of virtual address space, causing the timeouts.

We're probably particularly vulnerable to this since we're still not doing garbage collection in xpcshell during mochitest runs.

I spent some time ensuring we're more proactive at shutting down the media decoder's threads (each media element requires three threads, plus one more per media element on Linux used by the sound server), and I refactored our media tests to reduce the number of tests we run in parallel. This ensures our peak number of threads should be kept down, thus reducing peak memory usage/allocated virtual address space.

Concurrently to this, Matthew Gregan identified and fixed a thread safety issue in our media decoder and another in the underlying libsydneyaudio library on Linux, which we use for audio playback. We expect these two fixes to also reduce some random playback failures.

Since landing on these changes Thursday PDT, we've had a lot less random orange on mozilla-central. We're still getting some random failures, but most of these are on Linux, and many of those are caused by a known bug in the version of pulseaudio running on the Tinderbox mochitest machines. We expect many of these random failures will go away when the releng guys update pulseaudio on the Linux Tinderbox mochitest machines.

We're still getting a fair number of orange in test_progress, mostly on Windows, but we're going to be changing our media progress event to bring it into line with the current spec, and when we do that we'll be removing that test.

1 comment: