Thundering Herd: A new unwitting ally in the fight against random orange

Friday 9 April 2010

A new unwitting ally in the fight against random orange

I have discovered a new tool in our toolbox for the fight against random test failures: crashinjectdll.dll on Windows, nptest.so on Linux (I don't know what it's called on MacOS...). I suspect we're randomly injecting crashes into plugins in order to test recovery from plugin crashes with out-of-process plugins (correct me if I'm wrong). If a mochitest run times out on Tinderbox and a crash has been injected, then when the mochitest run is killed we get call stacks in the output log for all the threads that were running. These stacks show us some of the state Firefox was in when the timeout occurred, and may give us a clue as to why it timed out.

Maybe we should always inject a plugin crash at the start of a mochitest run, so that we always have call stacks when we kill a timed-out mochitest run?

Next time someone comments in your random orange bugs with a log, check to see if it's got call stacks in it!

1 comment:

Unknown said...: There are two different things here:

nptest exists on all platforms (it's the test plugin). Among its features are .crash() and .hang() methods, callable from script, which we use to test OOPP crash and hang recovery.

crashinject is a Windows-specific DLL which the Python test harness can use to inject a crash into a running process that appears to be hung, so that you can get a minidump from that process. On Linux you can just send SIGABRT, and our crash reporter's signal handler will catch it and create a minidump. I don't know if we have a way to generate an external hang on Mac or not.

10 April 2010 at 02:15
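To make the Linux approach concrete, here is a minimal sketch of a harness that sends SIGABRT to a child that has outlived its timeout, rather than killing it outright. It assumes a POSIX system. The helper name run_with_abort_on_timeout is hypothetical and this is not the actual test harness code. The sleeping child is only a stand-in for a hung browser: it installs no crash reporter, so it simply dies on SIGABRT, whereas a process with a crash reporter signal handler would get the chance to write a minidump first.

```python
import signal
import subprocess
import sys


def run_with_abort_on_timeout(cmd, timeout_seconds):
    """Run cmd; if it outlives timeout_seconds, send SIGABRT (POSIX only).

    Hypothetical helper for illustration. Returns (timed_out, returncode).
    """
    proc = subprocess.Popen(cmd)
    try:
        return False, proc.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # Unlike SIGKILL, SIGABRT can be caught by a handler in the child,
        # which is what lets a crash reporter produce a dump before exiting.
        proc.send_signal(signal.SIGABRT)
        try:
            return True, proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            # The child ignored or mishandled SIGABRT; fall back to SIGKILL.
            proc.kill()
            return True, proc.wait()


if __name__ == "__main__":
    # Stand-in for a hung process: sleeps far longer than the timeout.
    hung_child = [sys.executable, "-c", "import time; time.sleep(60)"]
    timed_out, returncode = run_with_abort_on_timeout(hung_child, 1.0)
    print(timed_out, returncode)
```

On POSIX, a negative return code from subprocess means the child was terminated by that signal number, so the harness can tell an aborted run from a normal exit.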