Thundering Herd: Firefox's Gecko Media Plugin & EME Architecture

For rendering audio and video, Firefox typically uses either the operating system's audio/video codecs or bundled software codec libraries. For DRM video playback (on services such as Netflix and Amazon Prime Video) and for WebRTC video calls using baseline H.264 video, however, Firefox relies on Gecko Media Plugins, or GMPs for short.

This blog post describes the architecture of the Gecko Media Plugin system in Firefox, and the major classes and objects involved, as it looked in June 2019.

For DRM video Firefox relies upon Google's Widevine Content Decryption Module, a dynamic shared library downloaded at runtime. Although this plugin doesn't conform to the GMP ABI, we provide an adapter to allow it to be run through the GMP system. We use the same Widevine CDM plugin that Chrome uses.

For decode and encode of H.264 streams for WebRTC, Firefox uses OpenH264, which is provided by Cisco. This plugin implements the GMP ABI.

These two plugins are downloaded at runtime from Google's and Cisco's servers, and installed in the user's Firefox profile directory.

We also ship a ClearKey CDM, which implements the baseline decryption scheme required by the Encrypted Media Extensions specification. It mimics the interface that the Widevine CDM implements, and is used in our EME regression tests. It's bundled with the rest of Firefox, and lives in the Firefox install directory.

The objects involved in running GMPs are spread over three processes: the main (AKA parent) process, the sandboxed content process where we run JavaScript and load web pages, and the sandboxed GMP process, which only runs GMPs.

You can view a diagram of Firefox's Gecko Media Plugin architecture online, or download a PDF version of it.

The main facade to the GMP system is the GeckoMediaPluginService. Clients use the GeckoMediaPluginService to instantiate IPDL actors connecting their client to the GMP process, and to configure the service. In general, most operations which involve IPC to the GMPs/CDMs should happen on the GMP thread, as the GMP related protocols are processed on that thread.

mozIGeckoMediaPluginService can be used on the main thread by JavaScript, but the main-thread accessible methods proxy their work to the GMP thread.

How GMPs are downloaded and installed

The Firefox front end code which manages GMPs is the GMPProvider. This is a JavaScript object, running in the front end code in the main process. On startup, if any GMPs are already downloaded and installed, the GMPProvider calls mozIGeckoMediaPluginService.addPluginDir() with the path to each GMP's location on disk. Gecko's C++ code then knows about the GMP. The GeckoMediaPluginService then parses the metadata file in that GMP's directory, and creates and stores a GMPParent for that plugin. At this stage the GMPParent is like a template, which stores the metadata describing how to start a plugin of this type. When we come to instantiate a plugin, we'll clone the template GMPParent into a new instance, and load a child process to run the plugin using the cloned GMPParent.

Shortly after the browser starts up (usually within 60 seconds), the GMPProvider will decide whether it should check for new GMP updates. The GMPProvider will check for updates if either it has not checked in the past 24 hours, or if the browser has been updated since the last time it checked. If the GMPProvider decides to check for updates, it will poll Mozilla's Addons Update Server (AUS). This will return an update.xml file which lists the current GMPs for that particular Firefox version/platform, and the URLs from which to download those plugins. The plugins are hosted by third parties (Cisco and Google), not on Mozilla's servers. Mozilla only hosts the manifest describing where to download them from.
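
The update-check decision described above can be sketched as a small predicate. This is a simplified model in Python, not the GMPProvider's actual JavaScript: the function name, its parameters, the version-string comparison, and the treatment of a missing previous check are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Interval taken from the text: check if we have not checked in the past 24 hours.
CHECK_INTERVAL = timedelta(hours=24)


def should_check_for_updates(
    now: datetime,
    last_check: Optional[datetime],
    last_checked_browser_version: Optional[str],
    current_browser_version: str,
) -> bool:
    """Return True if a GMP update check is due (hypothetical helper)."""
    if last_check is None:
        # Assumption: a profile that has never checked is treated as due.
        return True
    if now - last_check >= CHECK_INTERVAL:
        return True
    # The browser has been updated since the last check.
    return last_checked_browser_version != current_browser_version
```

For example, a check made one hour ago with an unchanged browser version returns False, while the same check after a browser update returns True.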

If the GMPs in the update.xml file are different to what is installed, Firefox will update its GMPs to match the update.xml file from AUS. Firefox will download and verify the new GMP, uninstall the old GMP, install the new GMP, and then add the new GMP's path to the mozIGeckoMediaPluginService. The objects that do this are the GMPDownloader and the GMPInstallManager, which are JavaScript modules in the front end code as well.

Note that Firefox will take action to ensure its installed GMPs match whatever is specified in the update.xml file. So if a version of a GMP which is older than what is installed is specified in the update.xml file, Firefox will uninstall the newer version, and download and install the older version. This is to allow a GMP update to be rolled back if a problem is detected with the newer GMP version.
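
The "match the manifest, whatever direction that takes us" behaviour can be illustrated with a short sketch. It is a toy model rather than the GMPDownloader/GMPInstallManager code: the function name, the step names, and the plugin ids in the usage note are hypothetical, and plugins that are installed but absent from the manifest are not modelled because the text does not say what happens to them.

```python
from typing import Dict, List, Tuple


def plan_gmp_changes(
    installed: Dict[str, str], manifest: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (action, plugin_id, version) steps that make `installed` match `manifest`.

    Both arguments map a plugin id to a version string. Versions are compared
    for equality only, never ordered, so an older manifest version triggers a
    rollback in exactly the same way a newer one triggers an upgrade.
    """
    steps: List[Tuple[str, str, str]] = []
    for plugin_id, wanted in manifest.items():
        current = installed.get(plugin_id)
        if current == wanted:
            continue  # Already matches the manifest; nothing to do.
        steps.append(("download_and_verify", plugin_id, wanted))
        if current is not None:
            steps.append(("uninstall", plugin_id, current))
        steps.append(("install", plugin_id, wanted))
        steps.append(("add_plugin_dir", plugin_id, wanted))
    return steps
```

With `installed = {"gmp-x": "1.8"}` and `manifest = {"gmp-x": "1.7"}`, the plan downloads 1.7, uninstalls 1.8, installs 1.7, and registers the new directory, which is the rollback case described above.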

If the AUS server can't be contacted, and no GMPs are installed, Firefox has the URLs of GMPs baked in, and will use those URLs to download the GMPs.

On startup, the GMPProvider also calls mozIGeckoMediaPluginService.addPluginDir() for the ClearKey CDM, passing in its path in the Firefox install directory.

How EME plugins are started in Firefox

The lifecycle for the Widevine and ClearKey CDMs begins in the content process with content JavaScript calling Navigator.requestMediaKeySystemAccess(). Script passes in a set of MediaKeySystemConfiguration dictionaries, and these are passed on to the MediaKeySystemAccessManager. The MediaKeySystemAccessManager figures out a supported configuration, and if it finds one, returns a MediaKeySystemAccess from which content JavaScript can instantiate a MediaKeys object.

Once script calls MediaKeySystemAccess.createMediaKeys(), we begin the process of instantiating the plugin. We create a MediaKeys object and a ChromiumCDMProxy object, and call Init() on the proxy. The initialization is asynchronous, so we return a promise to content JavaScript and on success we'll resolve the promise with the MediaKeys instance which can talk to the CDM in the GMP process.

To create a new CDM, ChromiumCDMProxy::Init() calls GeckoMediaPluginService::GetCDM(). This runs in the content process, but since the content process is sandboxed, we can't create a new child process to run the CDM there and then. As we're in the content process, the GeckoMediaPluginService instance we're talking to is a GeckoMediaPluginServiceChild. This calls over to the parent process to retrieve a GMPContentParent bridge. GMPContentParent acts like the GMPParent in the content process. GeckoMediaPluginServiceChild::GetContentParent() retrieves the bridge, and sends a LaunchGMPForNodeId() message to instantiate the plugin in the parent process.

In the non-multi-process Firefox case, we still call GeckoMediaPluginService::GetContentParent(), but we end up running GeckoMediaPluginServiceParent::GetContentParent(), which can just instantiate the plugin directly.

When the parent process receives a LaunchGMPForNodeId() message, the GMPServiceParent runs through its list of GMPParents to see if there's one matching the parameters passed over. We check to see if there's an instance from the same NodeId, and if so use that. The NodeId is a hash of the origin requesting the plugin, combined with the top level browsing origin, plus salt. This ensures GMPs from different origins always end up running in different processes, and GMPs running in the same origin run in the same process.
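
To make the NodeId idea concrete, here is a minimal sketch of deriving an opaque identifier from the three inputs named above. The function name, the use of SHA-256, and the length-prefixed encoding are assumptions of this example; the text only says that the NodeId is a hash of the two origins plus salt.

```python
import hashlib


def compute_node_id(origin: str, top_level_origin: str, salt: bytes) -> str:
    """Derive an opaque id from the requesting origin, top-level origin, and salt.

    SHA-256 and the length-prefixed field encoding are assumptions of this
    sketch, not a description of Gecko's implementation.
    """
    digest = hashlib.sha256()
    for part in (origin.encode("utf-8"), top_level_origin.encode("utf-8"), salt):
        # Length-prefix each field so that field boundaries are unambiguous.
        digest.update(len(part).to_bytes(4, "big"))
        digest.update(part)
    return digest.hexdigest()
```

The same inputs always produce the same id, so requests from one origin pair can be routed to the same process, while changing either origin or the salt produces a different id.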

If we don't find an active GMPParent running the requested NodeId, we'll make a copy of a GMPParent matching the parameters, and call LoadProcess() on the new instance. This creates a GMPProcessParent object, which in turn uses GeckoChildProcessHost to run a command line to start the child GMP process. The command line passed to the newly spawned child process causes the GMPProcessChild to run, which creates and initializes the GMPChild, setting up the IPC connection between the GMP and main processes.
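
The "reuse an instance for the same NodeId, otherwise clone a template" lookup can be modelled in a few lines. This is a toy sketch: the class names, fields, and the `api` matching parameter are hypothetical, and `load_process()` merely flips a flag where the real code would spawn a sandboxed child process.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class PluginParent:
    """Toy stand-in for a GMPParent: plugin metadata plus per-instance state."""

    name: str
    api: str
    node_id: Optional[str] = None  # None marks a template that runs nothing.
    process_loaded: bool = False

    def load_process(self) -> None:
        # Stand-in for spawning the sandboxed child process.
        self.process_loaded = True


class PluginService:
    def __init__(self, templates: List[PluginParent]) -> None:
        self._templates = templates
        self._active: List[PluginParent] = []

    def launch_for_node_id(self, api: str, node_id: str) -> PluginParent:
        # Reuse a running instance that serves the same NodeId and API.
        for parent in self._active:
            if parent.api == api and parent.node_id == node_id:
                return parent
        # Otherwise clone a matching template and start a process for the clone.
        for template in self._templates:
            if template.api == api:
                clone = replace(template, node_id=node_id)
                clone.load_process()
                self._active.append(clone)
                return clone
        raise LookupError(f"no plugin provides {api!r}")
```

Two launches with the same NodeId return the same object, a different NodeId gets a fresh clone with its own process, and the template itself is never started.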

The GMPChild delegates most of the business of loading the GMP to the GMPLoader. The GMPLoader opens the plugin library from disk, and starts the sandbox using the SandboxStarter, which has a different implementation for every platform. Once the sandbox is started, the GMPLoader uses a GMPAdapter parameter to adapt whatever binary interface the plugin exports (the Widevine C API for example) to match the GMP API. We use the adapter to call into the plugin to create an instance of the CDM. For OpenH264 we simply use a PassThroughAdapter, since the plugin implements the GMP API.
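
The adapter arrangement is an instance of the classic adapter pattern, which the following sketch illustrates. Every class and method name here is hypothetical apart from PassThroughAdapter, and the "plugins" are plain Python objects standing in for shared libraries; it shows the shape of the idea rather than the GMP ABI.

```python
from abc import ABC, abstractmethod


class NativePlugin:
    """Hypothetical plugin that already speaks the loader's interface."""

    def get_api(self, api_name: str) -> str:
        return f"native:{api_name}"


class ForeignCdmLibrary:
    """Hypothetical plugin exporting a different entry point."""

    def create_cdm_instance(self) -> str:
        return "foreign-cdm-instance"


class Adapter(ABC):
    """What the loader calls, whatever the plugin actually exports."""

    @abstractmethod
    def get_api(self, api_name: str) -> str:
        ...


class PassThroughAdapter(Adapter):
    def __init__(self, plugin: NativePlugin) -> None:
        self._plugin = plugin

    def get_api(self, api_name: str) -> str:
        # The plugin implements the expected interface, so just forward.
        return self._plugin.get_api(api_name)


class ForeignCdmAdapter(Adapter):
    def __init__(self, library: ForeignCdmLibrary) -> None:
        self._library = library

    def get_api(self, api_name: str) -> str:
        # Translate the loader's request into the library's own entry point.
        if api_name != "cdm":
            raise LookupError(f"unsupported API {api_name!r}")
        return self._library.create_cdm_instance()


def load_plugin(adapter: Adapter, api_name: str) -> str:
    """The loader only ever talks to the Adapter interface."""
    return adapter.get_api(api_name)
```

The design choice this highlights is that the loader has a single code path: supporting a plugin with a different binary interface means writing a new adapter, not changing the loader.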

If all that succeeded, we'll send a message reporting success to the parent process, which in turn reports success to the content process. The content process then resolves the JavaScript promise returned by MediaKeySystemAccess.createMediaKeys() with the MediaKeys object, which is now set up to talk to a CDM instance.

Once content JavaScript has a MediaKeys object, it can set it on an HTMLMediaElement using HTMLMediaElement.setMediaKeys().

The MediaKeys object encapsulates the ChromiumCDMProxy, which proxies commands sent to the CDM into calls to ChromiumCDMParent on the GMP thread.

How EME playback works

There are two main cases that we care about here: encrypted content being encountered before a MediaKeys is set on the HTMLMediaElement, or after. Note that the CDM is only usable to the media pipeline once it's been associated with a media element by script calling HTMLMediaElement.setMediaKeys().

If we detect encrypted media streams in the MediaFormatReader's pipeline, and we don't have a CDMProxy, the pipeline will move into a "waiting for keys" state, and not resume playback until content JS has set a MediaKeys on the HTMLMediaElement. Setting a MediaKeys on the HTMLMediaElement causes the encapsulated ChromiumCDMProxy to bubble down past MediaDecoder, through the layers until it ends up on the MediaFormatReader, and the EMEDecoderModule.

Once we've got a CDMProxy pushed down to the MediaFormatReader level, we can use the PDMFactory to create a decoder which can process encrypted samples. The PDMFactory will use the EMEDecoderModule to create the EME MediaDataDecoders, which process the encrypted samples.

The EME MediaDataDecoders talk directly to the ChromiumCDMParent, which they get from the ChromiumCDMProxy on initialization. The ChromiumCDMParent is the IPDL parent actor for communicating with CDMs.

All calls to the ChromiumCDMParent should be made on the GMP thread. Indeed, one of the primary jobs of the ChromiumCDMProxy is to proxy calls made by the MediaKeys on the main thread to the GMP thread so that commands can be sent to the CDM via off-main-thread IPC.
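
The thread-proxying role can be illustrated with a small sketch in which a proxy forwards every call to one dedicated worker thread. The class and method names are hypothetical, and a single-worker ThreadPoolExecutor stands in for the GMP thread; the real proxy dispatches runnables to an existing Gecko thread instead.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Tuple


class CdmParent:
    """Toy actor that should only ever be touched from its owning thread."""

    def __init__(self) -> None:
        self.calls: List[Tuple[int, str]] = []

    def update_session(self, session_id: str, response: bytes) -> str:
        # Record which thread ran the call so the caller can inspect it later.
        self.calls.append((threading.get_ident(), session_id))
        return f"updated {session_id}"


class CdmProxy:
    """Accepts calls on any thread and forwards them to one dedicated thread."""

    def __init__(self, parent: CdmParent) -> None:
        self._parent = parent
        # A single worker gives one dedicated thread with FIFO ordering.
        self._gmp_thread = ThreadPoolExecutor(max_workers=1, thread_name_prefix="gmp")

    def update_session(self, session_id: str, response: bytes) -> Future:
        # Return immediately; the caller gets a future, much like a promise.
        return self._gmp_thread.submit(
            self._parent.update_session, session_id, response
        )

    def shutdown(self) -> None:
        self._gmp_thread.shutdown(wait=True)
```

A caller on the main thread gets a future back straight away, and all calls reach the CdmParent in submission order on the same worker thread.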

Any callbacks from the CDM in the GMP process are made onto the ChromiumCDMChild object, and they're sent via PChromiumCDM IPC over to ChromiumCDMParent in the content process. If they're bound for the main thread (i.e. the MediaKeys or MediaKeySession objects), the ChromiumCDMCallbackProxy ensures they're proxied to the main thread.

Before the EME MediaDataDecoders submit samples to the CDM, they first ensure that the CDM has a key with which to decrypt them. This is achieved by a SamplesWaitingForKey object. In the CDMCaps object, we keep a copy in the content process of which keyIds the CDM has reported are usable. The information stored in the CDMCaps about which keys are usable is mirrored in the JavaScript-exposed MediaKeyStatusMap object.

The MediaDataDecoder's decode operation is asynchronous, and the SamplesWaitingForKey object delays decode operations until the CDM has reported that the keys that the sample requires for decryption are usable. Before sending a sample to the CDM, the EME MediaDataDecoders check with the SamplesWaitingForKey, which looks up in the CDMCaps whether the CDM has reported that the sample's keyId is usable. If not, the SamplesWaitingForKey registers with the CDMCaps for a callback once the key becomes usable. This stalls the decode pipeline until content JavaScript has negotiated a license for the media.

Content JavaScript negotiates licenses by receiving messages from the CDM on the MediaKeySession object, forwarding those messages on to the license server, and passing the license server's response back to the CDM via the MediaKeySession.update() function. These updates are in turn proxied by the ChromiumCDMProxy to the GMP thread, where they result in a call to ChromiumCDMParent, and thus an IPC message to the GMP process and a function call into the CDM there. If the license server sends a valid license, the CDM will report the keyId as usable via a key statuses changed callback.

Once the key becomes usable, the SamplesWaitingForKey gets a callback, and the EME MediaDataDecoder will submit the sample for processing by the CDM and the pipeline unblocks.
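
The key-gating behaviour described in this section can be sketched as follows. It is a single-threaded toy model with hypothetical class and method names: `KeyCaps` plays the role of the CDMCaps, `WaitingForKey` plays the role of the SamplesWaitingForKey, and `submit` stands in for handing a sample to the CDM.

```python
from typing import Callable, Dict, List, Set


class KeyCaps:
    """Toy stand-in for CDMCaps: tracks usable key ids and pending waiters."""

    def __init__(self) -> None:
        self._usable: Set[bytes] = set()
        self._waiters: Dict[bytes, List[Callable[[], None]]] = {}

    def is_usable(self, key_id: bytes) -> bool:
        return key_id in self._usable

    def notify_when_usable(self, key_id: bytes, callback: Callable[[], None]) -> None:
        self._waiters.setdefault(key_id, []).append(callback)

    def set_usable(self, key_id: bytes) -> None:
        # Called when the CDM reports the key as usable; wake any waiters.
        self._usable.add(key_id)
        for callback in self._waiters.pop(key_id, []):
            callback()


class WaitingForKey:
    """Holds back samples until their key is usable, then submits them."""

    def __init__(self, caps: KeyCaps, submit: Callable[[str], None]) -> None:
        self._caps = caps
        self._submit = submit

    def decode(self, key_id: bytes, sample: str) -> None:
        if self._caps.is_usable(key_id):
            self._submit(sample)
        else:
            # Stall this sample until the key becomes usable.
            self._caps.notify_when_usable(key_id, lambda: self._submit(sample))
```

A sample whose key is not yet usable is held until `set_usable()` fires, at which point it is submitted; later samples with the same key go straight through.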

EME on Android

EME on Android is similar in terms of the EME DOM binding and integration with the MediaFormatReader and friends, but it uses a MediaDrmCDMProxy instead of a ChromiumCDMProxy. The MediaDrmCDMProxy doesn't talk to the GMP subsystem, and instead uses the Android platform's inbuilt Widevine APIs to process encrypted samples.

How WebRTC uses OpenH264

WebRTC uses OpenH264 for encode and decode of baseline H.264 streams. It doesn't need all the DRM stuff, so it talks to the OpenH264 GMP via the PGMPVideoDecoder and PGMPVideoEncoder protocols.

The child actors GMPVideoDecoderChild and GMPVideoEncoderChild talk to OpenH264, which conforms to the GMP API.

OpenH264 is not used by Firefox for playback of H.264 content inside regular <video> elements, though there is still a GMPVideoDecoder MediaDataDecoder in the tree should this ever be desired.

How GMP shutdown works

Shutdown is confusing, because there are three processes involved. When the destructor of the MediaKeys object in the content process is run (possibly because it's been cycle or garbage collected), it calls CDMProxy::Shutdown(), which calls through to ChromiumCDMParent::Shutdown(), which cancels pending decrypt/decode operations, and sends a Destroy message to the ChromiumCDMChild.

In the GMP process, ChromiumCDMChild::RecvDestroy() shuts down and deletes the CDM instance, and sends a __delete__ message back to the ChromiumCDMParent in the content process.

In the content process, ChromiumCDMParent::Recv__delete__() calls GMPContentParent::ChromiumCDMDestroyed(), which calls CloseIfUnused(). The GMPContentParent tracks the living protocol actors for this plugin instance in this content process, and CloseIfUnused() checks if they're all shut down. If so, we unlink the GMPContentParent from the GeckoMediaPluginServiceChild (which is the PGMPContent protocol's manager), and close the GMPContentParent instance. This shuts down the bridge between the content and GMP processes.

This causes the GMPContentChild in the GMP process to be removed from the GMPChild in GMPChild::GMPContentChildActorDestroy(). This sends a PGMPContentChildDestroyed message to GMPParent in the main process.

In the main process, GMPParent::RecvPGMPContentChildDestroyed() checks if all actors on its side are destroyed (i.e. if all content processes' bridges to this GMP process are shut down), and will shut down the child process if so. Otherwise we'll check again the next time one of the GMPContentParents shuts down.

Note there are a few places where we use GMPContentParent::CloseBlocker. This stops us from shutting down the child process when there are no active actors, but we still need the process alive. This is useful for keeping the child alive in the time between operations, for example after we've retrieved the GMPContentParent, but before we've created the ChromiumCDM (or some other) protocol actor.
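
The interaction between CloseIfUnused() and the CloseBlocker can be modelled with a simple counter. The sketch below is a toy with hypothetical names: `ContentBridge` plays the role of the GMPContentParent, and `on_close` stands in for tearing down the bridge to the GMP process.

```python
from typing import Callable, Set


class ContentBridge:
    """Toy bridge that closes once it has no live actors and no blockers."""

    def __init__(self, on_close: Callable[[], None]) -> None:
        self._actors: Set[str] = set()
        self._blockers = 0
        self._on_close = on_close
        self.closed = False

    def add_actor(self, actor: str) -> None:
        self._actors.add(actor)

    def actor_destroyed(self, actor: str) -> None:
        self._actors.discard(actor)
        self.close_if_unused()

    def add_close_blocker(self) -> None:
        self._blockers += 1

    def remove_close_blocker(self) -> None:
        self._blockers -= 1
        self.close_if_unused()

    def close_if_unused(self) -> None:
        # Stay open while any actor is alive or any blocker is held.
        if self.closed or self._actors or self._blockers:
            return
        self.closed = True
        self._on_close()
```

Holding a blocker between retrieving the bridge and creating the first actor keeps the bridge open even though it has no actors yet, which is the gap the text describes.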

How crash reporting works for EME CDMs

Crash handling for EME CDMs is confusing for the same reason as shutdown: there are three processes involved. It's tricky because the crash is first reported in the parent process, but we need state from the content process in order to identify which tabs need to show the crash reporter notification box.

We receive a GMPParent::ActorDestroy() callback in the main process with aWhy==AbnormalShutdown. We get the crash dump ID, and dispatch a task to run GMPNotifyObservers() on the main thread. This collects some details, including the pluginID, and dispatches an observer service notification "gmp-plugin-crash". A JavaScript module ContentCrashHandlers.jsm observes this notification, and rebroadcasts it to the content processes.

JavaScript in every content process observes the rebroadcast, and calls mozIGeckoMediaPluginService::RunPluginCrashCallbacks(), passing in the plugin ID. Each content process' GeckoMediaPluginService then goes through its list of GMPCrashHelpers, and finds those which match the pluginID. We then dispatch a PluginCrashed event at the window that the GMPCrashHelper reports as the current window owning the plugin. This is then handled by PluginChild.jsm, which sends a message to cause the crash reporter notification bar to show.
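
The per-content-process step, matching crash helpers against the crashed plugin's ID, can be sketched like this. The class names and the returned (event, window) tuples are hypothetical, and removing helpers once they have been notified is an assumption of this toy model rather than something stated above.

```python
from typing import Dict, List, Tuple


class CrashHelper:
    """Toy stand-in for a GMPCrashHelper: knows which window owns the plugin."""

    def __init__(self, window: str) -> None:
        self._window = window

    def current_window(self) -> str:
        return self._window


class ContentProcessService:
    def __init__(self) -> None:
        self._helpers: Dict[int, List[CrashHelper]] = {}

    def connect_crash_helper(self, plugin_id: int, helper: CrashHelper) -> None:
        self._helpers.setdefault(plugin_id, []).append(helper)

    def run_plugin_crash_callbacks(self, plugin_id: int) -> List[Tuple[str, str]]:
        # Assumption: helpers are dropped once notified, as the plugin is gone.
        helpers = self._helpers.pop(plugin_id, [])
        # One PluginCrashed event per window that was using the crashed plugin.
        return [("PluginCrashed", helper.current_window()) for helper in helpers]
```

Each content process runs this independently, so only processes that actually hold helpers for the crashed plugin ID end up dispatching events.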

GMP crash reporting for WebRTC

Unfortunately, the code path for handling crashes in WebRTC is slightly different, because the window is owned by PeerConnection. WebRTC doesn't use GMPCrashHelpers; instead, PeerConnection helps find the target window to dispatch PluginCrashed to.


Thursday, 27 June 2019
