Saturday 8 April 2023

Solved: AMR iCal events WordPress plugin out of memory; my website is slow

I administer my kid's school's WordPress website. It's been slow and unresponsive on and off for years. We're hosted on DreamHost, on a shared hosting plan. Some time ago I contacted DreamHost support about the issue; they told me we were hitting our memory limit, and directed me to some general WordPress optimisation tips, things like disabling unused plugins/themes, which didn't help. At that time I updated our PHP version, and the problem seemed mostly resolved. But we recently started to get complaints again, so I resolved to look into it when I got a chance. I was finally able to devote some time to it, and figured it out.

Understanding the problem

The first thing I wanted to do was quantify the problem: how often do web requests actually get rejected? So I checked the server logs. DreamHost runs Apache, so the access logs were in our home directory at `~/logs/$domain/https/access.log`. These showed a number of 500 responses, mostly for requests to the school newsletter. Looking closer at the error log at `~/logs/$domain/https/error.log`, I immediately got a big clue, seeing numerous log entries such as the following:

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4194304 bytes) in /home/[...]/wp-content/plugins/amr-ical-events-list/includes/amr-rrule.php on line 1127

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4194304 bytes) in /home/[...]/wp-includes/class-wp-recovery-mode.php on line 178

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4096 bytes) in /home/[...]/wp-content/plugins/amr-ical-events-list/includes/amr-rrule.php on line 55

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4096 bytes) in /home/[...]/wp-content/plugins/amr-ical-events-list/includes/amr-rrule.php on line 55

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4194304 bytes) in /home/[...]/wp-content/plugins/amr-ical-events-list/includes/amr-rrule.php on line 1127

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4194304 bytes) in /home/[...]/wp-includes/class-wp-recovery-mode.php on line 178

The school puts out a weekly newsletter, and at the top of it is a summary of upcoming events on the school calendar. This is powered by the AMR iCal Events WordPress plugin, which renders the events by downloading the office's public calendar from Google Calendar as an ICS file, and parsing out the events within a time range specified in the blog post via a WordPress shortcode.

A quick Google search turned up reports that the AMR iCal Events plugin has issues with large calendars containing lots of recurring events. Our school calendar was about 900KB, and includes recurring events...

Looking in ~/logs/$domain/https/access.log, I saw a lot of hits coming in from BingBot to a single newsletter, each with unique query parameters. Since they have unique query parameters (why?!?!), these requests bypass WP Super Cache, causing the page to re-render, which triggers the calendar plugin to refetch the calendar from Google Calendar and reparse the ICS file...

BingBot was sending requests faster than we could service them. AMR iCal Events struggles with such a large calendar, and the combined memory overhead of servicing all those requests was too much: we'd hit our user's memory limit on the shared hosting, and the plugin would start failing to allocate memory. When we were at our limit, the server's memory-use watchdog must have been rejecting or killing requests from human users, making the site slow and unresponsive.

The fix I put in place has two parts:

  1. Asking BingBot to slow down using a robots.txt file.
  2. Writing a small CGI program to reduce the size of the calendar exposed to AMR iCal Events.

Asking BingBot to slow down using a robots.txt file

This is a simple solution, which most people who also encounter this issue should find quite approachable. Simply create a file in the root of your domain called robots.txt containing the following:
User-agent: *
Crawl-delay: 30

This directive asks all search engines' crawlers to hit you at most once every 30 seconds. BingBot honors this directive. Not all crawlers do; Google's doesn't, for example.

You need to choose a crawl delay which is greater than the amount of time it takes to render the page, otherwise your server will receive requests faster than it can process them, and it will struggle.

It took about 24 hours for Bing to notice the new robots.txt file and slow down. This would probably solve the issue for most people; if you're trying to resolve this issue yourself, you could stop here if you're happy to wait 24 hours.

Reducing the size of ICS file exposed to AMR iCal Events

While the robots.txt file should slow down BingBot, the calendar plugin was still causing pages to load slowly. I performed a simple load test and measured that a page load took 13 seconds on average when the newsletter wasn't in the cache!

I probably could have stopped there, but I wanted the pages to load faster. My solution was to write a program which parses the calendar's ICS file faster than AMR iCal Events can, outputs just the calendar entries in a specified time range, and passes that to AMR iCal Events. Since we're on DreamHost shared hosting, we can't open a socket to listen for incoming requests, so the best dynamic option available was an old-school CGI program.

I investigated several ICS parsing libraries in Python, Rust and Go, and none really handled my particular use case very well. In the end I wrote a parser for the ICS file myself in Go.

The code for this calendar ICS date-filter CGI program is in this GitHub gist.

The program is simple: it reads the ICS file line by line, outputting the lines for events which intersect the target date range. I only had to parse enough to find the start and end of each event in the stream, and to understand each event's start and end dates, including expanding recurring-event (RRULE) directives. This drastically reduced the size of the calendar we expose to AMR iCal Events.

I set up an hourly cron job to fetch the calendar ICS file from Google Calendar. When the CGI program is hit, it parses the ICS file from disk and strips out the events which aren't around the date specified in a query parameter, so the CGI program never needs to block on a request out to Google Calendar.
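The hourly fetch is just a cron entry; the calendar URL and file paths below are placeholders, not the school's real ones:

```shell
# crontab entry: fetch the ICS once an hour. Download to a temp file first
# so the CGI program never reads a half-written calendar.
0 * * * * curl -fsS "https://calendar.google.com/calendar/ical/EXAMPLE/public/basic.ics" -o "$HOME/calendar.ics.tmp" && mv "$HOME/calendar.ics.tmp" "$HOME/calendar.ics"
```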

I now had a new URL to feed into the AMR iCal Events plugin instead of the Google Calendar URL.

At this stage, BingBot was still spamming us multiple times per second. It was only hitting a single newsletter (with different query parameters) for some reason, so I changed the ICS URL in the AMR shortcode for that newsletter, and the site very quickly became more responsive. The error logs no longer showed out-of-memory errors. Victory!

But we still had 400-odd newsletters embedding the old URL; those pages would load slowly. Changing them by hand would be painful, as it required taking the target date from the shortcode, replacing the URL with the new one, and inserting the target date as a query parameter on the new URL. So, using Go again, I wrote a migration that replaces every instance of the old AMR iCal shortcode with an updated shortcode pointing at the new URL, embedding the target date as a query parameter.

But how to test such a migration? We don't have any kind of staging environment for the school blog. Years ago I set up a backup script (based on a WordPress backup blog post), which means our backups include a SQL dump. So I was able to start up a local MySQL instance on my laptop, load the SQL dump from our backups, and then I had a database with content identical to our production site!

Then I wrote a Go program to connect to that DB and update the post content, using a regex which captured the target date from the shortcode and replaced the shortcode with one containing the new URL. Once that was tested against my local DB, I ran the binary on the server, migrating the 400-odd copies of the old calendar URL to the new, smaller calendar CGI program in the prod DB.
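The heart of that migration is a single regex replacement. The shortcode name, attribute names, and CGI URL below are made up for illustration (the real AMR shortcode differs); the point is the capture group lifting the target date out of the old shortcode and re-emitting it as a query parameter on the new URL:

```go
package main

import (
	"fmt"
	"regexp"
)

// oldShortcode matches a hypothetical calendar shortcode embedding the big
// Google Calendar ICS URL, capturing the target date.
var oldShortcode = regexp.MustCompile(
	`\[ical_events\s+url="[^"]*"\s+date="(\d{4}-\d{2}-\d{2})"\]`)

// migrateShortcodes rewrites every old shortcode in a post to point at the
// date-filtering CGI program, carrying the date over as a query parameter.
func migrateShortcodes(post string) string {
	return oldShortcode.ReplaceAllString(post,
		`[ical_events url="https://example.org/cgi-bin/calfilter?date=$1"]`)
}

func main() {
	fmt.Println(migrateShortcodes(
		`[ical_events url="https://calendar.google.com/big.ics" date="2021-03-05"]`))
}
```

In the real migration, this function would run over each post body fetched from the wp_posts table, writing back only the rows that changed.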

Now our website is much snappier! The newsletters load fast, as does the WordPress admin.

Saturday 4 September 2021

Using Webpack to bundle an Isomorphic npm package which runs in both browsers and NodeJS

I recently tried to create an npm package which runs in both browsers and the Node environment. As a complete beginner to modern JavaScript toolchains, I found this quite frustrating and time consuming, and the documentation not very approachable. I couldn't find one place documenting how to do this, so here goes!

If you depend on APIs that aren't present in both NodeJS and browser environments (like HTML5 fetch, for example), you likely want to produce multiple JS bundle files: one for Node, and one for browsers. To do that, configure Webpack to produce multiple targets, i.e. have your webpack.config.js file's module.exports return a list of configs, one with target: "node" and one with target: "web". This will produce multiple bundle JS files in your output dir. Running with the NodeJS/browser example, you'd output an index.node.js and an index.js file.

Another option to consider, if you want to retain a single file, is polyfills. Webpack is quite good at telling you which polyfills you need and how to add them. This might allow you to ship a single bundle for both browser and NodeJS, but for the packages I needed, I found it bloated my package by 300kB, which was unacceptable. Your mileage may vary.

Assuming you're going with the multiple-output-targets approach, you then need to configure your module's npm package.json to direct browsers to load the file produced for the browser target above, and Node to load the file produced for its target. By default, the JS environment loading your package will load the entry-point file specified in the main field in package.json, so NodeJS will use this; but you can specify which file is loaded in browsers by setting the browser field in package.json. So when you have Webpack producing multiple bundle files for multiple targets, you just need to ensure the files specified in package.json line up. Easy... once you know!
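For example, the relevant package.json fields might look like this (package name and paths illustrative), matching the bundle filenames from the Webpack output dir:

```json
{
  "name": "my-isomorphic-package",
  "main": "dist/index.node.js",
  "browser": "dist/index.js"
}
```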

You also probably want each configuration to have output.library.type: "umd", so that it works with a variety of JavaScript import patterns. If you do that, then you'll also want to set output.globalObject: "this" as well, otherwise you'll get weird undefined errors when trying to touch globals in the Node environment. See also the Webpack output.globalObject docs for a little more detail.
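Concretely, each config's output section probably wants something like the following (the filename and UMD global name are illustrative):

```javascript
// Sketch of the output section for one target. "umd" makes the bundle
// loadable via require(), ES import, or a <script> global; and
// globalObject: "this" avoids "self is not defined" errors under Node.
const output = {
  filename: "index.js",
  library: { type: "umd", name: "myPackage" },
  globalObject: "this",
};

module.exports = { output };
```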

Sunday 19 April 2020

How to setup Python 3 + Virtualenv + Django on DreamHost

I host my personal websites on DreamHost, but it was an ordeal setting up Django using Python 3 in a virtual Python environment. DreamHost's documentation doesn't really cover this particular permutation. So here's a quick how-to.

Firstly, a quick explainer on the end-goal of how web requests will be routed through the server.

DreamHost uses Apache to handle HTTP requests. We want to use Python to handle some requests, but use Apache to serve the static files (HTML, CSS, etc), since using Python as a general-purpose file server would not be performant enough. DreamHost recommends Passenger to handle web requests for Python. Passenger was designed for Ruby, but can be used for Python as well.

Your user's web browser makes requests to your DreamHost server. The server routes the requests to Passenger, which in turn routes them to your WSGI handler, which in our case will be a Django project. If the WSGI handler doesn't handle the request, it gets routed back to Apache to handle; for example, a request for a static asset (an HTML or CSS file, say) would go to Passenger, not be handled, and fall back to the web server. When Passenger is invoked, it starts up an instance of your Python code in a standalone process for handling requests. This process is kept alive for some time, and any further requests that come in for the next wee while will be serviced by it. So your process may be re-used for multiple requests, but you can't guarantee that it will be, as Passenger will shut down your process after some period of inactivity.

Passenger will by default start up the Python WSGI handler using the system default Python, which at the time of writing on my DreamHost server was still Python 2. In order to get Python 3 inside our virtualenv, we'll need to re-invoke with our virtualenv's Python 3 interpreter.

You don't want your code to be web accessible. So in your home directory you'll create a directory for your code, a directory for your static assets (HTML/CSS files) that Apache uses as the root, and a directory for your Python 3 virtualenv and pip dependencies.

For a database, I simply used SQLite, with the database file stored inside the ~/app directory. That was easy to set up and is fine for a simple learning project. I never figured out how to get Python 3 to talk to DreamHost's MySQL server; IIRC the native libraries required for Python 3 to talk to MySQL weren't installed on the server I'm on. I expect if you emailed DreamHost's support team they could install the packages for you to enable Python 3 to talk to their MySQL server. That said, for small workloads, I've found SQLite to perform better than a full-blown MySQL server.

With that all said, let's dive into the specifics of setting this up.

From the DreamHost control panel, create a new domain website under Domains > Manage Domains. Make sure you turn on the "Enable Passenger" option, and follow the advice about the web directory ending in "public". Make a note of the username, password and server that you were assigned. I'm going to assume your Web Directory is www/public in the examples below.

I'll use the convention that anything that could vary based on your setup, which you may need to change in the commands/config below, is written as a placeholder: for example dh_username for the username that DreamHost assigns to your new domain, or www for the Web Directory.

Still in the Control Panel, under FTP & SSH > Manage Users, select your new domain's user, and enable SSH.

Under Domains > SSL/TLS Certificates, select your new domain, and add a free Lets Encrypt certificate. This will mean you'll be able to use HTTPS on your site.

For convenience, I would then follow DreamHost's instructions to enable passwordless login via SSH keys. For me this was as simple as:

$ ssh-copy-id -i ~/.ssh/id_rsa.pub dh_username@server

I also like to add a Host alias for the new domain in my SSH config, so I can SSH in simply (e.g. ssh mydomain) without needing to remember the username. On your local machine, add to ~/.ssh/config:

Host mydomain
    HostName server
    User dh_username

SSH into your new domain.
$ ssh dh_username@server

In your home directory, create a new Python3 virtualenv:

$ python3 -m virtualenv -p `which python3` env

Activate the venv.
$ . env/bin/activate

Install Django, and create your new Django project.
$ pip install django
$ python -m django startproject app

You can use something other than "app" for your project name if you like, but remember to replace "app" with your project name in the commands/config below.

You should now have in your home directory subdirectories of env, app and www, for the virtualenv, Django project Python code, and static files respectively.

Before your Django project will function you need to run the initial migrations to setup the database.

$ cd ~/app
$ python manage.py migrate

Create a super user, taking note of the username/password.
$ python manage.py createsuperuser

Now to connect Passenger to your Django project via a WSGI handler.

In your ~/www folder, add a file called passenger_wsgi.py with these contents:


import sys, os

# Switch to the virtualenv if we're not already there
VENV_DIR = '/home/dh_username/env'
APP_DIR = '/home/dh_username/app'
INTERP = VENV_DIR + "/bin/python3"
if sys.executable != INTERP:
    os.execl(INTERP, INTERP, *sys.argv)

sys.path.append(APP_DIR + '/app')

sys.path.insert(0, VENV_DIR + '/bin')
sys.path.insert(0, VENV_DIR + '/lib/python3.6/site-packages')

os.environ['DJANGO_SETTINGS_MODULE'] = 'app.settings'

from django.core.wsgi import get_wsgi_application
application = get_wsgi_application()

Note: replace dh_username with your username, replace app with your project name (if you didn't use "app"), and replace python3.6 in the site-packages path with your virtualenv's Python version.

This script ensures that if the WSGI handler is not invoked with the venv's interpreter, it re-invokes with the venv's interpreter and thus the pip packages installed in the venv.

Make that executable:
$ chmod +x ~/www/passenger_wsgi.py

Open your ~/app/app/settings.py and add your site's domain name to the ALLOWED_HOSTS list.

We also need to configure STATIC_ROOT. This is the path that Django's collectstatic management command puts static assets in: the CSS/JS files needed by the Django admin interface. These need to be in a subdirectory of your Web Directory so Apache can serve them, i.e. in a subdirectory of ~/www/public if you used "www/public" as the Web Directory when you set up the domain in the DreamHost control panel.

So add to your ~/app/app/settings.py after the definition of STATIC_URL:

STATIC_ROOT = '/home/dh_username/www/public' + STATIC_URL

Note that you need to update dh_username to match your username, and www to match your Web Directory.

Then build the static assets, and put them there.
$ python manage.py collectstatic

This will copy the static assets required by the Django admin interface into ~/www/public/static/ (assuming your STATIC_URL is the default of "/static/").

Add a simple ~/www/public/index.html.

Your domain should almost be ready to respond to web requests. The final step is to let the Passenger handler know that your WSGI handler code has changed. Passenger monitors the timestamp on the ~/www/tmp/restart.txt file, and every time the timestamp changes it reloads the handler - your Django Python project.

To make the web server notice your new handler code:

$ mkdir ~/www/tmp/
$ touch ~/www/tmp/restart.txt 

Now open your domain's index.html in your web browser. Hopefully if everything's working, you should see your placeholder index.html.

Now open your https://$YOUR_DOMAIN/admin in your web browser, and you should see the Django admin UI, complete with styles.

If you don't see the styles, it's likely your STATIC_ROOT or STATIC_URL isn't correct.

If you do see the Django admin, then you know that Django is working, and you can now start adding other views. Victory!

Note: before you publicise your site, you should turn off DEBUG in your project's settings.py!

Also be aware that every time you change your Python code, you need to touch ~/www/tmp/restart.txt, to tell Passenger to reload your Python code. Without this step, Passenger may not restart the already running WSGI handler, and so you'll end up running the old in-memory code, rather than the new code you just updated sitting on disk.

So as part of your deploy process, you need to ensure you touch ~/www/tmp/restart.txt.

Normal procedure these days is to put secrets in environment variables, but I couldn't figure out a satisfactory way to inject environment variables into the Passenger/WSGI process on DreamHost. So I store the secrets (like the CSRF secret key and email login details) and the settings that vary between production and development in a settings.json file in the ~/app/ dir. My settings.py loads this file, and fields in it control the things that vary between my local dev environment and production. For example, I have fields which control the value of DEBUG and ALLOWED_HOSTS; on my dev machine, settings.json sets DEBUG=True and adds "localhost" to ALLOWED_HOSTS, whereas in production these values default to safe values. I don't commit settings.json into my git repository, so I'm not committing secrets.

So with that, you should now have a working Django project on DreamHost running Python 3 in a virtualenv!

Thursday 27 June 2019

Firefox's Gecko Media Plugin & EME Architecture

For rendering audio and video, Firefox typically uses either the operating system's audio/video codecs or bundled software codec libraries. But for DRM video playback (Netflix, Amazon Prime Video, and the like), and for WebRTC video calls using baseline H.264 video, Firefox relies on Gecko Media Plugins, or GMPs for short.

This blog post describes the architecture of the Gecko Media Plugin system in Firefox, and the major class/objects involved, as it looked in June 2019.

For DRM video Firefox relies upon Google's Widevine Content Decryption Module, a dynamic shared library downloaded at runtime. Although this plugin doesn't conform to the GMP ABI, we provide an adapter to allow it to be run through the GMP system. We use the same Widevine CDM plugin that Chrome uses.

For decode and encode of H.264 streams for WebRTC, Firefox uses OpenH264, which is provided by Cisco. This plugin implements the GMP ABI.

These two plugins are downloaded at runtime from Google's and Cisco's servers, and installed in the user's Firefox profile directory.

We also ship a ClearKey CDM, which implements the baseline decryption scheme required by the Encrypted Media Extensions specification. It mimics the interface which the Widevine CDM implements, and is used in our EME regression tests. It's bundled with the rest of Firefox, and lives in the Firefox install directory.

The objects involved in running GMPs are spread over three processes; the main (AKA parent) process, the sandboxed content process where we run JavaScript and load web pages, and the sandboxed GMP process, which only runs GMPs.

The main facade to the GMP system is the GeckoMediaPluginService. Clients use the GeckoMediaPluginService to instantiate IPDL actors connecting their client to the GMP process, and to configure the service. In general, most operations which involve IPC to the GMPs/CDMs should happen on the GMP thread, as the GMP related protocols are processed on that thread.

mozIGeckoMediaPluginService can be used on the main thread by JavaScript, but the main-thread accessible methods proxy their work to the GMP thread.

How GMPs are downloaded and installed

The Firefox front end code which manages GMPs is the GMPProvider. This is a JavaScript object, running in the front end code in the main process. On startup if any existing GMPs are already downloaded and installed, this calls mozIGeckoMediaPluginService.addPluginDir() with the path to the GMP's location on disk. Gecko's C++ code then knows about the GMP. The GeckoMediaPluginService then parses the metadata file in that GMP's directory, and creates and stores a GMPParent for that plugin. At this stage the GMPParent is like a template, which stores the metadata describing how to start a plugin of this type. When we come to instantiate a plugin, we'll clone the template GMPParent into a new instance, and load a child process to run the plugin using the cloned GMPParent.

Shortly after the browser starts up (usually within 60 seconds), the GMPProvider will decide whether it should check for new GMP updates. The GMPProvider will check for updates if either it has not checked in the past 24 hours, or if the browser has been updated since last time it checked. If the GMPProvider decides to check for updates, it will poll Mozilla's Addons Update Server. This will return an update.xml file which lists the current GMPs for that particular Firefox version/platform, and the URLs from which to download those plugins. The plugins are hosted by third parties (Cisco and Google), not on Mozilla's servers. Mozilla only hosts the manifest describing where to download them from.

If the GMPs in the update.xml file are different to what is installed, Firefox will update its GMPs to match the update.xml file from AUS. Firefox will download and verify the new GMP, uninstall the old GMP, install the new GMP, and then add the new GMP's path to the mozIGeckoMediaPluginService. The objects that do this are the GMPDownloader and the GMPInstallManager, which are JavaScript modules in the front end code as well.

Note that Firefox will take action to ensure its installed GMPs match whatever is specified in the update.xml file. So if the update.xml file specifies a version of a GMP which is older than what is installed, Firefox will uninstall the newer version, and download and install the older one. This allows a GMP update to be rolled back if a problem is detected with the newer version.

If the AUS server can't be contacted, and no GMPs are installed, Firefox has the URLs of GMPs baked in, and will use those URLs to download the GMPs.

On startup, the GMPProvider also calls mozIGeckoMediaPluginService.addPluginDir() for the ClearKey CDM, passing in its path in the Firefox install directory.

How EME plugins are started in Firefox

The lifecycle for Widevine and ClearKey CDM begins in the content process with content JavaScript calling Navigator.requestMediaKeySystemAccess(). Script passes in a set of MediaKeySystemConfig, and these are passed forward to the MediaKeySystemAccessManager. The MediaKeySystemAccessManager figures out a supported configuration, and if it finds one, returns a MediaKeySystemAccess from which content JavaScript can instantiate a MediaKeys object. 

Once script calls MediaKeySystemAccess.createMediaKeys(), we begin the process of instantiating the plugin. We create a MediaKeys object and a ChromiumCDMProxy object, and call Init() on the proxy. The initialization is asynchronous, so we return a promise to content JavaScript and on success we'll resolve the promise with the MediaKeys instance which can talk to the CDM in the GMP process.

To create a new CDM, ChromiumCDMProxy::Init() calls GeckoMediaPluginService::GetCDM(). This runs in the content process, but since the content process is sandboxed, we can't create a new child process to run the CDM there and then. As we're in the content process, the GeckoMediaPluginService instance we're talking to is a GeckoMediaPluginServiceChild. This calls over to the parent process to retrieve a GMPContentParent bridge. GMPContentParent acts like the GMPParent in the content process. GeckoMediaPluginServiceChild::GetContentParent() retrieves the bridge, and sends a LaunchGMPForNodeId() message to instantiate the plugin in the parent process.

In the non multi-process Firefox case, we still call GeckoMediaPluginService::GetContentParent(), but we end up running GeckoMediaPluginServiceParent::GetContentParent(), which can just instantiate the plugin directly.

When the parent process receives a LaunchGMPForNodeId() message, the GMPServiceParent runs through its list of GMPParents to see if there's one matching the parameters passed over. We check to see if there's an instance from the same NodeId, and if so use that. The NodeId is a hash of the origin requesting the plugin, combined with the top level browsing origin, plus salt. This ensures GMPs from different origins always end up running in different processes, and GMPs running in the same origin run in the same process.

If we don't find an active GMPParent running the requested NodeId, we'll make a copy of a GMPParent matching the parameters, and call LoadProcess() on the new instance. This creates a GMPProcessParent object, which in turn uses GeckoChildProcessHost to run a command line to start the child GMP process. The command line passed to the newly spawned child process causes the GMPProcessChild to run, which creates and initializes the GMPChild, setting up the IPC connection between GMP and Main processes.

The GMPChild delegates most of the business of loading the GMP to the GMPLoader. The GMPLoader opens the plugin library from disk, and starts the Sandbox using the SandboxStarter, which has a different implementation for every platform. Once the sandbox is started, the GMPLoader uses a GMPAdapter parameter to adapt whatever binary interface the plugin exports (the Widevine C API for example) to the match the GMP API. We use the adapter to call into the plugin to instantiate an instance of the CDM. For OpenH264 we simply use a PassThroughAdapter, since the plugin implements the GMP API.

If all that succeeded, we'll send a message reporting success to the parent process, which in turn reports success to the content process, which resolves the JavaScript promise returned by MediaKeySystemAccess.createMediaKeys() with the MediaKeys object, which is now setup to talk to a CDM instance.

Once content JavaScript has a MediaKeys object, it can set it on an HTMLMediaElement using HTMLMediaElement.setMediaKeys().

The MediaKeys object encapsulates the ChromiumCDMProxy, which proxies commands sent to the CDM into calls to ChromiumCDMParent on the GMP thread.

How EME playback works

There are two main cases that we care about here; encrypted content being encountered before a MediaKeys is set on the HTMLMediaElement, or after. Note that the CDM is only usable to the media pipeline once it's been associated with a media element by script calling HTMLMediaElement.setMediaKeys().

If we detect encrypted media streams in the MediaFormatReader's pipeline, and we don't have a CDMProxy, the pipeline will move into a "waiting for keys" state, and not resume playback until content JS has set a MediaKeys on the HTMLMediaElement. Setting a MediaKeys on the HTMLMediaElement causes the encapsulated ChromiumCDMProxy to bubble down past MediaDecoder, through the layers until it ends up on the MediaFormatReader, and the EMEDecoderModule.

Once we've got a CDMProxy pushed down to the MediaFormatReader level, we can use the PDMFactory to create a decoder which can process encrypted samples. The PDMFactory will use the EMEDecoderModule to create the EME MediaDataDecoders, which process the encrypted samples.

The EME MediaDataDecoders talk directly to the ChromiumCDMParent, which they get from the ChromiumCDMProxy on initialization. The ChromiumCDMParent is the IPDL parent actor for communicating with CDMs.

All calls to the ChromiumCDMParent should be made on the GMP thread. Indeed, one of the primary jobs of the ChromiumCDMProxy is to proxy calls made by the MediaKeys on the main thread to the GMP thread so that commands can be sent to the CDM via off main thread IPC.

Any callbacks from the CDM in the GMP process are made onto the ChromiumCDMChild object, and they're sent via PChromiumCDM IPC over to ChromiumCDMParent in the content process. If they're bound for the main thread (i.e. the MediaKeys or MediaKeySession objects), the ChromiumCDMCallbackProxy ensures they're proxied to the main thread.

Before the EME MediaDataDecoders submit samples to the CDM, they first ensure that the samples have a key with which to decrypt them. This is achieved by a SamplesWaitingForKey object. We keep a copy in the content process, in the CDMCaps object, of which keyIds the CDM has reported are usable. The information stored in the CDMCaps about which keys are usable is mirrored in the JavaScript-exposed MediaKeyStatusMap object.

The MediaDataDecoder's decode operation is asynchronous, and the SamplesWaitingForKey object delays decode operations until the CDM has reported that the keys that the sample requires for decryption are usable. Before sending a sample to the CDM, the EME MediaDataDecoders check with the SamplesWaitingForKey, which looks up in the CDMCaps whether the CDM has reported that the sample's keyId is usable. If not, the SamplesWaitingForKey registers with the CDMCaps for a callback once the key becomes usable. This stalls the decode pipeline until content JavaScript has negotiated a license for the media.

Content JavaScript negotiates licenses by receiving messages from the CDM on the MediaKeySession object, and forwarding those messages on to the license server, and forwarding the response from the license server back to the CDM via the MediaKeySession.update() function. These messages are in turn proxied by the ChromiumCDMProxy to the GMP thread, and result in a call to ChromiumCDMParent and thus an IPC message to the GMP process, and a function call into the CDM there. If the license server sends a valid license, the CDM will report the keyId as usable via a key statuses changed callback.

Once the key becomes usable, the SamplesWaitingForKey gets a callback, and the EME MediaDataDecoder will submit the sample for processing by the CDM and the pipeline unblocks. 
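The actual implementation is C++ inside Gecko, but the gating pattern itself is simple. As a language-neutral illustration (sketched here in Go, with made-up names that don't correspond to real Gecko classes), a key store can hand each decoder a channel that is closed once the CDM reports the key usable:

```go
package main

import (
	"fmt"
	"sync"
)

// keyStore is a hypothetical stand-in for CDMCaps: it tracks which keyIds
// the CDM has reported usable, and lets decoders wait for a key to arrive.
type keyStore struct {
	mu      sync.Mutex
	usable  map[string]bool
	waiters map[string][]chan struct{}
}

func newKeyStore() *keyStore {
	return &keyStore{
		usable:  map[string]bool{},
		waiters: map[string][]chan struct{}{},
	}
}

// WaitUsable returns a channel that is closed once keyId is usable,
// playing the role of SamplesWaitingForKey.
func (k *keyStore) WaitUsable(keyId string) <-chan struct{} {
	k.mu.Lock()
	defer k.mu.Unlock()
	ch := make(chan struct{})
	if k.usable[keyId] {
		close(ch) // key already usable; don't stall the pipeline
		return ch
	}
	k.waiters[keyId] = append(k.waiters[keyId], ch)
	return ch
}

// MarkUsable models the key-statuses-changed callback from the CDM.
func (k *keyStore) MarkUsable(keyId string) {
	k.mu.Lock()
	defer k.mu.Unlock()
	k.usable[keyId] = true
	for _, ch := range k.waiters[keyId] {
		close(ch) // unblock every decode waiting on this key
	}
	delete(k.waiters, keyId)
}

func main() {
	ks := newKeyStore()
	done := make(chan string)
	go func() {
		<-ks.WaitUsable("key-1") // decode stalls here until licensed
		done <- "decoded sample with key-1"
	}()
	ks.MarkUsable("key-1") // license arrived; key now usable
	fmt.Println(<-done)
}
```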

EME on Android

EME on Android is similar in terms of the EME DOM binding and integration with the MediaFormatReader and friends, but it uses a MediaDrmCDMProxy instead of a ChromiumCDMProxy. The MediaDrmCDMProxy doesn't talk to the GMP subsystem, and instead uses the Android platform's inbuilt Widevine APIs to process encrypted samples.

How WebRTC uses OpenH264

WebRTC uses OpenH264 for encode and decode of baseline H.264 streams. It doesn't need all the DRM stuff, so it talks to the OpenH264 GMP via the PGMPVideoDecoder and PGMPVideoEncoder protocols.

The child actors GMPVideoDecoderChild and GMPVideoEncoderChild talk to OpenH264, which conforms to the GMP API.

OpenH264 is not used by Firefox for playback of H.264 content in regular <video> elements, though there is still a GMPVideoDecoder MediaDataDecoder in the tree should this ever be desired.

How GMP shutdown works

Shutdown is confusing, because there are three processes involved. When the destructor of the MediaKeys object in the content process runs (possibly because it's been cycle- or garbage-collected), it calls CDMProxy::Shutdown(), which calls through to ChromiumCDMParent::Shutdown(), which cancels pending decrypt/decode operations and sends a Destroy message to the ChromiumCDMChild.

In the GMP process, ChromiumCDMChild::RecvDestroy() shuts down and deletes the CDM instance, and sends a __delete__ message back to the ChromiumCDMParent in the content process.

In the content process, ChromiumCDMParent::Recv__delete__() calls GMPContentParent::ChromiumCDMDestroyed(), which calls CloseIfUnused(). The GMPContentParent tracks the living protocol actors for this plugin instance in this content process, and CloseIfUnused() checks if they're all shutdown. If so, we unlink the GMPContentParent from the GeckoMediaPluginServiceChild (which is PGMPContent protocol's manager), and close the GMPContentParent instance. This shuts down the bridge between the content and GMP processes.

This causes the GMPContentChild in the GMP process to be removed from the GMPChild in GMPChild::GMPContentChildActorDestroy(). This sends a GMPContentChildDestroyed message to GMPParent in the main process.

In the main process, GMPParent::RecvPGMPContentChildDestroyed() checks if all actors on its side are destroyed (i.e. if all content processes' bridges to this GMP process are shutdown), and will shutdown the child process if so. Otherwise we'll check again the next time one of the GMPContentParents shuts down. 

Note there are a few places where we use GMPContentParent::CloseBlocker. This stops us from shutting down the child process when there are no active actors, but we still need the process alive. This is useful for keeping the child alive in the time between operations, for example after we've retrieved the GMPContentParent, but before we've created the ChromiumCDM (or some other) protocol actor.

How crash reporting works for EME CDMs

Crash handling for EME CDMs is confusing for the same reason as shutdown; because there are three processes involved. It's tricky because the crash is first reported in the parent process, but we need state from the content process in order to identify which tabs need to show the crash reporter notification box.

We receive a GMPParent::ActorDestroy() callback in the main process with aWhy==AbnormalShutdown. We get the crash dump ID, and dispatch a task to run GMPNotifyObservers() on the main thread. This collects some details, including the pluginID, and dispatches an observer service notification "gmp-plugin-crash".  A JavaScript module ContentCrashHandlers.jsm observes this notification, and rebroadcasts it to the content processes.

JavaScript in every content process observes the rebroadcast, and calls mozIGeckoMediaPluginService::RunPluginCrashCallbacks(), passing in the plugin ID. Each content process' GeckoMediaPluginService then goes through its list of GMPCrashHelpers, and finds those which match the pluginID. We then dispatch a PluginCrashed event at the window that the GMPCrashHelper reports as the current window owning the plugin. This is then handled by PluginChild.jsm, which sends a message to cause the crash reporter notification bar to show.

GMP crash reporting for WebRTC

Unfortunately, the code paths for WebRTC crash handling are slightly different, because the relevant window is owned by the PeerConnection. WebRTC doesn't use GMPCrashHelpers; instead the PeerConnection helps find the target window to dispatch PluginCrashed to.

Friday 7 June 2019

Quick start: Profiling local builds of Firefox for Android and GeckoView_example

Getting a local build of Firefox for Android or GeckoView_example building and profiling is relatively easy if you know how, so here's my quickstart guide.

See also, the official GeckoView documentation.

First, ensure you run ./mach bootstrap, and select "4. GeckoView/Firefox for Android".

Here's the mozconfig I'm using (Ubuntu 18.04):
ac_add_options --enable-optimize
ac_add_options --disable-debug
ac_add_options --enable-release
ac_add_options --disable-tests
mk_add_options AUTOCLOBBER=1
ac_add_options --enable-debug-symbols
# With the following compiler toolchain:
export CC="/home/chris/.mozbuild/clang/bin/clang -fcolor-diagnostics"
export CXX="/home/chris/.mozbuild/clang/bin/clang++ -fcolor-diagnostics"
ac_add_options --with-ccache=/usr/bin/ccache
mk_add_options 'export RUSTC_WRAPPER=sccache'
# Build GeckoView/Firefox for Android:
ac_add_options --enable-application=mobile/android
# Work around issues with mozbuild not finding the exact JDK that works.
# See also
ac_add_options --with-java-bin-path=/usr/lib/jvm/java-8-openjdk-amd64/bin
# With the following Android NDK:
ac_add_options --with-android-ndk="/home/chris/.mozbuild/android-ndk-r17b"
ac_add_options --with-android-min-sdk=16
ac_add_options --target=arm-linux-androideabi
A noteworthy item in there is "--with-java-bin-path". I've had trouble on Ubuntu with the system default Java not being the right version. This helps.

Note that if you're profiling, you really want to be doing a release build (--enable-release); its behaviour differs from that of a plain optimized build.

If you're debugging, you probably need --enable-debug. For details on how to debug, see GeckoView Debugging Native Code in Android Studio.

To build, package, and install Firefox for Android (Fennec) on your Android device, run:
./mach build && ./mach package && ./mach install 
Note that you need to do the package step after every build. Once you've installed, you can start Firefox on a given URL with:
./mach run --url
For testing and profiling GeckoView, the easiest option is to run the GeckoView_example app. To build and install this, run:
./mach build && ./mach package && ./mach android build-geckoview_example && ./mach android install-geckoview_example
To run GeckoView_example, opening a URL:
adb shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -n org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity -d ''
If you want to set environment variables, for example to turn on MOZ_LOGs, run like so:
adb shell am start -a android.intent.action.MAIN -c android.intent.category.LAUNCHER -n org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity -d '' --es env0 MOZ_LOG=MediaSource:5
Note that if you want to set more than one environment variable, each one needs to be numbered and passed with its own --es flag, i.e. `--es env0 FOO=BAR --es env1 BAZ=FUZ`, and so on. Also note that you do not put quotes around environment variable values here. That is, use `--es env0 FOO=BAR`, not `--es env0 FOO="BAR"`.

MOZ_LOGs go to adb logcat. To set up an output stream that shows only specific MOZ_LOGs:
adb logcat | grep MediaSource
This stays open, printing logs until you terminate it with CTRL+C. If you want it to exit after dumping the currently buffered logs, pass -d, i.e.:
adb logcat -d > log_file.txt
Apparently you can pass a logtag filterspec to `adb logcat` to have it filter for you, but I never figured the syntax out.

To clear logcat's buffered logs:
adb logcat --clear
This is useful if you're printf-debugging something via logcat, and want to clear the decks before each run.

Other useful commands...

To terminate a running GeckoView_example:
adb shell am force-stop org.mozilla.geckoview_example
To list all packages on your device related to Mozilla:
adb shell pm list packages mozilla
To uninstall a GeckoView_example:
adb uninstall org.mozilla.geckoview_example && adb uninstall org.mozilla.geckoview_example.test
Note that this also uninstalls the GeckoView test app. Sometimes you may find you need to uninstall both apps before you can re-install. I think this is related to different versions of adb interacting.

To get the Android version on your device:
adb shell getprop ro.build.version.release
To simulate typing text:
adb shell input text "your text"
To profile a GeckoView_example session, you need to download the latest Firefox Desktop Nightly build, and install the Firefox Profiler add-on. Note that the Firefox Profiler Documentation is pretty good, so I'll only cover the highlights.

Once you've got Firefox Desktop Nightly and the Firefox Profiler add-on installed, start up your GeckoView_example app and URL you want to profile, and in Firefox Nightly Desktop open about:debugging. Click "Connect" to attach to the device you want to profile on, and then click "Profile Performance".

If you're profiling media playback, you want to add "Media" to the custom thread names under the "Threads" settings.

Since you're profiling a local build, you want to open the "Local build" settings, and ensure you add the path to your object directory.

Once you're configured, press "Start recording", do the thing in GeckoView_example you're profiling, and then hit "Stop and grab the recording".

The profile will open in a new tab in the browser. Sometimes I've noticed that the profiler hangs at "Waiting for symbol tables for library". Just reloading the page normally resolves this.

I find the Firefox Profiler very straightforward to use. The Flame Graph view can be particularly enlightening to see where threads are spending time.

Unfortunately the Firefox Profiler can't symbolicate Java call stacks. Java calls usually show up as hex addresses, sometimes on the far side of an AndroidBridge C++ call.

On Android >= P you can use Simpleperf to capture profiles with both native and JIT'd Java call stacks. Andrew Creskey has instructions on how to use Simpleperf with GeckoView_example.

Saturday 3 November 2018

On learning Go and a comparison with Rust

I spoke at the AKL Rust Meetup last month (slides) about my side project doing data mining in Rust. There were a number of engineers from Movio there who use Go, and I've been keen for a while to learn Go and compare it with Rust and Python for my data mining side projects, so that inspired me to knuckle down and learn Go.

Go is super simple. I was able to learn the important points in a couple of evenings by reading GoByExample, and I very quickly had an implementation of the FPGrowth algorithm in Go up and running. For reference, I also have implementations of FPGrowth in Rust, Python, Java and C++.

As a language, Go is very simple. It lacks many of the higher-level constructs of other modern languages, but that lack makes it very easy to learn, straightforward to use, and easy to read and understand. It feels similar to Python. There's little hidden functionality; you can't overload operators, for example, and there are no generics or macros, so the implementation for everything has to be rewritten for every type. This gets tedious, but it does at least mean the implementation of everything is simple and explicit, the code right in front of you.

I also really miss the functional constructs that are built into many other languages, like mapping a function over a sequence, filter, any, all, etc. With Go, you need to reimplement these yourself, and because there's no generics (yet), you need to do it for every type you want to use these on. The lack of generics is also painful when writing custom containers.
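For example, in pre-generics Go (before 1.18), a map or filter over a slice of ints has to be hand-rolled, and rolled again for every other element type you need it on:

```go
package main

import "fmt"

// mapInts applies f to every element. A []string version would need its
// own near-identical copy, since pre-generics Go can't abstract over the
// element type.
func mapInts(xs []int, f func(int) int) []int {
	out := make([]int, 0, len(xs))
	for _, x := range xs {
		out = append(out, f(x))
	}
	return out
}

// filterInts keeps the elements for which pred returns true.
func filterInts(xs []int, pred func(int) bool) []int {
	var out []int
	for _, x := range xs {
		if pred(x) {
			out = append(out, x)
		}
	}
	return out
}

func main() {
	xs := []int{1, 2, 3, 4, 5}
	evens := filterInts(xs, func(x int) bool { return x%2 == 0 })
	doubled := mapInts(evens, func(x int) int { return x * 2 })
	fmt.Println(doubled) // [4 8]
}
```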

Not being able to key a map with a struct containing a slice was a nuisance for my problem domain; I ended up having to write a custom tree-set data structure because of this, though it was very easy to write thanks to built-in maps. Rust, or even Java, by contrast, has traits/methods you can implement to make arbitrary types hashable.
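To make this concrete: slices are not comparable in Go, so a struct containing one can't be a map key. One workaround (a sketch, assuming the items are kept in a canonical order) is to flatten the set into a comparable string:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type itemSet struct {
	items []int // slices aren't comparable, so itemSet can't key a map
}

// key flattens the set into a comparable string so it can be used as a
// map key. Assumes items are already in canonical (e.g. sorted) order.
func (s itemSet) key() string {
	parts := make([]string, len(s.items))
	for i, v := range s.items {
		parts[i] = strconv.Itoa(v)
	}
	return strings.Join(parts, ",")
}

func main() {
	// var m map[itemSet]bool // compile error: invalid map key type itemSet
	seen := map[string]bool{}
	seen[itemSet{items: []int{1, 2, 3}}.key()] = true
	fmt.Println(seen["1,2,3"]) // true
}
```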

The package management for Go feels a bit tacked on; requiring all Go projects to live under GOPATH seems a consequence of not having a tool the equal of Rust's Cargo.

And Go's design decision to use the case of a symbol's first letter to express whether that symbol is public or private is annoying. I have a long-standing habit of using foo as the name for a single instance of type Foo, but that pattern doesn't work in Go. The consequence of this design choice is that it leads programmers to use lots of non-descriptive names for things, like single-letter variable names, or the dreaded myFoo.

The memory model of Go is simple, and again I think the simplicity is a strength of the language. Go uses escape analysis to determine whether a value escapes outside of a scope, and moves such values to the heap if so. Go also dynamically grows goroutines' stacks, so there's no stack overflow. Go is garbage collected, so you don't have to worry about deallocating things.

I found that thinking of values as being on the heap or stack wasn't a helpful mental model with Go. Once I started to think of variables as references to values and values being shared when I took the address (via the & operator), the memory model clicked.
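A small sketch of both points: returning the address of a local is safe because escape analysis moves the value to the heap, and taking an address shares the value rather than copying it:

```go
package main

import "fmt"

type counter struct{ n int }

// newCounter returns the address of a local variable. Escape analysis
// sees that the value outlives the call and allocates it on the heap;
// in C this would be a dangling pointer, in Go it's fine.
func newCounter() *counter {
	c := counter{}
	return &c
}

func main() {
	a := newCounter()
	b := a           // b and a reference the same value
	b.n++            // mutating through one reference...
	fmt.Println(a.n) // ...is visible through the other: prints 1

	v := *a // copying the value, by contrast, shares nothing
	v.n++
	fmt.Println(a.n) // still 1
}
```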

I think Go's simple memory model and syntax make it a good candidate as a language to teach to beginner programmers, more so than Rust.

The build times are impressively fast, particularly for incremental builds. After the initial build of my project, I was getting build times too fast to perceive on my 2015 13" MBP, which is impressive. Rust has vastly slower build times.

The error messages produced by the Go compiler were very spartan. The Rust compiler produces very helpful error messages, and in general I think Rust is leading here.

Go has a very easy to use profile package which you can embed in your Go program. Combined with GraphViz, it produces simple CPU utilization graphs like this one:
CPU profile graph produced by Go's "profile" package and GraphViz.

Having an easy to use profiler bundled with your app is a huge plus. As we've seen with Firefox, this makes it easy for your users to send you profiles of their workloads on their own hardware. The graph visualization is also very simple to understand.

The fact that Go lacks the ability to mark variables/parameters as immutable is mind-boggling to me. Given the language designers came from C, I'm surprised by this. I've written enough multi-threaded and large system code to know the value of restricting what can mess with your state.

Goroutines are pretty lightweight and neat. You can also use them to make a simple "generator" object; spawn a goroutine to do your stateful computation, and yield each result by pushing it into a channel. The consumer can block on receiving the next value by receiving on the channel, and the producer will block when it pushes into a channel that's not yet been received on. Note you could do this with Rust too, but you'd have to spawn an OS thread to do it, which is more heavyweight than a goroutine, which is basically a userspace thread.
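For example, a minimal Fibonacci generator along those lines:

```go
package main

import "fmt"

// fib spawns a goroutine that yields Fibonacci numbers into a channel,
// giving a simple generator: the producer blocks on each send until the
// consumer receives, and vice versa.
func fib(n int) <-chan int {
	ch := make(chan int)
	go func() {
		a, b := 0, 1
		for i := 0; i < n; i++ {
			ch <- a // blocks until the consumer is ready
			a, b = b, a+b
		}
		close(ch) // lets the consumer's range loop terminate
	}()
	return ch
}

func main() {
	for v := range fib(6) {
		fmt.Print(v, " ") // 0 1 1 2 3 5
	}
	fmt.Println()
}
```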

Rust's Rayon parallelism crate is simply awesome, and using it I was able to easily and effectively parallelize my Rust FPGrowth implementation via Rayon's parallel iterators. As best as I can tell, Go doesn't have anything on par with Rayon for parallelism. Go's goroutines are great for lightweight concurrency, but they don't make it as easy as Rayon's par_iter() to trivially parallelize a loop. Note, parallelism is not concurrency.

All of my attempts to parallelize my Go FPGrowth implementation as naively as I'd parallelized my Rust+Rayon implementation resulted in a slower Go program. In order to parallelize FPGrowth in Go, I'd have to do something complicated, though I'm sure channels and goroutines would make that easier than in a traditional language like Java or C++.

Go would really benefit from something like Rayon, but unfortunately due to Go's lack of immutability and a borrow checker, it's not safe to naively parallelize arbitrary loops like it is in Rust. So Rust wins on parallelism. Both languages are strong on concurrency, but Rust pulls ahead due to its safety features and Rayon.
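To illustrate the gap: even a trivial parallel map in Go takes a worker pool, a channel, and a WaitGroup, versus a single par_iter() call with Rayon in Rust. A sketch (not tuned code):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parallelSquare is roughly what "par_iter().map()" costs in Go: explicit
// workers, an index channel, and a WaitGroup. Each worker writes into
// distinct slots of a pre-sized slice, which is safe without extra locking.
func parallelSquare(xs []int) []int {
	out := make([]int, len(xs))
	idx := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				out[i] = xs[i] * xs[i]
			}
		}()
	}
	for i := range xs {
		idx <- i
	}
	close(idx) // no more work; workers' range loops exit
	wg.Wait()
	return out
}

func main() {
	fmt.Println(parallelSquare([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```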

Comparing Rust to Go is inevitable... Go to me feels like the spiritual successor to C, whereas Rust is the successor to C++.

I feel that Rust has a steep learning curve, and before you're over the hump it can be hard to appreciate the benefits of the constraints Rust enforces. With Go you get over that hump a lot sooner; with Rust you get over it a lot later, but the heights you reach afterwards are much higher.

Overall, I think Rust is superior, but if I'd learned Go first I'd probably be quite happy with Go.

Thursday 1 March 2018

Firefox Media Playback Team Review Policy

Reviews form a central part of how we at Mozilla ensure engineering diligence. Prompt, yet thorough, reviews are also a critical component in maintaining team velocity and productivity. Reviews are also one of the primary ways that a distributed organization like Mozilla does its mentoring and development of team members.

So given how important reviews are, it pays to be deliberate about what you're aiming for.

The senior members of the Firefox Media Playback team met in Auckland in August 2016 to codify the roadmap, vision, and policy for the team, and one of the things we agreed upon was our review policy.

The policy has served us well, as I think we've demonstrated with all we've achieved, so I'm sharing it here in the hope that it inspires others.
  • Having fast reviews is a core value of the media team.
  • Review should be complete by end of next business day.
  • One patch for one logical scope or change. Don't cram everything into one patch!
  • Do not fix a problem, fix the cause. Workarounds are typically bad. Look at the big picture and find the cause.
  • We should strive for a review to be clear. In all cases it should be clear what the next course of action is.
  • Reviews are there to keep bad code out of the tree.
  • Bad code tends to bring out bad reviews.
  • Commit message should describe what the commit does and why. It should describe the old bad behaviour, and the new good behaviour, and why the change needs to be made.
  • R+ means I don’t want to see it again. Maybe with comments that must be addressed before landing.
  • R- means I do want to see it again, with a list of things to fix.
  • R canceled means we’re not going to review this.
  • Anyone on the media team should be expected to complete a follow-up bug.
  • It’s not OK for a reviewer to ask for a test to be split out from a changeset, provided the test is related to the commit. By the time a patch gets to review, splitting the test out doesn’t create value, just stop-energy.
  • Review request. If response is slow, ping or email for a reminder, otherwise find another reviewer.
  • Don’t be afraid to ask when the review will come. The reply to “when” can be “is it urgent?”
  • Everyone should feel comfortable pointing out flaws/bugs as a “drive by”.
  • Give people as much responsibility as they can handle.
  • Reviewers should make it clear what they haven’t reviewed.
  • American English spelling, for comments and code.
  • Enforce Mozilla coding style, and encourage auto formatters, like `./mach clang-format`.
  • Use reviewboard. Except when you can’t, like security patches.