Saturday, 8 April 2023

Solved: AMR iCal events WordPress plugin out of memory; my website is slow

I administer my kid's school's WordPress website. It has been slow and unresponsive on and off for years. We're hosted on DreamHost, on a shared hosting plan. Some time ago I contacted DreamHost support about the issue; they told me that we were hitting our memory limit, and directed me to some general WordPress optimisation tips, such as disabling unused plugins and themes, which didn't help. Around that time I updated our PHP version, and the problem seemed to be mostly resolved. But we recently started getting complaints again, so I resolved to look into it when I got a chance. I was finally able to devote some time to it recently, and figured it out.

Understanding the problem

The first thing I wanted to do was quantify the problem: how often were web requests actually being rejected? So I checked the server logs. DreamHost runs Apache, so the access logs were in our home directory at `~/logs/$domain/https/access.log`. These showed a number of requests failing with HTTP 500 errors, mostly for the school newsletter. Looking closer at the error log at `~/logs/$domain/https/error.log`, I immediately got a big clue: numerous log entries such as the following:

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4194304 bytes) in /home/[...]/wp-content/plugins/amr-ical-events-list/includes/amr-rrule.php on line 1127

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4194304 bytes) in /home/[...]/wp-includes/class-wp-recovery-mode.php on line 178

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4096 bytes) in /home/[...]/wp-content/plugins/amr-ical-events-list/includes/amr-rrule.php on line 55

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4096 bytes) in /home/[...]/wp-content/plugins/amr-ical-events-list/includes/amr-rrule.php on line 55

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4194304 bytes) in /home/[...]/wp-content/plugins/amr-ical-events-list/includes/amr-rrule.php on line 1127

[...]  PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 4194304 bytes) in /home/[...]/wp-includes/class-wp-recovery-mode.php on line 178

The school puts out a weekly newsletter, and at the top of each one is a summary of the upcoming events on the school calendar. This is powered by the AMR iCal Events WordPress plugin, which downloads the office's public calendar from Google Calendar as an ICS file and parses out the events within a time range specified in the blog post using a WordPress shortcode.

A quick Google search turned up reports that the AMR iCal Events plugin has trouble with large calendars containing lots of recurring events. Our school calendar was about 900KB, and includes recurring events...

Looking in `~/logs/$domain/https/access.log`, I saw a lot of hits from BingBot to a single newsletter, each with unique query parameters. Because the query parameters were unique (why?!?!), these requests bypassed WP Super Cache and caused the page to be re-rendered, which triggered the calendar plugin to refetch the calendar from Google Calendar and reparse the ICS file...

BingBot was sending requests faster than we could service them. AMR iCal Events struggles with such a large calendar, and the combined memory overhead of servicing all those requests was too much: we'd hit our user's memory limit on the shared hosting, and the plugin would start failing to allocate memory. While we were at that limit, the server's memory watchdog must have been rejecting or killing requests from human users too, causing the site to be slow and unresponsive.

The fix I put in place has two parts:

  1. Asking BingBot to slow down using a robots.txt file.
  2. Writing a small CGI program to reduce the size of the calendar exposed to AMR iCal Events.

Asking BingBot to slow down using a robots.txt file

This is a simple solution, which most people who encounter this issue should find quite approachable. Simply create a file called robots.txt in the root of your domain, containing the following:
User-agent: *
Crawl-delay: 30

This directive asks all search engines' crawlers to hit you at most once every 30 seconds. BingBot honors this directive. Not all crawlers do; Google's, for example, doesn't.

You need to choose a crawl delay which is greater than the amount of time it takes to render the page, otherwise your server will receive requests faster than it can process them, and it will struggle.

It took about 24 hours for Bing to notice the new robots.txt file and slow down. This alone would probably solve the issue for most people; if you're fixing this on your own site and are happy to wait a day for it to take effect, you could stop here.

Reducing the size of the ICS file exposed to AMR iCal Events

While the robots.txt file should slow down BingBot, the calendar plugin was still making pages load slowly. A simple load test showed that a newsletter took 13 seconds on average to load when it wasn't in the cache!
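The post doesn't describe the load test itself. One simple way to get a comparable number is to time a few sequential requests, each with a unique query parameter so that, as with BingBot's requests, the page cache is bypassed. The parameter name `nocache` and the program name `loadtest` are hypothetical. Because the requests are sequential, this measures uncached latency rather than behaviour under concurrent load; keep the run count small on a shared host.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"time"
)

// bustCache adds a unique query parameter to page so the request is not served
// from the page cache. "nocache" is an arbitrary, hypothetical parameter name.
func bustCache(page, token string) (string, error) {
	u, err := url.Parse(page)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("nocache", token)
	u.RawQuery = q.Encode() // Encode sorts parameters by key
	return u.String(), nil
}

// timeGET fetches target, reads the whole body and reports how long that took.
func timeGET(client *http.Client, target string) (time.Duration, error) {
	start := time.Now()
	resp, err := client.Get(target)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("%s: %s", target, resp.Status)
	}
	return time.Since(start), nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: loadtest https://example.org/some-newsletter/")
		os.Exit(2)
	}
	client := &http.Client{Timeout: 2 * time.Minute}
	const runs = 10 // sequential requests: latency, not capacity
	var total time.Duration
	for i := 0; i < runs; i++ {
		target, err := bustCache(os.Args[1], strconv.FormatInt(time.Now().UnixNano(), 10))
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		d, err := timeGET(client, target)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%2d  %v\n", i+1, d)
		total += d
	}
	fmt.Printf("average: %v\n", total/runs)
}
```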

I probably could have stopped there, but I wanted the pages to load faster. My solution was to write a program which parses the calendar's ICS file faster than AMR iCal Events does, outputs just the calendar entries in a specified time range, and serves the result to AMR iCal Events. Since we're on DreamHost shared hosting, we can't open a socket to listen for incoming requests, so the best dynamic option available to me was an old-school CGI program.
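Go's standard library covers the CGI side through `net/http/cgi`: `cgi.Serve` runs an ordinary `http.Handler` against the request Apache hands over in environment variables and stdin, and writes the response to stdout. Below is a minimal skeleton with several assumptions: the cached file path, the `date=YYYYMMDD` query parameter and the one-week-before/two-weeks-after window are all hypothetical, and `copyCalendar` is a pass-through placeholder where the event filter would go.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/cgi"
	"os"
	"time"
)

// cachedICS is where the hourly cron job is assumed to save the calendar (hypothetical path).
const cachedICS = "/home/example/calendar/school.ics"

// window turns a ?date=YYYYMMDD value into the range of events to keep:
// from a week before that date to two weeks after it (a hypothetical choice).
func window(date string) (time.Time, time.Time, error) {
	day, err := time.Parse("20060102", date)
	if err != nil {
		return time.Time{}, time.Time{}, fmt.Errorf("date must be YYYYMMDD: %w", err)
	}
	return day.AddDate(0, 0, -7), day.AddDate(0, 0, 14), nil
}

// copyCalendar is a stand-in for the event filter: it copies the calendar unchanged.
// A real implementation would drop VEVENTs that fall outside [from, to).
func copyCalendar(w io.Writer, r io.Reader, from, to time.Time) error {
	_, err := io.Copy(w, r)
	return err
}

// serveCalendar answers one CGI request from the file on disk, so it never
// waits on Google Calendar.
func serveCalendar(w http.ResponseWriter, r *http.Request) {
	from, to, err := window(r.URL.Query().Get("date"))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	f, err := os.Open(cachedICS)
	if err != nil {
		http.Error(w, "calendar not available", http.StatusServiceUnavailable)
		return
	}
	defer f.Close()
	w.Header().Set("Content-Type", "text/calendar; charset=utf-8")
	if err := copyCalendar(w, f, from, to); err != nil {
		fmt.Fprintln(os.Stderr, "serving calendar:", err) // CGI stderr lands in Apache's error log
	}
}

func main() {
	if err := cgi.Serve(http.HandlerFunc(serveCalendar)); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```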

I investigated several ICS parsing libraries in Python, Rust and Go, and none really handled my particular use case very well. In the end I wrote a parser for the ICS file myself in Go.

The code for this Calendar ICS date filter CGI program is in this GitHub gist.

The program is simple: it reads the ICS file line by line, outputting only the lines for events which intersect the target date range. I only had to parse enough to find where each event starts and ends in the file stream, and to work out the dates on which each event starts and ends, including expanding recurring-event rules. This drastically reduced the size of the calendar we expose to AMR iCal Events.

I set up an hourly cron job to fetch the calendar ICS file from Google Calendar and save it to disk. When the CGI program is hit, it parses that file from disk and strips out the events which aren't around the date specified in a query parameter, so the CGI program never has to block on a request out to Google Calendar.

I now had a new URL to feed into the AMR iCal Events plugin instead of the Google Calendar URL.

At this stage, BingBot was still hitting us multiple times per second. For some reason it was only requesting a single newsletter with varying query parameters, so I changed the ICS URL in the AMR shortcode for that newsletter, and the site very quickly became more responsive. The error logs no longer showed out-of-memory errors. Victory!

But we still had 400-odd newsletters embedding the old URL, and these pages would still load slowly. Changing them by hand would be painful, as each one required taking the target date from the shortcode, replacing the URL with the new URL, and inserting the target date as a query parameter in the new URL. Golang to the rescue again: I wrote a migration in Go that replaces every instance of the old AMR iCal shortcode with an updated shortcode using the new URL, with the target date embedded as a query parameter.

But how to test such a migration? We don't have any kind of staging environment for the school blog. Years ago, I set up a backup script (based on WordPress backup blog post), which means our backups include a SQL dump. So I was simply able to start up a local MySQL instance on my laptop, load the SQL dump from our backups, and then I had a database with content identical to our production site!

Then I wrote a Go program to connect to that DB and update the posts' content, using a regex which captured the target date from each shortcode and replaced the shortcode with one containing the new URL. Once that was tested against my local DB, I ran the binary on the server, migrating the 400-odd copies of the old calendar URL in the production DB over to the new, smaller calendar CGI program.
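The regex step might look like the following sketch. The shortcode shape (`[iCal URL start=YYYYMMDD ...]`), the `start` and `date` parameter names and the example URLs are assumptions for illustration; the real AMR shortcode attributes may differ, and `newURL` is assumed to carry no query string of its own. Shortcodes without a recognisable date are left untouched and not counted, so they can be reviewed by hand.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// migrateShortcodes rewrites every shortcode pointing at oldURL so that it points at
// newURL instead, passing the shortcode's start date as a ?date= query parameter.
// It returns the new content and the number of shortcodes rewritten.
func migrateShortcodes(content, oldURL, newURL string) (string, int) {
	// Group 1: everything after the URL up to "]"; group 2: the 8-digit start date.
	re := regexp.MustCompile(`\[iCal\s+` + regexp.QuoteMeta(oldURL) + `(\s[^\]]*?\bstart=(\d{8})[^\]]*)\]`)
	n := len(re.FindAllStringIndex(content, -1))
	// "$" is special in replacement templates, so escape any in newURL.
	tmpl := "[iCal " + strings.ReplaceAll(newURL, "$", "$$") + "?date=${2}${1}]"
	return re.ReplaceAllString(content, tmpl), n
}

func main() {
	post := `<p>Upcoming events:</p>
[iCal https://old.example/basic.ics start=20230410 listtype=2]`
	out, n := migrateShortcodes(post, "https://old.example/basic.ics", "https://school.example/cgi-bin/calendar.cgi")
	fmt.Printf("rewrote %d shortcode(s):\n%s\n", n, out)
}
```

In the actual migration this would be applied to each row's `post_content` in WordPress's `wp_posts` table (`wp_` being the default table prefix) through `database/sql` and a MySQL driver; that plumbing is omitted here. Running it against the restored local copy first, and comparing the counts with what you expect, is a cheap sanity check before touching production.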

Now our website is much snappier! The newsletters load fast, and so does the WordPress admin.