Update – I have also posted an alternative method that preserves the ability to have full page caching enabled. Please find it at WordPress ActivityPub and Cloudflare.

In a previous post I discussed how to deal with the fact that the ActivityPub plugin for WordPress must return author pages in a different format depending on the value of the Accept header. A browser hitting an author page expects HTML to be returned, while Mastodon expects JSON instead. If you use any kind of caching system, be it a CDN, a special plugin or a combination of the two, then you may run into an issue where the wrong content is cached for a given Accept header. You might see this in your site health report as:

Your author URL does not return valid JSON for application/activity+json. Please check if your hosting supports alternate Accept headers.

In this post I will discuss a method for dealing with this while not totally losing the ability to cache the response. This is useful for busy sites or as a way to help mitigate some forms of DoS attack. The example I provide here is meant for Nginx with php-fpm but you can apply this same sort of thinking anywhere else where you have enough control over the configuration to make it work.

Assuming you followed the previous post and have created an exception for your author URL in your CDN then it is on your server to render author pages each time a request is made. This is a waste of resources and doesn’t provide an ideal experience for end users. To enable caching on this endpoint, we will leverage Nginx’s built in caching capability while setting the cache key based on the Accept header.

To start, let's set up basic Nginx caching. At the top of your configuration file, outside of the server{} block, add the following (advanced users can adjust as desired):

fastcgi_cache_path /etc/nginx/cache levels=1:2 keys_zone=wordpress:100m inactive=10m max_size=100m;

You can adjust the path if you want, but in essence we are defining a cache at /etc/nginx/cache with a zone name of wordpress. We are limiting it to 100MB and telling Nginx to delete anything that hasn't been accessed in the last 10 minutes. /etc/nginx/cache must exist and must be owned by the same user that runs Nginx. If you have multiple servers, know that this cache is not shared; each server will maintain its own copy.
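For reference, creating that directory might look like the following. The www-data user is an assumption; substitute whatever user your Nginx worker processes actually run as:

sudo mkdir -p /etc/nginx/cache
sudo chown www-data:www-data /etc/nginx/cache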

Next, add a map to define which Accept headers we want to Vary on:

map $http_accept $vary_key {
  default "default";
  "~application/activity\+json" "json";
}

This block creates a new variable, $vary_key, that we can use later. Notice that we will only create a separate cache entry when application/activity+json is included in the Accept header.

Now, inside the server{} block for your site, let's add a header we can use to ensure our caching is working properly. Adding add_header X-Nginx-Cache $upstream_cache_status; to this section will cause Nginx to output a header showing the cache status of each response: BYPASS, MISS or HIT.
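As a rough sketch, the placement looks something like this (the server_name and listen values are examples):

server {
  listen 443 ssl;
  server_name dustinrue.com;

  # surface the cache status (BYPASS, MISS or HIT) on every response
  add_header X-Nginx-Cache $upstream_cache_status;

  # ...the rest of your site configuration...
}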

Next, inside the location block that is handling PHP requests, add the following config options:

# cache key
fastcgi_cache_key "$vary_key$host$request_method$request_uri";

# matches keys_zone in fastcgi_cache_path
fastcgi_cache wordpress;

# don't save these requests to the cache ($no_cache is set below)
fastcgi_no_cache $no_cache;

# don't serve these requests from the cache either; without this,
# logged-in users could be served cached anonymous pages
fastcgi_cache_bypass $no_cache;

# defines the default cache time
fastcgi_cache_valid any 10m;

# misc additional settings
fastcgi_cache_use_stale updating error timeout invalid_header http_500;
fastcgi_cache_lock on;
fastcgi_cache_lock_timeout 10s;

The next settings depend on what you want to do. If you are using a CDN and are only exposing author pages, then you can use the following settings:

# Cache nothing by default
set $no_cache 1;

# Only cache author pages
if ($request_uri ~* "/author/") {
  set $no_cache 0;
}

If, instead, you want to cache everything your CDN might miss, then you can use this (it is what I use):

# Cache everything by default
set $no_cache 0;

# Don't cache logged in users or commenters
if ( $http_cookie ~* "comment_author_|wordpress_(?!test_cookie)|wp-postpass_" ) {
  set $no_cache 1;
}

# Don't cache the following URLs
if ($request_uri ~* "/(wp-admin/|wp-login.php)") {
  set $no_cache 1;
}

If done correctly, hitting an author page will return different content depending on the Accept header being used. To verify, load an author page in a browser. You should get a proper HTML page. Then copy the URL out and, using curl, send the following:

curl -I https://dustinrue.com/author/ruedu/ -H "Accept: application/activity+json"

x-nginx-cache: MISS

Your Nginx cache status may already be HIT if someone recently searched for you, and it should be HIT if you send the request again.
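You can also confirm that the HTML variant is cached separately. Under the map above, a request without the ActivityPub Accept header lands on the "default" key, so the first browser-style request should be its own MISS:

curl -I https://dustinrue.com/author/ruedu/

x-nginx-cache: MISS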

Debugging

It is important that you debug and properly resolve this endpoint. Failing to do so will result in failed searches of your author/user from ActivityPub clients. To be clear, the following must return different content:

curl -I https://dustinrue.com/author/ruedu
curl -I https://dustinrue.com/author/ruedu -H "Accept: application/activity+json"

Adjust these URLs for your own author pages and ensure the first curl returns HTML content while the second returns JSON content. While writing this post I noticed that the plugin still reports a Content-Type of text/html when it should say application/activity+json. Despite this inconsistency, clients will use the returned content.
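Because the Content-Type header can be misleading here, it is more reliable to drop the -I flag and inspect the body itself. Something like this should print JSON (starting with {) rather than HTML:

curl -s https://dustinrue.com/author/ruedu -H "Accept: application/activity+json" | head -c 200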

If the curl calls are returning different content, next pay attention to the x-nginx-cache header to ensure that it is actually caching. You can add another utility header to your Nginx config to assist with this:

add_header x-accept $vary_key always;

This header will show which value the map landed on so you can confirm the right cache key is being chosen.

Conclusion

I hope this is enough to help guide you in improving your WordPress + ActivityPub experience.

After more than ten years running under the blog.dustinrue.com subdomain, I have moved my site to live right on dustinrue.com instead. I am making this change simply because this really is the only site I have and there is no need to keep it on a subdomain. Redirects have been set up to ensure no links are broken.

In addition to the URL change, I have also removed ads and, in fact, any tracking on the site at all. This means the cookie consent banner is also gone (a personal pet peeve of mine). For what it is worth, the ads and analytics were handled by the Google Site Kit plugin, an excellent plugin put out by my employer.

A quick note about having all but completely left Twitter. It is posted here because I can't actually tell you where I went; their policies will not allow me to link to my new profile.

Around mid-October of 2022 I decided that Twitter was no longer a place I wanted to use as my primary micro-blogging platform. There are many reasons for this, but it mostly comes down to the behavior of the CEO, his treatment of the employees who work there and the way he is handling the platform. When Elon Musk bought Twitter I was hopeful that he would do a good job with it, but that hope was quickly squashed by news of unreasonable demands of employees, certain people being allowed back on the platform and more. It is within his rights to do with the platform as he pleases, and it remains within my rights to choose not to be a part of it. I don't fault anyone who chooses to remain on the platform, but for me it was time to move on.

Instead, I have found a new home on the Mastodon network. You can read about it in this excellently written article at https://tidbits.com/2023/01/27/mastodon-a-new-hope-for-social-networking/ which goes into detail about what it is, how it works and even how to use it. My profile, hosted on my own private instance of the software, is at https://mastodon.dustinrue.com/@dustinrue. After getting past the initial confusion of how the platform works, I have settled in and found that I enjoy it more than I ever did Twitter. I appreciate the federated nature, the fact that I can block entire instances, being able to follow hashtags, being able to self-host and more.

Mastodon is still a relatively young platform that has recently seen explosive growth as people disillusioned with Twitter look for alternatives. I highly recommend giving it a try to see if it is the right fit for you!

Yesterday I was reminded that when a URL is shared on Mastodon, every instance with a user following you will make at least one request to your site in an effort to get some additional embed information. If your site is WordPress based, like this one, then you will likely see two requests: the first requests the URL that was added to the post, while the second follows the oEmbed information WordPress exposes in order to get additional metadata. Since Mastodon is a federated system, every Mastodon server or instance needs to gather this data itself in order to display it to its users.
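You can see that second style of request yourself by hitting WordPress's default oEmbed route for one of your posts (the /wp-json/oembed/1.0/embed path is the WordPress default; substitute one of your own post URLs):

curl -s "https://dustinrue.com/wp-json/oembed/1.0/embed?url=https://dustinrue.com/some-post/"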

If you have a lot of followers then posting a link to your blog or site will likely result in a mini DDoS as hundreds of Mastodon instances request this information from your server. If you have not taken precautions, this can potentially take down your site as it is overloaded with requests! Years ago this would have been referred to as being "slashdotted" (links on https://slashdot.org) or "fireballed" (links on https://daringfireball.net).

Fortunately you can deal with this situation very effectively, either on your own or by working with your hosting provider. In this post I describe how I handle it using Cloudflare, the CDN provider I have chosen to put my site behind. I won't go into full detail on how to implement every option, and I am not selling Cloudflare or associated with them beyond being a customer. What I share here is applicable to any CDN, or will at least serve as inspiration for how to handle it in your own configuration.

As I said previously, this site uses WordPress and sits behind Cloudflare. To make it easy on myself I have also purchased their Automatic Platform Optimization for WordPress feature. I got into this option initially because I wanted to understand it better, but I have since kept it because it works well. The biggest feature of APO for WordPress is that it enables full page caching for your site, which is a must if you want the best possible experience for users globally. Using APO is absolutely not necessary; you can use Nginx micro caching or any other caching solution instead. The key is to have full page caching so that repeated requests to your site do not incur actual processing time by WordPress.

APO will, out of the box, cache full pages of your site, but what it will not protect is the metadata URL used to provide additional information for embeds. To prevent Mastodon servers from crushing your site with embed metadata requests, there is one additional endpoint you need to force to be cached. Here is how I got Cloudflare to cache the correct URL for me.

Log in to Cloudflare and click on the domain for your site. Find the caching section of the menu and click on Cache Rules. Add a new rule and define what is shown in the screenshot:

Screenshot showing a Cloudflare configuration screen for caching an oEmbed request from Mastodon, or any system that would make one. Add a name for your rule and set the Field to URI Path, matching on "contains" with the path /wp-json/oembed

From here, tell Cloudflare what to do with this match:

Screenshot showing another Cloudflare configuration screen. Here you should set the Cache status to "Eligible for cache" and "Override origin" to 2 hours, the minimum option on a free plan

Note that 2 hours is the lowest cache time I can specify on an otherwise free Cloudflare plan so that is what I set it to. With these options filled out you can click save and you are done. Anything looking for this URL will now either get a cached copy of the response or will cause the content to be cached for future requests.

Of course, you don't need Cloudflare to make this work. Savvy users can translate these rules into Nginx or Apache configuration to achieve the same effect. The goal is to ensure your WordPress site can better handle the load when you share a link on Mastodon, and there are many ways to get there; Cloudflare is simply the one that has worked well for me. I encourage everyone who hosts a blog, whether self-hosted or through a managed provider, to ensure that both page caching and the WordPress oEmbed URL are cached.
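For those savvy users, a rough sketch of the Nginx translation might look like this, reusing the fastcgi cache zone from my earlier Nginx caching post. The zone name, socket path and PHP handling are assumptions to adjust for your own setup:

# cache WordPress oEmbed responses so embed lookups hit the cache, not PHP
location ~ ^/wp-json/oembed/ {
  include fastcgi_params;
  # WordPress routes REST requests through index.php
  fastcgi_param SCRIPT_FILENAME $document_root/index.php;
  # adjust to however PHP is reached in your setup
  fastcgi_pass unix:/run/php/php-fpm.sock;

  # "wordpress" matches the keys_zone defined earlier
  fastcgi_cache wordpress;
  fastcgi_cache_key "$host$request_method$request_uri";
  fastcgi_cache_valid 200 2h;
}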

If you run a private instance of Mastodon it can feel mighty lonely sometimes. This is due to an inherent design characteristic of Mastodon and federated services in general: how does one instance get information from another? Typically, if a user on your instance follows someone else, then that person's posts will flow to your instance, along with any hashtags they use and so on. If you run a small server, then obviously there are far fewer ways for information to flow to your instance.

Solving this problem is a matter of ensuring more data flows into your instance so that it can see more content and, most importantly as of version 4.0 of Mastodon, additional hashtags. The easiest way of doing this (aside from running a large instance) is to use relays.

Relays, in essence, take in feeds from a number of instances and pass them along to the other instances attached to the relay. Finding a relay is as easy as going to https://relaylist.com and picking one or more to add to your instance. Relay servers usually support both Pleroma and Mastodon, but remember that the URL you add for each is different. Be sure to add the right one.

Screenshot showing relay URLs

You can subscribe to a relay by visiting /admin/relays in the admin dashboard of your instance and clicking the “Add New Relay” button. Simply pick a relay server from the list and add it using the correct URL. For Mastodon, the URL will end with /inbox in almost every case. After a short while you will begin to see your Federated timeline be populated with posts from other instances. Your instance will now see a lot of new content including hashtags that you can follow.

Keep in mind that bringing in this content does have a cost associated with it. A Mastodon instance will store everything it sees locally, including post content and media like images and video. It is a good idea to set retention limits on your system so that you are not storing everything ever seen, forever. On my system I set retention to 14 days for the media and content caches and 7 days for user archives. You will find your instance's content retention policy at /admin/settings/content_retention.
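The retention settings handle cleanup automatically, but the same cleanup can be run by hand with Mastodon's admin CLI if you want to reclaim space immediately. Run from the Mastodon directory (or the equivalent exec in a container):

RAILS_ENV=production bin/tootctl media remove --days 14
RAILS_ENV=production bin/tootctl preview_cards remove --days 14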

In addition to using more disk space (and the bandwidth spent transferring all that media), you will also incur more processing time on your instance. How much space and processing you need depends heavily on which, and how many, relay services you add. For my instance I struck a balance between getting enough data flowing that hashtags were interesting, but not so much that I increased my costs unnecessarily. You can track this information both in your admin dashboard at /admin/dashboard (scroll to the bottom) and with your chosen object storage provider (you did set one up, right?).

If you are running a private instance and feel a bit left out I hope this helps you get the activity you are looking for. Also remember to follow a lot of people and boost content you like instead of just liking it. This will lead to more followers for yourself, more interactions and a more interesting timeline!

Since about mid December 2022 I have been running my own private instance of Mastodon. I thought I would detail how I did it and what it has cost me so far.

When I first learned about Mastodon I was excited to understand it better, particularly how it is hosted and scaled. I decided right away that the best way to deepen my understanding was to host it myself, and to do so on my favorite platform, Kubernetes. I started by creating my own helm chart (https://github.com/dustinrue/mastodon-helm-chart) and installed the core software in my home lab, which runs k3s. The chart I created is based on the official helm chart (https://github.com/mastodon/chart). I created my own because, again, I wanted to learn about the moving pieces of a Mastodon installation, but also because I was unhappy with the official chart integrating Redis and PostgreSQL as dependencies. In addition, it doesn't break out the Sidekiq processes in a way that makes sense…but more on that later.

Before we can get too deep into what I did, we should first discuss some of the major components of a Mastodon instance or server. Mastodon is a collection of services working together to form a full solution, which includes:

  • A web service which provides the user interface but also serves as the API server for all things Mastodon. In a full production setup it is important that this be highly available.
  • A streaming service which feeds data to the web frontend as it arrives and is processed. This is important but doesn't seem to be critical. In other words, you can survive a bit of downtime here; you'll just have a less than great experience.
  • A number of Sidekiq queues, which are the heart of how data moves through a Mastodon instance. These queues, as of this writing, include scheduler, ingress, mailer, push, pull and default. Each queue has a specific purpose and, again, no single queue is absolutely critical to the availability of your instance. This means you can temporarily take down a queue to deal with some issue; while a queue is down, nothing it is responsible for will be processed. The exception is the special scheduler queue, which, if not running, will likely prevent most other queues from doing anything at all.
  • Redis is the glue that keeps data flowing between processes. It is also a critical piece to keep running, though losing the data within it, while not ideal, is OK. Keeping it running is critical because all of the other Mastodon processes expect it to be available and will fail to start without it. In a full production setup I recommend running it in a highly available fashion.
  • PostgreSQL is the last required piece of software when running Mastodon. Like Redis, it is what I would consider critical to your setup. If running a full production setup you will want to cluster it, with availability first and performance a secondary consideration.
  • You need some system for dealing with email. Mastodon sends email for account confirmations and some administrative or moderation work. For my system I am using Send In Blue (https://www.sendinblue.com), which has a free tier.

Mastodon also supports other, optional services which you can read about at https://docs.joinmastodon.org/admin/optional/.

As you can likely see, running Mastodon is not simple, yet it isn't overwhelming either. I believe Mastodon can be run inexpensively, especially as a private instance, but to run it in production correctly there is definitely a base cost to consider so that you can remove as many failure points as possible. In addition, there are many other pieces you will likely want for a large installation, like metrics monitoring and keeping track of Sidekiq queue depth and processing times.

Having spent some time on Mastodon during the great Twitter migration, I witnessed the struggles of a number of instance admins as their instances failed to meet the demands of new users and of users who had created accounts before but were suddenly active. I saw a few notable patterns emerge that contributed to their scaling woes, including:

  • Not using a CDN or object storage system initially
  • Not installing pgbouncer in front of PostgreSQL
  • Not splitting Sidekiq into separate processes, one per queue

There are some really excellent guides and references on how to scale Mastodon (https://hazelweakly.me/blog/scaling-mastodon/ to name but one), but many of the recommendations require you to have done one, if not all, of the steps above. Each of these items is disruptive in a way that you probably do not want to be handling while in a panic trying to get your instance running again. If you run, or plan to run, a public instance where anyone can sign up, then I highly recommend getting at least these three items out of the way from day one. Doing so helps ensure that scaling up from there is much, much easier, as most of the remaining work becomes adding servers to run more Sidekiq processes or tuning parameters.
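To make the last item concrete, here is a minimal sketch of what separate Sidekiq processes look like. Each line would run as its own process, service or container; the concurrency values are illustrative, not recommendations:

# one process per line; set DB_POOL for each to at least its -c value
bundle exec sidekiq -c 25 -q default
bundle exec sidekiq -c 25 -q ingress
bundle exec sidekiq -c 25 -q push -q pull
bundle exec sidekiq -c 5 -q mailer
bundle exec sidekiq -c 1 -q scheduler   # run exactly one scheduler across the whole installation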

When I created my helm chart, I took these lessons and applied them as conscious decisions in the design of the chart. Though not at all necessary for a small or single user instance, my chart breaks out all of the current Sidekiq queues into separate processes. This layout ensures the hard work of separating the processes out is done and the rest is a matter of scaling and tuning.

As of this writing, my helm chart also installs a weekly cronjob to clean up media files and, optionally, a cronjob for backing up the database to some shared storage in your Kubernetes cluster. Though it is ultimately incomplete, I feel the helm chart is a good start.

As for actually running Mastodon, I created a subdomain for my instance to live at and installed Mastodon into my k3s cluster using my helm chart. Ignoring the cost of my ISP and the computers I already own, running Mastodon costs very little. My home lab provides everything I need, including persistent storage using TrueNAS. For media storage, I created a Cloudflare R2 bucket and a URL for public access. Mastodon is configured to send media content to R2, which is then served from the CDN URL. This keeps all of the heavy storage separate from the rest of the system. My last bill for R2 was just $0.06, which covered the approximately 20GB of content I have stored there. I do expect my next bill to be more because the average amount of data stored in R2 will be higher.
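For reference, pointing Mastodon at R2 uses its standard S3 environment variables. A sketch with placeholder values (the bucket name and alias host here are examples, not my real configuration):

S3_ENABLED=true
S3_BUCKET=mastodon-media
S3_ENDPOINT=https://<account-id>.r2.cloudflarestorage.com
# the public CDN hostname media is served from
S3_ALIAS_HOST=media.example.com
AWS_ACCESS_KEY_ID=<r2-access-key-id>
AWS_SECRET_ACCESS_KEY=<r2-secret-access-key>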

Since my installation is just a private one, I installed PostgreSQL and Redis as single instances within my k3s cluster. Both are extremely basic Bitnami-based installations using their available helm charts, with PostgreSQL backed by persistent storage provided by TrueNAS. For email, my k3s cluster runs an installation of Postfix configured to send email through Send In Blue, and the services I run in the cluster are configured to talk to Postfix. This gives me a single mail relay whose configuration I need to maintain.

Ingress is provided by Cloudflare and cloudflared tunnels. A tunnel is configured on a separate VM I run, and the Cloudflare side is then configured to route traffic to the Kubernetes cluster with the correct hostname included.
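For anyone unfamiliar with cloudflared, the tunnel side is a short YAML file. A minimal sketch, with the tunnel ID and cluster address as placeholders:

tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json
ingress:
  # route the Mastodon hostname to the cluster ingress, preserving the Host header
  - hostname: mastodon.dustinrue.com
    service: https://<kube-ingress-address>
    originRequest:
      httpHostHeader: mastodon.dustinrue.com
  # everything else gets a 404
  - service: http_status:404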

All said, this setup has proven reliable for me since mid December. In a future post I’ll discuss how I got my private instance to feel a bit more included in the Fediverse by adding relays. Please leave a comment if you feel I missed something or got something wrong.

Quick tip on a rather specific situation I found myself in, though I believe it could come up for a lot of people using WordPress who are trying to integrate with ActivityPub networks. If you are:

  • Running WordPress
  • Using a page caching solution like Cloudflare APO or manually configured
  • Running an ActivityPub plugin and/or webfinger

Then you will likely run into an issue with your site not being reliably discoverable when searched for. Using Matthias Pfefferle's ActivityPub, Webfinger and Nodeinfo plugins to expose your WordPress site as an ActivityPub server will add a few routes to your site. One of these routes is the WordPress author page, which exists at /author/<author username>. When hit with a browser, this path returns HTML; ActivityPub instances, on the other hand, will be looking for a different content type called application/activity+json. Unfortunately, many caching layers will not Vary on Accept, which you need in order to return different data depending on what type of content the requester is looking for.

To resolve this on my site, which uses Cloudflare for its CDN, I added a page rule that disallows caching for my author page. This works because I am the only author on the site. The full "proper" solution would be to Vary on the Accept header for that path, which Cloudflare does not support.
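If your caching layer does honor Vary, the origin-side piece is a single directive. A sketch for Nginx, which I would scope to author pages rather than the whole site:

# inside whatever location block serves your author pages
add_header Vary Accept;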

You may want to be very specific about which Vary headers are used, on which paths, and which values you will actually accept. Allowing a wide or unlimited range of values makes it easy for people to break cache at the CDN and send requests straight to your origin servers.

Last year around this time I realized I wasn't overly happy with…anything. Work was stressful and I was in a bad mood most of the time. I couldn't really pinpoint the cause, so on a whim I checked the app store for an app I could use to track my mood. I wanted to check off the things I did in a day that I thought might affect me and then review it later.

I landed on Dailyio (I pay to unlock the full reporting) and went to work adding the items I wanted to track, including whether or not I was frustrated with work, had a lot of meetings, or was able to focus and make progress on things. There are a lot of default options to start with, and I added anything else I noticed I did often: drinking water or not, watching TV/movies, playing games and so on.

Here is what I discovered somewhat quickly:

  • Drinking water and taking walks boosted my mood
  • Too much caffeine tanks my mood
  • Video games and TV do not boost my mood a lot but listening to music does
  • Making progress and being able to focus at work boost my mood while too many meetings and interruptions tank my day
  • Going out with my wife on a date is a booster but spending time with extended family quickly drains me (I love you all but I get overwhelmed/drained quickly. It’s not you it’s introvert problems)
  • Related to the previous one, not getting enough time to myself was among the biggest factors in determining how I felt day by day.
  • Cooking, cleaning (easily the biggest surprise for me), writing and travel are all boosters
  • Skipping breakfast is horrible
  • Even a small decrease in sleep quality affects my day. This is something I have always intuitively understood but having data to back it up is great

Aside from these quick lessons, I have now logged just over a year of data and it has become apparent that some patterns follow a yearly cycle. My daily average mood has gone down significantly as my end of year workload has gone up.

After gathering data for a while I quickly made a few changes. I made sure to drink a bit more water than I usually would and to have a bit less caffeine (through coffee or soda). I also nearly stopped drinking non-diet soda entirely, as I found that a high sugar intake the previous day very consistently caused me to feel terrible in the morning. Whenever I can, I sit and listen to music with no distractions. I take more walks, and I don't feel at all bad if I abandon a TV show.

Using Dailyio has been a super interesting experiment that I will continue so that I can keep tracking trends and doing what I can to even out my daily mood. Just using what I learned in the first few months, I was able to change habits and routines to place an emphasis on the things that truly led to better outcomes each day. Some things I assumed were beneficial to me turned out not to be, and removing them from my life didn't have the impact I thought it would.

If you are at all interested in this sort of thing, I highly recommend finding an app, either Dailyio or something like it, that allows you to track and report on what affects your mood from day to day.

One of the requirements when setting up a Mastodon instance is that you are able to send outgoing email. If you are running a personal instance you can easily get away with running something like Mailhog, which will simply capture all emails being sent and present them in a nice web interface. While setting up my personal Mastodon instance, however, I decided to set up a real smarthost/relay for my k3s cluster. I did this using Postfix configured to route mail through a smarthost. Search the web for details on how to do this; there are a lot of how-tos out there explaining the process if you are not familiar with it.

In the past I would have used my Gmail account as my SMTP relay, but earlier in 2022 Gmail removed the ability to do this, so I needed to find a replacement. I ended up settling on https://www.sendinblue.com because they offer a free tier that allows for 300 emails per day. Since everything I do in my cluster is personal there is no way I'll ever hit that limit, and even if I did, I wouldn't mind email simply not working until the next day. I found setup to be easy: create an account (giving them a bit of information), then visit the SMTP & API page, click SMTP and grab the credentials to put into Postfix.
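For the curious, the heart of that Postfix setup is only a few lines in main.cf. A sketch based on my understanding of their SMTP settings; confirm the relay hostname and port on the SMTP & API page, put your credentials in /etc/postfix/sasl_passwd and run postmap on it:

# relay all outgoing mail through Send In Blue, authenticated over TLS
relayhost = [smtp-relay.sendinblue.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt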

SendInBlue SMTP/API interface

I am not affiliated with SendInBlue in any way; I am just sharing something I found that allows you to quickly set up an SMTP relay for free.