Mending what's broken, or how I discovered some broken links on my website
- Pretty permalinks for this post:
- https://fireburn.ru/posts/mending-broken-permalinks-part-1
It was a restless night. I had just fixed my personal website's software, Kittybox, so it wouldn't hang after the first few hours of running (I hope my bugfix finally worked! I've been chasing that bug for months). In an attempt to stimulate my bored brain I was reading some articles on the IndieWeb wiki and stumbled upon discussions that my posts on this very website had sparked.
Except the links didn't work. Argh, I can't read my own posts, this won't do!
Of course, this was all my fault. But, thanks to past me having a lot of foresight, this should be rather easy to fix. The first thing I decided to do was to patch the articles containing links to my posts so the links would actually point to something. Despite me wiping my website's main feed, all of my articles are still there, because I know some people in the IndieWeb community have been linking to me. The second thing I haven't done yet, and if I don't write it down I might forget it all, so here's a post so I don't forget to do it.
Preserving compatibility
When I was designing the first versions of Kittybox (back then it didn't even have a name!) I was very inspired by 00dani.me's proposed approach of storing MF2-JSON directly in the database. First I stored it in flat files. Then, as I realized my file storage was incredibly slow, I migrated to Redis to keep my dataset in memory. Then several rewrites happened, the Rust rewrite among them. The database's underlying format was still almost the same, so it was rather easy to port it over. Except I might've forgotten an important step.
In Kittybox, one post can contain several links. One of them is a UID - an authoritative link to the post, the one that Kittybox would use as a primary key in an SQL database if it used one. But since I use the filesystem as my database, the authoritative link gets transformed into an authoritative path which refers directly to the file with the MF2-JSON blob. All other links are supposed to be symlinks.
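To make that layout concrete, here's a minimal sketch of how a post URL could be turned into a path under the storage root. The real `url_to_path` in Kittybox isn't shown in this post, so the mapping rule (host, then path segments), the `Option` return type and the example root directory are all assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical mapping from a post URL to a path under the storage root:
/// `https://example.com/posts/hello` -> `<root>/example.com/posts/hello`.
/// Returns `None` for strings without a scheme or with `.`/`..` segments.
fn url_to_path(root: &Path, url: &str) -> Option<PathBuf> {
    let (_scheme, rest) = url.split_once("://")?;
    // Drop the query string and fragment; they don't identify a file.
    let rest = rest.split(|c: char| c == '?' || c == '#').next()?;
    let mut path = root.to_path_buf();
    for segment in rest.split('/').filter(|s| !s.is_empty()) {
        if segment == "." || segment == ".." {
            return None; // refuse anything that could escape the root
        }
        path.push(segment);
    }
    Some(path)
}

fn main() {
    // Hypothetical storage root and URLs, not Kittybox's actual ones.
    let root = Path::new("/var/lib/kittybox");
    let uid = url_to_path(root, "https://example.com/posts/2a9f").unwrap();
    let alias = url_to_path(root, "https://example.com/posts/hello-world").unwrap();
    println!("canonical file: {}", uid.display());
    println!("symlink:        {}", alias.display());
}
```

Under this assumed scheme the UID resolves to the file holding the MF2-JSON blob, and every other URL of the post resolves to a path where a symlink to that file is expected.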
Except apparently, when I was importing the posts into the new file storage backend, I somehow forgot to create all of those symlinks. And that's why the links to those posts don't work.
Past Vika's foresight
Storing the posts in MF2-JSON as processed Micropub data was actually a very good idea. Pretty much all versions of Kittybox filled in alternative URLs (which would be u-url in MF2-HTML and .properties.url[] in MF2-JSON) which actually can be used to restore those permalinks!
Eventually permalink checking will need to be built into the software itself as a consistency check. I could probably write it like this:
use futures::stream::StreamExt;
use std::path::PathBuf;

// `json` is the post's MF2-JSON; `canonical_path` is the file that stores it.
let links: Vec<PathBuf> = json["properties"]["url"]
    .as_array()
    .into_iter()
    .flatten()
    .filter_map(|s| s.as_str())
    .map(|url| url_to_path(&self.root_dir, url))
    .collect();
let canonical_path = &canonical_path;
tokio_stream::iter(links)
    .for_each_concurrent(2, |link| async move {
        // symlink_metadata() doesn't follow symlinks, so NotFound means the link itself is missing.
        if let Err(err) = tokio::fs::symlink_metadata(&link).await {
            if err.kind() == std::io::ErrorKind::NotFound {
                let Some(basedir) = link.parent() else {
                    warn!("Database consistency check: couldn't calculate parent for {}", link.display());
                    return;
                };
                // The symlink target is stored relative to the directory that contains the link.
                let relative = path_relative_from(canonical_path, basedir).unwrap();
                if let Err(err) = tokio::fs::symlink(&relative, &link).await {
                    warn!("Database consistency check: failed to restore symlink {} for {}: {:?}", link.display(), canonical_path.display(), err);
                }
            }
        }
    })
    .await;
I'm pretty sure this won't compile out of the box because of borrow checking and variables which need to be bound, but you get the idea. It can be performed on every read (for maximum correctness) or there could be scheduled tasks that sweep the database and perform those consistency checks - I think the second one is the better idea, since it allows me to check even posts that were completely forgotten by everyone. But I'll need to learn how to schedule tasks to run at certain intervals in Tokio - I'm sure there's a function for that though.
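The `path_relative_from` helper used in the check above isn't shown either. Here's a minimal sketch of what such a function could look like using only the standard library. The signature, the `Option` return type and the requirement that both paths are absolute and already normalized are assumptions, not Kittybox's actual code.

```rust
use std::path::{Component, Path, PathBuf};

/// Computes a relative path that leads from the directory `base` to `target`.
/// Assumes both paths are absolute and already normalized (no `.` or `..`).
fn path_relative_from(target: &Path, base: &Path) -> Option<PathBuf> {
    if !target.is_absolute() || !base.is_absolute() {
        return None;
    }
    let target: Vec<Component> = target.components().collect();
    let base: Vec<Component> = base.components().collect();
    // Length of the prefix shared by the two paths.
    let common = target
        .iter()
        .zip(base.iter())
        .take_while(|(a, b)| a == b)
        .count();
    let mut relative = PathBuf::new();
    // Climb out of whatever is left of `base`...
    for _ in common..base.len() {
        relative.push("..");
    }
    // ...then descend into the remainder of `target`.
    for component in &target[common..] {
        relative.push(component.as_os_str());
    }
    Some(relative)
}

fn main() {
    // Hypothetical paths: the canonical file and the directory holding a link to it.
    let canonical = Path::new("/data/example.com/posts/2a9f");
    let link_dir = Path::new("/data/example.com/posts/2021");
    let relative = path_relative_from(canonical, link_dir).unwrap();
    println!("{}", relative.display());
}
```

Storing a relative target rather than an absolute one means the symlinks keep working if the whole storage directory is moved or restored from a backup somewhere else.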
When will it be complete?
I hope it will be built into the software soon. For now, I will leave this as a to-do of sorts. A reminder to myself and an example for the others - of both my foresight and my mistakes.
And for now, I could potentially write a script that recursively walks my directory tree and restores the symlinks, and run it from a cron job. It'll probably work just as well too. But I'll do it later. I wanna sleep...and coffeeeee....