I think that an important part of keeping the nice parts of the internet is to have small free spaces and avoid bringing everything under the control of a few corporations. That is also one of the reasons I restarted my blog and take care of the technical side myself. A static site means using simple, future-proof technologies, so that it can be kept around with little effort and migrated between hosting options. Jekyll and the Minimal Mistakes theme worked very well for me: they needed some work to set up, but then they just worked, and they allow a site with basically zero data collection or privacy issues.
I also wanted a way to enable some discussion on the site, i.e. comments. I was a bit wary of spam, but I wanted to see how it would go. Spam is one of the things that makes an open internet a challenge. Moderation, especially at scale, is difficult, and that is one more reason to avoid leaving it to just a few players: bad decisions are bound to happen, and having several players is the best way to leave a way out. Decisions on copyright in the EU have also made it more difficult to give free space to anonymous people.
Still, I wanted to leave a way for people to give feedback. Unfortunately, spam did become a nuisance: I consistently received 5–10 spam messages a day. Not totally overwhelming, but a real nuisance, especially since I do not want to dedicate much time to this, and a few skipped days quickly add up. Luckily, others fight this same issue.
Google provides reCAPTCHA to help identify humans (and block bots). This is nice, and Google does not do it out of pure altruism: it gets annotated images for its AI effort, so it is likely to keep providing the service.
I was very happy that my site did basically no tracking, and using reCAPTCHA changes that. I did not want to force it on the bulk of the users who never add comments but could still be tracked through it by Google. It is not fully clear when the tracking starts, and I did not want to trust Google where not needed. So I modified the template to hide the comment form and added an “add a comment” button that shows the form and loads the reCAPTCHA script only when clicked. This way, any reCAPTCHA-related tracking cannot start before the user expresses the intention to add a comment.
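A minimal sketch of that lazy-loading idea (the element ids and the site key are placeholders, not the actual template's names):

```html
<!-- The comment form stays hidden until the visitor opts in -->
<form id="comment-form" hidden>
  <!-- The reCAPTCHA widget renders into this div once the script loads -->
  <div class="g-recaptcha" data-sitekey="YOUR_SITE_KEY"></div>
  <!-- ... rest of the comment form fields ... -->
</form>

<button id="show-comment-form">Add a comment</button>

<script>
  document.getElementById('show-comment-form').addEventListener('click', function () {
    // Only now fetch Google's script, so no tracking can start
    // before the user has expressed the intent to comment
    var s = document.createElement('script');
    s.src = 'https://www.google.com/recaptcha/api.js';
    s.async = true;
    document.head.appendChild(s);
    document.getElementById('comment-form').hidden = false;
    this.hidden = true;
  });
</script>
```

Until the button is clicked, nothing from google.com is loaded, so the bulk of readers never touch Google's servers.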
While doing this, I noticed that the template exposed not just the reCAPTCHA siteKey but also the private secret. Looking at the Staticman code, I saw that it does indeed require it. The situation is not as bad as one might initially think: the secret is not in plain text, but encrypted with Staticman's RSA key. Still, I saw no point in putting the encrypted secret in the form; it just makes it easy for a would-be attacker to get it. So I submitted a patch to avoid sharing it (and to be a bit more verbose about failures/censoring).
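For context, this is roughly where those values live; a sketch of the relevant part of a Staticman v2 site config, with option names as I recall them (verify against Staticman's documentation):

```yaml
# staticman.yml in the site repository -- a sketch, not my exact config
comments:
  allowedFields: ["name", "email", "message"]
  moderation: true
  reCaptcha:
    enabled: true
    # The siteKey is public by design and safe to expose in the page
    siteKey: "public-site-key"
    # The secret is encrypted with Staticman's RSA key, but there is
    # still no reason to also ship it inside the rendered comment form
    secret: "secret-encrypted-with-staticman-rsa-key"
```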
The other approach to avoiding spam is to rely on a service that aggregates posts from many sites and keeps up-to-date filters that block it. The idea is that a spam message (or at least the links it tries to share) has usually already been attempted on some other site, so filtering based on seeing a large volume of posts works. Akismet.com uses this approach, so I set it up, and I also added log messages describing the blocked posts so I could evaluate whether it works well.
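Staticman can forward incoming comments to Akismet before accepting them; a sketch of what enabling that looks like in the same site config, assuming the option names from Staticman v2 (again, check the documentation rather than trusting my memory):

```yaml
# staticman.yml -- Akismet section, a sketch under the assumptions above
comments:
  akismet:
    enabled: true
    author: "name"       # which form field holds the author name
    content: "message"   # which form field holds the comment body
    type: "comment"      # content type reported to Akismet
```

With this, each submission is checked against Akismet's aggregated spam data before it ever reaches the moderation queue.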
After these changes my spam went back to zero. I am very glad of that, and glad that I could keep my comments, even if they are not used much at the moment. The solution works, but it is still sad that one has to resort to these things to have a working site with comments. I understand why some communities/forums went underground, removing public, open links to themselves, to keep a healthy community working. Still, I think there is value in making the effort to keep things open. Time will tell…