I have decided to block all AI crawlers from this blog

I have decided, albeit with a heavy heart, to block all artificial intelligence crawlers from accessing, and in turn learning from, my blog. I did this for moral reasons, and it was super easy to implement thanks to a simple switch provided by Cloudflare, which hosts this blog through its Pages product.
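The Cloudflare switch is all I used, but the underlying idea can be sketched without it. The snippet below is only an illustration under stated assumptions: it uses a hypothetical `robots.txt` that turns away two example crawler names (`GPTBot` and `CCBot`) and checks it with Python's standard `urllib.robotparser`. The rules, the `is_allowed` helper, and the `example.com` URL are made up for this sketch and are not what Cloudflare does behind the switch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only. The crawler names are
# examples of AI-related user agents, not a complete or authoritative list.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/posts/hello") -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(f"{agent}: {'allowed' if is_allowed(agent) else 'blocked'}")
```

Keep in mind that `robots.txt` is only a request: it works when a crawler chooses to honor it, whereas a block at the hosting or CDN level does not depend on the crawler's cooperation.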

AI is all around us

“Artificial intelligence”, or its predecessors and cousins, has been around for a long time. Artificial intelligence is even useful for many applications. Having my phone suggest the apps I’d like to use next is a really cool use case. And occasionally, it can feel almost magic¹. When OpenAI first opened up ChatGPT to the public, I was very much impressed by its writing, and playing around with the AI image generators was fun and sometimes still is. But somehow that magic has worn off pretty quickly. While ChatGPT is probably much better at writing than I am, for topics where I have expert knowledge, I often find inaccuracies and problems with its responses.

Even if I don’t always like it, we are being confronted with more and more AI every day. And for some applications this is actually useful. But I think we need to be cautious in finding out where to draw the line. Do I think a chatbot for a company’s product that has been trained on the handbook can be useful? Yes, of course, it was very helpful to me at least once². Do I think that machine translation is useful? Yes, it has been very useful to me multiple times. And I have been using Full Line code completion from IntelliJ at work quite often and successfully.

What I’m trying to say is this: I’m not afraid of AI, and I think it has its applications. But the way it is currently developing, especially in venture-funded Silicon Valley, is not going in a good way, at least not what I consider a good way.

Why I’m blocking the AI crawlers

Other people have written many words about why it makes sense to block artificial intelligence bots. They are also much better than me at writing, and their words make much more sense. Somehow to me, it just doesn’t feel right, the way it is right now. And this is after all my blog, so I can take action when I want.

I’m generally optimistic about technology and I think that technology will actually help people, make their lives easier and generally benefit a large part of the population. But somehow, with the entire hype around artificial intelligence, it appears that this will not be useful for everyone across the board. PJ Vogt has done some wonderful reporting on this on his podcast Search Engine. For me, it feels like the balance is not adding up. Artificial intelligence companies want to use my writing³ without referencing me in any way and without giving me anything in return. This balance feels off to me, and it does not seem like the way to go. Further, I think many of these companies first train their models and only later offer an option to opt out, which just seems wrong to me. In fact, I haven’t even properly made up my mind on the opt-in vs. opt-out debate.

Look, my blog is free to read; anyone with a browser can access it and read it. In fact, it is probably quite discoverable through search engines such as Google. In case you would like to publish my content somewhere else⁴, just message me and chances are I will license it to you for free, because I like sharing my ideas. But somehow a gigantic corporation with a bunch of money should not use my stuff for its profit. Easy as that.

¹ This blog’s mantra is slowly becoming “AI is not magic, but it can be useful”.
² I was trying to find out why a multi-page RSS feed was not working properly in Pocket Casts. I was googling around and only found an old Twitter post where the company said that they were actually parsing multi-page RSS feeds. I asked the smart assistant on their support page, and it informed me that this was removed in a recent major version. I did not find that information anywhere else, so that was very useful.
³ I’ll be the first to mention that using my writing for training your model might not actually improve your model, and my writing is by no means of high quality (that’s not the reason I’m writing this blog). Still, I think there is a permission issue here, and I should be controlling what happens with my writing.
⁴ I highly doubt that use case. I mean, why would you, when there is so much better writing around?

Tags: Blog, Old Web, Technical