Getting into Interpretability - A Potentially All-Consuming Side Project

Things are changing very fast! Bear with me for a seemingly unrelated introduction paragraph.

I love mathematics and scientific computing, and that’s what I do at work. But lately things are changing rapidly. I find myself orchestrating the implementation of research ideas while LLMs do the implementing for me. To be honest, I feel like I’ve lost the fun in coding. Until now coding felt like a craft that was fun to practice. With AI coding, I’ve lost that fun. I can’t afford not to use it. It’s just very good, and I can’t be left behind by others who do use it. I guess I need a mindset shift to realize that the craft now is to learn things at an unprecedented pace. Previously, learning anything took time - searching for proper resources to teach me and answer my questions. Now, as a colleague of mine put it, I have a “Query Language for Curiosity”. It’s no longer fun to type things into the computer. What is fun is to consume knowledge, draft sound ideas, and get LLMs to implement them (while tracking and understanding why and how they implement them that way). Learning coding concepts is still on the cards, but learning syntax? That’s useless. The only way to satisfy myself is to understand things deeply.

By “understand things deeply”, I mean being able to explain what I created at multiple resolutions, from a high-level overview down to the intricacies. Explainability is where the value is right now, because AI can do the implementation. If we take “understand things deeply” to the extreme, what do we end up with? For me, it felt obvious that the ultimate quest would be to explain the tool itself! It’s kind of weird. Usually a tool is the most explainable thing. A hammer is a simple object; you then build complexity with it. But when it comes to AI, it’s the other way around: the tool is the least understood thing.

That quest to understand AI seems like a worthy one to embark on for the foreseeable future, even as AI slowly eats into the value I was supposed to bring and takes away the fun in the things I do today.

So when I asked myself:

  1. What is something I can pick up that allows me to combine my love for mathematics and interdisciplinary science?
  2. What is something that is fresh enough that I as an independent guy working after hours can contribute meaningfully to?
  3. What is something that will stay fresh for the foreseeable future and won’t be consumed by AI?
  4. What is something that genuinely feels interesting and I can pursue realistically given my background and resources?

I figured it’s AI interpretability research for me.

I’ll journal my journey on this blog. I’ve also noticed that I fail to post here often, held back by a false sense of perfectionism. I hope to start changing that today. This interpretability journey, and any fun thoughts and learnings I feel are worth journalling, will henceforth be pushed to the blog with minimal editing. The goal is to be consistent, not perfect. If anyone is reading this, catch you in the next one.



