DFJ Growth

Randy Glein, Kevin Tu, Maxim Sirenko

Goodfire: Interpreting the AI Black Box

AI has crossed the chasm, quickly becoming an indispensable part of life for a billion people globally. The emergence of generative AI has been characterized by rapid adoption of chatbots like ChatGPT and Grok for comprehensive access to knowledge, data, and answers. This powerful capability has been unleashed through the rise of large language models (LLMs), which are trained on massive data sets and learn to contextualize that data and provide smart insights. Despite being developed by humans, foundation models exhibit a very peculiar trait: their decision-making process is often opaque and mysterious, even to their creators.

This “black box” phenomenon creates a trust gap, where model outputs are constantly scrutinized and questioned. A model can work well in a demo, then fail in production in ways that are hard to predict or fix. Models have been shown to hallucinate citations, products, and even court cases that don’t exist. When asked to correct programming issues, some models edit the test case to “pass” instead of fixing the bug in the underlying code. When this rogue behavior occurs, the root cause is usually invisible. There is little transparency, so we are forced to tweak prompts and add guardrails, nudging the model from the outside and hoping the inside cooperates.

At DFJ Growth, we have been searching for a solution to this model interpretability problem, seeing it as a bottleneck to unlocking the behavior we expect from AI. When we first met Eric Ho, founder and CEO of Goodfire, we sensed that we were on the right track. Eric and the Goodfire team have created a new AI lab to address this challenge head-on, designing models that can be understood, debugged, and shaped to enable the safe and transparent use of AI in society.

The “read/write” moment for AI
There is a pattern to how fields mature. They start as something closer to craft than engineering: lots of intuition, heuristics, and experimentation. A breakthrough happens when we gain the ability to read the underlying system and write to it with precision. Biology didn’t become “engineering-like” until sequencing made DNA legible and tools like CRISPR made it editable. Software didn’t become scalable until we had debuggers, compilers, and version control. AI is now at its own read/write inflection point.

The field of neural network interpretability focuses on reverse engineering advanced AI models to explain how and why they produce specific behaviors. Rather than evaluating models solely from the outside, by observing inputs and the resulting outputs, interpretability looks inside the model at its neurons and the internal mechanisms they form. In practice, this allows developers to diagnose why a model hallucinates or exhibits bias, fix failure modes surgically, and enforce policies and constraints. Once the mechanisms behind a behavior are visible, they can be monitored, tuned, or removed.

Why Goodfire?
Goodfire is building the control layer that makes modern AI systems engineerable. Their platform has demonstrated the applicability of interpretability across a range of fields. From identifying a new class of biomarkers for Alzheimer’s detection in the life sciences, to producing highly targeted and efficient model tuning in financial services, Goodfire is driving value in high-stakes domains where explainability and controllability are critical.

This progress toward broad-based model interpretability is made possible by a clear mission that draws world-class talent. Goodfire has become the center of gravity for interpretability, attracting leading researchers including Tom McGrath, Lee Sharkey, Nick Cammarata, and others from DeepMind, OpenAI, Stanford, Harvard, and Apollo Research. Cultivating a talent-dense interpretability research team has positioned Goodfire to usher in these critical innovations for foundation models.

A more transparent future for AI
If the black box of AI models can be understood, and therefore steered, we will enter a new era for generative AI. Interpretability can unlock the AI development lifecycle, allowing for debugging, testing, monitoring, versioning, and patching. It can serve as a control layer for faster model iteration and catalyze AI adoption in mission-critical enterprise applications and society at large.

We believe humanity cannot control what it does not understand, which is why Goodfire’s mission is of utmost importance. Interpretability can help provide control over AI models and can make AI development predictable and reliable, a prerequisite for safe and scalable AI.