# Berkeley Researchers Build Algorithms to Identify Which LLM Features Actually Matter

When something goes wrong inside a large language model — a biased output, a nonsensical answer, a safety failure — the natural question is: why? Which specific combination of inputs, training examples, or internal components caused the model to behave that way?
That's harder to answer than it sounds. Modern LLMs don't work by ticking through a checklist. They synthesize complex relationships between thousands of input features, and those relationships — the interactions between words, concepts, and model components — are what drive their behavior. Finding those interactions has historically been like searching for a needle in a haystack, except the haystack grows exponentially with every feature you add.
A team from UC Berkeley's Artificial Intelligence Research (BAIR) lab has a new approach. Their algorithms, called SPEX and ProxySPEX, can identify the most influential interactions in an LLM at scale — from dozens of components to thousands.
The core insight is structural: while the number of possible interactions is enormous, the number of influential ones is small. LLMs, like most real-world systems, tend to rely on sparse interactions — a handful of relationships that actually drive any given output. The Berkeley team framed this as a sparse recovery problem, borrowing tools from signal processing and coding theory to isolate those critical relationships without exhaustively testing every possible combination.
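The sparse-recovery framing can be illustrated with a toy surrogate: sample random feature masks, query the model on each masked input, and fit a sparse linear model over candidate interaction terms so that only the few influential ones survive. This is a hedged sketch of the general idea, not the SPEX algorithm itself (SPEX draws on signal-processing and coding-theory machinery rather than a plain Lasso); the synthetic `value_function` and all numbers here are hypothetical:

```python
import itertools
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_features = 10

# Toy stand-in for "query the model on a masked input": the output is
# driven by one main effect (feature 3) and one pairwise interaction (1, 7).
def value_function(mask):
    return 2.0 * mask[3] + 5.0 * mask[1] * mask[7]

# Sample random masks (feature subsets) and record the model's output.
masks = rng.integers(0, 2, size=(400, n_features))
y = np.array([value_function(m) for m in masks])

# Design matrix: all singletons and all pairs as candidate interactions.
pairs = list(itertools.combinations(range(n_features), 2))
X = np.hstack([
    masks,
    np.array([[m[i] * m[j] for (i, j) in pairs] for m in masks]),
])

# Sparse recovery: the L1 penalty zeroes out the irrelevant interactions.
fit = Lasso(alpha=0.05).fit(X, y)
names = [f"({i},)" for i in range(n_features)] + [str(p) for p in pairs]
top = np.argsort(-np.abs(fit.coef_))[:2]
recovered = sorted(names[t] for t in top)
print(recovered)
```

Even with 55 candidate terms and only 400 queries, the two truly influential terms dominate the recovered coefficients; the point of SPEX is to make this kind of recovery tractable when the candidate set is exponentially large.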
"SPEX matches the high faithfulness of existing interaction techniques on short inputs, but uniquely retains this performance as the context scales to thousands of features," according to the team's blog post. "Marginal approaches like LIME and Banzhaf can also operate at this scale, but they exhibit significantly lower faithfulness because they fail to capture the complex interactions driving the model's output."
The follow-up algorithm, ProxySPEX, adds another structural observation: hierarchy. When a high-order interaction matters, its lower-order subsets tend to matter too. Exploiting this yields a roughly 10x computational improvement, matching SPEX's accuracy with far fewer inference calls.
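The hierarchical shortcut can be sketched with an Apriori-style candidate generation: an order-k interaction is worth evaluating only if all of its order-(k-1) subsets were already found influential. This is an illustrative sketch, not the actual ProxySPEX implementation, and the influence scores below are made up:

```python
import itertools

# Hypothetical influence scores already measured for singletons and pairs.
influence = {
    (0,): 0.9, (1,): 0.8, (2,): 0.7, (3,): 0.05,
    (0, 1): 0.6, (0, 2): 0.5, (1, 2): 0.4,
}
threshold = 0.1

def candidates(order, kept):
    """Order-k sets whose every (k-1)-subset survived (Apriori-style pruning)."""
    items = sorted({i for s in kept for i in s})
    out = []
    for combo in itertools.combinations(items, order):
        if all(sub in kept for sub in itertools.combinations(combo, order - 1)):
            out.append(combo)
    return out

kept_pairs = {s for s in influence if len(s) == 2 and influence[s] > threshold}
# Only triples whose constituent pairs all survived need to be queried at all.
to_test = candidates(3, kept_pairs)
print(to_test)  # -> [(0, 1, 2)]
```

Because feature 3's singleton score falls below the threshold, every triple containing it is pruned without ever being evaluated; this pruning is where the savings in inference calls come from.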
The team demonstrated the approach on several problems.
The code for both algorithms is available in the SHAP-IQ repository.
Papers: SPEX (ICML 2025), ProxySPEX (NeurIPS 2025)
