Blog - RTG Neuroexplicit Models

Discovering Interpretable Algorithms by Decompiling Transformers to RASP

10th June 2026, by Aleksandra Bakalova | L

Abstract: We propose a method for extracting the algorithms learned by Transformers. We test it on models trained on algorithmic tasks and formal languages. By translating trained models into (D-)RASP programs and simplifying them with circuit discovery, we find that Transformers that generalize well to longer inputs often rely on small, interpretable (D-)RASP programs. This provides direct evidence that such models internally learn simple algorithmic solutions.

Paper: arXiv | Software: GitHub | Keywords: NLP, interpretability, transformers
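
To make the target language a little more concrete: RASP programs are built from a few primitives, notably select (a boolean attention pattern built from a pairwise predicate), selector_width (how many positions each query attends to) and aggregate (reading values through that pattern). Below is a minimal toy interpreter in Python, written here for illustration and not taken from the paper's software. It assumes a simplified categorical aggregate in which every query attends to exactly one key, and it uses the standard sequence-reversal program as its example. It does not model D-RASP or the decompilation and circuit-discovery steps described in the abstract.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
Selector = List[List[bool]]  # selector[q][k]: does query position q attend to key position k?


def select(keys: Sequence, queries: Sequence, predicate: Callable) -> Selector:
    """RASP select: build a boolean attention pattern from a pairwise predicate(key, query)."""
    return [[predicate(k, q) for k in keys] for q in queries]


def selector_width(sel: Selector) -> List[int]:
    """RASP selector_width: number of key positions each query position attends to."""
    return [sum(row) for row in sel]


def aggregate(sel: Selector, values: Sequence[T]) -> List[T]:
    """Simplified categorical aggregate: each query copies the single value it attends to.

    Assumption for this sketch: exactly one key is selected per query (hard attention);
    RASP's numerical aggregate, which averages the selected values, is not modelled.
    """
    out = []
    for row in sel:
        picked = [v for v, s in zip(values, row) if s]
        if len(picked) != 1:
            raise ValueError("categorical aggregate needs exactly one selected key")
        out.append(picked[0])
    return out


def reverse(tokens: Sequence[T]) -> List[T]:
    """Sequence reversal written only with the primitives above."""
    indices = list(range(len(tokens)))
    # Every position attends to every position, so the selector width is the length.
    length = selector_width(select(tokens, tokens, lambda k, q: True))
    # Position i should read from position length - i - 1.
    opp_index = [n - i - 1 for n, i in zip(length, indices)]
    flip = select(indices, opp_index, lambda k, q: k == q)
    return aggregate(flip, tokens)


print(reverse(list("hello")))  # ['o', 'l', 'l', 'e', 'h']
```

Each primitive corresponds loosely to a piece of Transformer computation (select to an attention pattern, aggregate to reading values through it), which is what makes small programs like this a natural target when describing what a trained model computes.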

Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery

16th January 2026, by Sweta Mahajan | V

Abstract: We propose a novel Concept Bottleneck Model (CBM) approach called Discover-then-Name-CBM (DN-CBM) that inverts the typical paradigm of first defining concepts and then learning them. Instead, we use sparse autoencoders to discover the concepts the model has already learnt, and then name them accordingly. Our concept extraction strategy is efficient because it is agnostic to the downstream task and uses concepts already known to the model. Overall, this results in performant and interpretable CBMs.

Paper: ECCV Proceedings | Software: GitHub | Keywords: CV, concept bottleneck models
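
A minimal sketch of the two stages, under stated assumptions: we assume the model's features live in a space shared with text (as with a CLIP-style vision-language encoder), that a sparse autoencoder has already been trained on those features, and that each learned dictionary direction is named after the vocabulary word whose text embedding is most cosine-similar to it. All weights, vectors and vocabulary words below are hypothetical toy values, and `sae_encode` and `name_concepts` are illustrative names, not the DN-CBM software's API.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sae_encode(x: Vector, enc_weights: Sequence[Vector], enc_bias: Vector) -> List[float]:
    """Sparse-autoencoder encoder: ReLU(W_enc x + b_enc), one activation per concept."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(enc_weights, enc_bias)
    ]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def name_concepts(directions: Sequence[Vector], vocab: Dict[str, Vector]) -> List[str]:
    """Name each dictionary direction after the most cosine-similar vocabulary embedding."""
    return [max(vocab, key=lambda word: cosine(d, vocab[word])) for d in directions]


# Hypothetical 3-d "text embeddings" and SAE dictionary directions.
vocab = {"stripes": [1.0, 0.0, 0.0], "water": [0.0, 1.0, 0.0], "fur": [0.0, 0.0, 1.0]}
directions = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]
names = name_concepts(directions, vocab)

# Hypothetical encoder weights: named concept activations for one image feature x.
activations = sae_encode([1.0, 0.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, -0.5])
print(dict(zip(names, activations)))
```

Because the naming step only needs the autoencoder and a vocabulary, it does not depend on any downstream task; in a concept bottleneck model, a simple classifier would then be trained on top of the named concept activations (not shown here).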

RSA-Control: A Pragmatics-Grounded Lightweight Controllable Text Generation Framework

15th December 2025, by Yifan Wang | L

Abstract: We introduce RSA-Control, a novel framework for controllable text generation (CTG) that requires no additional training and is grounded in principles of pragmatics via the Rational Speech Acts (RSA) framework. By employing recursive reasoning between imaginary speakers and listeners, RSA-Control steers large language models (LLMs) to produce text whose desired attributes are more readily perceived by listeners. This framework exemplifies a fused neuroexplicit approach, in which neural models are combined with explicit knowledge post hoc.

Paper: aclanthology | Software: GitHub | Keywords: NLP, controlled generation, rational speech act
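
For readers unfamiliar with RSA, the sketch below shows the textbook recursion on a hypothetical toy reference game: a literal listener interprets utterances by their literal truth, a pragmatic speaker chooses utterances in proportion to how well the literal listener would recover the intended referent, and a pragmatic listener inverts that speaker. It is not RSA-Control itself, which applies this style of reasoning to steer an LLM's generation; the referents, utterances, uniform prior and rationality parameter `alpha` are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical reference game: MEANINGS[u][m] is 1 if utterance u is literally true of referent m.
REFERENTS = ["blue_square", "blue_circle", "green_square"]
MEANINGS = {
    "blue":   {"blue_square": 1, "blue_circle": 1, "green_square": 0},
    "green":  {"blue_square": 0, "blue_circle": 0, "green_square": 1},
    "square": {"blue_square": 1, "blue_circle": 0, "green_square": 1},
    "circle": {"blue_square": 0, "blue_circle": 1, "green_square": 0},
}


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale non-negative scores into a probability distribution."""
    total = sum(scores.values())
    return {k: v / total for k, v in scores.items()}


def literal_listener(u: str) -> Dict[str, float]:
    """L0(m | u) is proportional to truth(u, m) * P(m), with a uniform prior P(m)."""
    return normalize({m: float(MEANINGS[u][m]) for m in REFERENTS})


def pragmatic_speaker(m: str, alpha: float = 1.0) -> Dict[str, float]:
    """S1(u | m) is proportional to L0(m | u) ** alpha, i.e. exp(alpha * log L0), zero cost."""
    return normalize({u: literal_listener(u)[m] ** alpha for u in MEANINGS})


def pragmatic_listener(u: str, alpha: float = 1.0) -> Dict[str, float]:
    """L1(m | u) is proportional to S1(u | m) * P(m), again with a uniform prior."""
    return normalize({m: pragmatic_speaker(m, alpha)[u] for m in REFERENTS})


print(pragmatic_listener("blue"))
```

Hearing "blue", the pragmatic listener should favor blue_square (roughly 0.6 against 0.4 for blue_circle), since a speaker who meant the circle could have said "circle" instead. This is the sense in which recursive speaker/listener reasoning makes an intended attribute easier for a listener to perceive.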

Introducing the Neuroexplicit Models Blog

14th November 2025, by Ji-Ung Lee | A F L V

Abstract: Neuroexplicit models are machine learning models that combine deep learning with explicit AI, allowing them to draw on the generalization capabilities of deep neural models while also exploiting human-understandable, explicit components. Neurosymbolic models are the most prominent kind of neuroexplicit model, but by no means the only one. In this blog post, we outline what neuroexplicit models are and, in doing so, offer a new perspective on taxonomizing the growing number of AI models.