Anthropic Open-sources Tool to Trace the "Thoughts" of Large Language Models

Anthropic researchers have open-sourced the tool they used to trace what goes on inside a large language model during inference. It includes a circuit-tracing Python library that can be used with any open-weights model and a frontend hosted on Neuronpedia for exploring the library's output as a graph.

By Sergio De Simone