This library implements tools for finding circuits using features from (cross-layer) MLP transcoders, as originally introduced by Ameisen et al. (2025) and Lindsey et al. (2025).
Our library performs three main tasks.
- Given a model with pre-trained transcoders, it finds the circuit / attribution graph; i.e., it computes the direct effect that each non-zero transcoder feature, transcoder error node, and input token has on each other non-zero transcoder feature and output logit.
- Given an attribution graph, it visualizes this graph and allows you to annotate these features.
- Given the insights gained from an attribution graph, it enables interventions on a model's transcoder features; i.e., you can set features to arbitrary values and observe how the model output changes.
One quick way to start is to try our tutorial notebook!
You can also find circuits and visualize them in one of three ways:
- Use `circuit-tracer` on Neuronpedia - no installation required! Just click on + New Graph to create your own, or use the drop-down menu to select an existing graph.
- Run `circuit-tracer` via a Python script or Jupyter notebook. Start with our tutorial notebook. This will work on Colab with the GPU resources provided for free by default - just click on the Colab badge! Check out the Demos section below for more tutorials. You can also run these demo notebooks locally, with your own compute.
- Run `circuit-tracer` via the command-line interface. This can only be done with your own compute. For more on how to do that, see Command-Line Interface.
Working with Gemma-2 (2B) is possible with relatively limited GPU resources; Colab GPUs have 15GB of RAM. More GPU RAM will allow you to do less offloading, and to use a larger batch size.
Currently, intervening on models with respect to the transcoder features you discover in your graphs is only possible when using `circuit-tracer` in a script or notebook, not on Neuronpedia.
To install this library, clone it and run the command `pip install .` in its directory.
We include some demos showing how to use our library in the `demos` folder. The main demo is `demos/circuit_tracing_tutorial.ipynb`, which replicates two of the findings from this paper using Gemma 2 (2B). All demos except for the Llama demo can be run on Colab.
We also make two simple demos of attribution and intervention available, for those who want to learn more about how to use the library:
- `demos/attribute_demo.ipynb`: Demonstrates how to find circuits and visualize them.
- `demos/intervention_demo.ipynb`: Demonstrates how to perform interventions on models.
We finally provide demos that dig deeper into specific, pre-computed and pre-annotated attribution graphs, performing interventions to demonstrate the correctness of the annotated graph:
- `demos/gemma_demo.ipynb`: Explores graphs from Gemma 2 (2B).
- `demos/gemma_it_demo.ipynb`: Explores graphs from instruction-tuned Gemma 2 (2B), using transcoders from the base model.
- `demos/llama_demo.ipynb`: Explores graphs from Llama 3.2 (1B). Not supported on Colab.
We also provide a number of annotated attribution graphs for both models, which can be found at the top of their two demo notebooks.
The unified CLI performs the complete 3-step process for finding and visualizing circuits:
- Attribution: Runs the attribution algorithm to find the circuit/attribution graph, computing direct effects between transcoder features, error nodes, tokens, and output logits.
- Graph File Creation: Prunes the attribution graph to remove low-effect nodes and edges, then converts it to JSON format suitable for visualization.
- Local Server: Starts a local web server to visualize and interact with the graph in your browser.
To find a circuit, create the graph files, and start up a local server, use the command:
circuit-tracer attribute --prompt [prompt] --transcoder_set [transcoder_set] --slug [slug] --graph_file_dir [graph_file_dir] --server
It will tell you where the server is serving (something like `localhost:[port]`). If you run this command on a remote machine, make sure to enable port forwarding, so you can see the graphs locally!
Attribution
- `--prompt` (`-p`): The input prompt to analyze
- `--transcoder_set` (`-t`): The set of transcoders to use for attribution. Available presets:
  - `gemma`: transcoders for `google/gemma-2-2b` from GemmaScope (loads lowest-L0 transcoder per layer). Link to transcoders
  - `llama`: transcoders for `meta-llama/Llama-3.2-1B` (trained by us). Link to transcoders
  - Or path to a custom config file (see `src/circuit_tracer/configs` for examples, but note that full support for new transcoders is coming soon.)
Graph File Creation
These are required if you want to run a local web server for visualization:
- `--slug`: A name/identifier for your analysis run
- `--graph_file_dir`: Directory path where JSON graph files will be saved
You can also save the raw attribution graph (to be loaded and used in Python later):
- `--graph_output_path` (`-o`): Path to save the raw attribution graph (`.pt` file)
You must set `--slug` and `--graph_file_dir`, or `--graph_output_path`, or both! Otherwise the CLI will output nothing.
Local Server
- `--server`: Start a local web server for graph visualization
Attribution Parameters:
- `--model` (`-m`): Model architecture (auto-inferred for `gemma` and `llama` presets)
- `--max_n_logits` (default: 10): Maximum number of logit nodes to attribute from
- `--desired_logit_prob` (default: 0.95): Cumulative probability threshold for top logits
- `--batch_size` (default: 256): Batch size for backward passes
- `--max_feature_nodes`: Maximum number of feature nodes (defaults to all nodes)
- `--offload`: Memory optimization option (`cpu`, `disk`, or `None`)
- `--verbose`: Display detailed progress information
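To make the interaction between `--max_n_logits` and `--desired_logit_prob` concrete, here is a minimal sketch of one plausible reading: take the most probable output tokens until their cumulative probability reaches the threshold, but never more than the maximum count. The function `select_logit_nodes` is hypothetical, and the exact selection rule used inside the library is an assumption here.

```python
import math
from typing import List, Tuple


def select_logit_nodes(
    logits: List[float],
    max_n_logits: int = 10,
    desired_logit_prob: float = 0.95,
) -> Tuple[List[int], float]:
    """Pick top token indices until their cumulative probability is high enough.

    Hypothetical illustration of the two parameters, not the library's code.
    """
    # Numerically stable softmax over the raw logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    selected: List[int] = []
    cumulative = 0.0
    for idx in order[:max_n_logits]:  # hard cap on the number of logit nodes
        selected.append(idx)
        cumulative += probs[idx]
        if cumulative >= desired_logit_prob:  # enough probability mass covered
            break
    return selected, cumulative
```

With token probabilities of 0.6, 0.3, 0.08, and 0.02, the defaults select the first three tokens (cumulative 0.98); lowering `max_n_logits` to 2 stops at the first two even though the threshold is not reached.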
Graph Pruning Parameters:
- `--node_threshold` (default: 0.8): Keeps minimum nodes with cumulative influence ≥ threshold
- `--edge_threshold` (default: 0.98): Keeps minimum edges with cumulative influence ≥ threshold
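The thresholds above describe a "smallest set covering a fraction of total influence" rule. The sketch below illustrates that rule on a toy dictionary of scores; `keep_by_cumulative_influence` is a hypothetical helper, and how the library actually computes per-node or per-edge influence is not covered here.

```python
from typing import Dict, List


def keep_by_cumulative_influence(
    influence: Dict[str, float], threshold: float
) -> List[str]:
    """Keep the fewest items whose share of total influence reaches `threshold`.

    Hypothetical illustration; the influence scores are assumed to be
    non-negative and already computed.
    """
    total = sum(influence.values())
    if total <= 0:
        return []

    # Highest-influence items first, so the kept set is as small as possible.
    ranked = sorted(influence.items(), key=lambda kv: kv[1], reverse=True)

    kept: List[str] = []
    running = 0.0
    for name, score in ranked:
        kept.append(name)
        running += score
        if running / total >= threshold:
            break
    return kept


# Toy scores: "a" and "b" together hold 85% of the total influence.
toy_scores = {"a": 5.0, "b": 3.5, "c": 1.0, "d": 0.5}
kept_nodes = keep_by_cumulative_influence(toy_scores, threshold=0.8)
```

Raising the threshold keeps more items: with the same toy scores, a threshold of 0.98 keeps all four.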
Server Parameters:
- `--port` (default: 8041): Port for the local server
Complete workflow with visualization:
circuit-tracer attribute \
--prompt "The International Advanced Security Group (IAS" \
--transcoder_set gemma \
--slug gemma-demo \
--graph_file_dir ./graph_files \
--server
Attribution only (save raw graph):
circuit-tracer attribute \
--prompt "The capital of France is" \
--transcoder_set llama \
--graph_output_path france_capital.pt
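If you want to launch the commands above from a Python script, one option is to assemble the argument list programmatically. This is a minimal sketch that uses only the flags documented in this section; `build_attribute_command` is a hypothetical helper, not something the library provides.

```python
import shlex
from typing import List, Optional


def build_attribute_command(
    prompt: str,
    transcoder_set: str,
    slug: Optional[str] = None,
    graph_file_dir: Optional[str] = None,
    graph_output_path: Optional[str] = None,
    server: bool = False,
    port: Optional[int] = None,
) -> List[str]:
    """Build the argv list for `circuit-tracer attribute` (hypothetical helper)."""
    cmd = [
        "circuit-tracer", "attribute",
        "--prompt", prompt,
        "--transcoder_set", transcoder_set,
    ]
    # Optional flags are only added when a value was supplied.
    if slug is not None:
        cmd += ["--slug", slug]
    if graph_file_dir is not None:
        cmd += ["--graph_file_dir", graph_file_dir]
    if graph_output_path is not None:
        cmd += ["--graph_output_path", graph_output_path]
    if server:
        cmd.append("--server")
    if port is not None:
        cmd += ["--port", str(port)]
    return cmd


# Reproduces the "attribution only" example as an argument list.
cmd = build_attribute_command(
    "The capital of France is", "llama", graph_output_path="france_capital.pt"
)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided `circuit-tracer` is installed; keeping the arguments as a list avoids shell-quoting problems with prompts that contain spaces or parentheses.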
When using the `--server` option, your browser will open to a local visualization interface. The interface is the same as in the original papers (frontend available here).
- Select a node: Click on a node.
- Pin / unpin a node to subgraph pane: Ctrl+click/Command+click the node.
- Annotate a node: Click on the "Edit" button on the right side of the window while a node is selected to edit its annotation.
- Group nodes: Hold G and click on nodes to group them together into a supernode. Hold G and click on the x next to a supernode to ungroup all of them.
- Annotate supernode / node group: click on the label below the supernode to edit the supernode annotation.
You can cite this library as follows:
@misc{circuit-tracer,
author = {Hanna, Michael and Piotrowski, Mateusz and Lindsey, Jack and Ameisen, Emmanuel},
title = {circuit-tracer},
howpublished = {\url{https://github.com/safety-research/circuit-tracer}},
note = {The first two authors contributed equally and are listed alphabetically.},
year = {2025}
}