Skip to content

safety-research/circuit-tracer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

circuit-tracer

This library implements tools for finding circuits using features from (cross-layer) MLP transcoders, as originally introduced by Ameisen et al. (2025) and Lindsey et al. (2025).

Our library performs three main tasks.

  1. Given a model with pre-trained transcoders, it finds the circuit / attribution graph; i.e., it computes the direct effect that each non-zero transcoder feature, transcoder error node, and input token has on each other non-zero transcoder feature and output logit.
  2. Given an attribution graph, it visualizes this graph and allows you to annotate these features.
  3. Enables interventions on a model's transcoder features using the insights gained from the attribution graph; i.e. you can set features to arbitrary values, and observe how model output changes.

Usage

One quick way to start is to try our tutorial notebook!

You can also find circuits and visualize them in one of three ways:

  1. Use circuit-tracer on Neuronpedia - no installation required! Just click on + New Graph to create your own, or use the drop-down menu to select an existing graph.
  2. Run circuit-tracer via a Python script or Jupyter notebook. Start with our tutorial notebook. This will work on Colab with the GPU resources provided for free by default - just click on the Colab badge! Check out the Demos section below for more tutorials. You can also run these demo notebooks locally, with your own compute.
  3. Run circuit-tracer via the command-line interface. This can only be done with your own compute. For more on how to do that, see Command-Line Interface.

Working with Gemma-2 (2B) is possible with relatively limited GPU resources; Colab GPUs have 15GB of RAM. More GPU RAM will allow you to do less offloading, and to use a larger batch size.

Currently, intervening on models with respect to the transcoder features you discover in your graphs is only possible when using circuit-tracer in a script or notebook, not on Neuronpedia.

Installation

To install this library, clone it and run the command pip install . in its directory.

Demos

We include some demos showing how to use our library in the demos folder. The main demo is demos/circuit_tracing_tutorial.ipynb, which replicates two of the findings from this paper using Gemma 2 (2B). All demos except for the Llama demo can be run on Colab.

We also make two simple demos of attribution and intervention available, for those who want to learn more about how to use the library:

We finally provide demos that dig deeper into specific, pre-computed and pre-annotated attribution graphs, performing interventions to demonstrate the correctness of the annotated graph:

We also provide a number of annotated attribution graphs for both models, which can be found at the top of their two demo notebooks.

Command-Line Interface

The unified CLI performs the complete 3-step process for finding and visualizing circuits:

3-Step Process

  1. Attribution: Runs the attribution algorithm to find the circuit/attribution graph, computing direct effects between transcoder features, error nodes, tokens, and output logits.
  2. Graph File Creation: Prunes the attribution graph to remove low-effect nodes and edges, then converts it to JSON format suitable for visualization.
  3. Local Server: Starts a local web server to visualize and interact with the graph in your browser.

Basic Usage

To find a circuit, create the graph files, and start up a local server, use the command:

circuit-tracer attribute --prompt [prompt] --transcoder_set [transcoder_set] --slug [slug] --graph_file_dir [directory] --slug [slug] --graph_file_dir [graph_file_dir] --server

It will tell you where the server is serving (something like localhost:[port]). If you run this command on a remote machine, make sure to enable port forwarding, so you can see the graphs locally!

Mandatory Arguments

Attribution

  • --prompt (-p): The input prompt to analyze
  • --transcoder_set (-t): The set of transcoders to use for attribution. Available presets:
    • gemma: transcoders for google/gemma-2-2b from GemmaScope (loads lowest-L0 transcoder per layer). Link to transcoders
    • llama: transcoders for meta-llama/Llama-3.2-1B (trained by us). Link to transcoders
    • Or path to a custom config file (see src/circuit_tracer/configs for examples, but note that full support for new transcoders is coming soon.)

Graph File Creation

These are required if you want to run a local web server for visualization:

  • --slug: A name/identifier for your analysis run
  • --graph_file_dir: Directory path where JSON graph files will be saved

You can also save the raw attribution graph (to be loaded and used in Python later):

  • --graph_output_path (-o): Path to save the raw attribution graph (.pt file)

You must set --slug and --graph_file_dir, or --graph_output_path, or both! Otherwise the CLI will output nothing.

Local Server

  • --server: Start a local web server for graph visualization

Optional Arguments

Attribution Parameters:

  • --model (-m): Model architecture (auto-inferred for gemma and llama presets)
  • --max_n_logits (default: 10): Maximum number of logit nodes to attribute from
  • --desired_logit_prob (default: 0.95): Cumulative probability threshold for top logits
  • --batch_size (default: 256): Batch size for backward passes
  • --max_feature_nodes: Maximum number of feature nodes (defaults to all nodes)
  • --offload: Memory optimization option (cpu, disk, or None)
  • --verbose: Display detailed progress information

Graph Pruning Parameters:

  • --node_threshold (default: 0.8): Keeps minimum nodes with cumulative influence ≥ threshold
  • --edge_threshold (default: 0.98): Keeps minimum edges with cumulative influence ≥ threshold

Server Parameters:

  • --port (default: 8041): Port for the local server

Examples

Complete workflow with visualization:

circuit-tracer attribute \
  --prompt "The International Advanced Security Group (IAS" \
  --transcoder_set gemma \
  --slug gemma-demo \
  --graph_file_dir ./graph_files \
  --server

Attribution only (save raw graph):

circuit-tracer attribute \
  --prompt "The capital of France is" \
  --transcoder_set llama \
  --graph_output_path france_capital.pt

Graph Annotation

When using the --server option, your browser will open to a local visualization interface. The interface is the same as in the original papers (frontend available here).

  • Select a node: Click on a node.
  • Pin / unpin a node to subgraph pane: Ctrl+click/Commmand+click the node.
  • Annotate a node: Click on the "Edit" button on the right side of the window while a node is selected to edit its annotation.
  • Group nodes: Hold G and click on nodes to group them together into a supernode. Hold G and click on the x next to a supernode to ungroup all of them.
  • Annotate supernode / node group: click on the label below the supernode to edit the supernode annotation.

Cite

You can cite this library as follows:

@misc{circuit-tracer,
  author = {Hanna, Michael and Piotrowski, Mateusz and Lindsey, Jack and Ameisen, Emmanuel},
  title = {circuit-tracer},
  howpublished = {\url{https://github.com/safety-research/circuit-tracer}},
  note = {The first two authors contributed equally and are listed alphabetically.},
  year = {2025}
}

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published