Mechanistic Interpretability Vocabulary

5 exercises — Learn mechanistic interpretability vocabulary: circuits in neural networks, superposition, activation patching, probing classifiers, and sparse autoencoders.

Exercise 1 of 5
In mechanistic interpretability, what are circuits in neural networks?