PRISM

Perspectives on Interpretability in Sciences and ML

A reading club systematically exploring the similarities and differences between neuroscience and machine learning interpretability.

Harvard School of Engineering and Applied Sciences & The Kempner Institute · Started Fall 2025

Upcoming Event
Panel Discussion · S10 · March 31, 2026

The Nuts and Bolts of Understanding: Neurons, Circuits, Features, and Manifolds

Tuesday, March 31, 2026 · 3:00–4:00 PM ET · SEC 6.301–6.302

A panel discussion bringing together neuroscientists and AI researchers to explore the fundamental units of neural computation — and what interpretability means across these levels of description.

Agenda: Opening — what are the units of computation?  ·  Component deep-dives  ·  Bridging across components and fields  ·  Looking ahead

Ilenna Jones
Neuro
William Dorrell
Neuro
Andy Keller
AI
Naomi Saphra
AI
Most Recent Event
Paper Presentation · S9 · March 24, 2026

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

Hadas Orgad (Kempner Fellow)
March 24, 2026 · SEC 2.122

A presentation on how harmful behavior in LLMs relies on a compact set of internal weights, helping explain why safety safeguards are brittle and why narrow fine-tuning can trigger broad misalignment.

How to Participate

PRISM sessions are open to all — graduate students, postdocs, undergraduates, and faculty across departments.

In Person
SEC, Harvard University, Cambridge MA
Tuesdays, 3:00–4:00 PM ET
Light refreshments provided.
Online
Zoom link available for each session.
Current recurring Zoom link →

About PRISM

Perspectives on Interpretability in Sciences and ML

Mission

To systematically explore the similarities and differences between neuroscience and machine learning interpretability.

Interpretability is a rapidly growing area in both AI research and neuroscience, aiming to understand how neural networks — artificial and biological — represent and process information. PRISM brings together researchers and students working on diverse approaches to this shared problem, from geometric structure in neural network latent spaces to tools for interpreting large models and understanding biological neural circuits.

This Semester · Spring 2026

Spring 2026 sessions take the form of panel discussions and paper readings, organized around three interrelated themes:

Spring 2026 Focus Areas

  • Expectations and applications — what counts as an explanation, and where is interpretability useful?
  • Terminology and definitions — do neuroscience and ML interpretability share a common language?
  • Methods — how do the tools used in each field compare, and can they inform one another?
Fall 2025 Overview

Our inaugural semester featured talks and discussions across four core themes:

  • Geometry of representations in neural networks
  • Methods for understanding and shaping models (circuits, SAEs, LoRA geometry)
  • Applications of interpretable models in the sciences
  • Brain–model alignment

Speakers included postdocs, graduate students, and undergraduates from SEAS, MCB, and affiliated institutes including IAIFI and the Kempner Institute.

Format

Sessions run weekly on Tuesdays from 3:00–4:00 PM ET, with 30 minutes of presentation followed by 30 minutes of open discussion. Each session pairs an ML interpretability researcher with a neuroscience or science researcher. Sessions are held in SEC 6.242 at the Kempner Institute, with a Zoom option available. Light refreshments are provided.

Members are welcome to suggest papers for future sessions.

Long-Term Goals
  • A review paper clarifying transferable insights on the geometry of representations between neuroscience and AI
  • A future workshop at a neuro/ML conference
  • A shared taxonomy of interpretability methods grounded in underlying hypotheses about representation structure
Sponsor

PRISM is sponsored by the Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University. The Kempner Institute supports interdisciplinary research at the intersection of natural and artificial intelligence.

Organizers

The people behind PRISM

Current Organizers

PRISM is organized by researchers across departments at Harvard. We welcome others who wish to get involved in organizing future sessions.

Shubham Choudhary

Co-Founder · PhD Candidate, Electrical Engineering
Biologically plausible models and structure in representations.

Learn more →
Sumedh Hindupur

Co-Founder · PhD Candidate, Applied Mathematics
Mechanistic interpretability and geometry of representations.

Learn more →
Demba Ba

Host PI · Professor, Electrical Engineering
Reverse engineering intelligence: both artificial and biological.

Learn more →