AI Interpretability (xAI) Reading List
(Work-In-Progress) Foundational articles on AI interpretability / explainable AI.
TODO 20241203: sync w/ Notion database
Large Language Models
- OpenAI: Extracting Concepts from GPT-4 [blog] [paper] [code]
- Anthropic: Mapping the Mind of a Large Language Model [blog] [paper] [thread]
Credits
- Cover Image: Google DeepMind on Unsplash [link]