Lawrence Chan

Cited by

	All	Since 2019
Citations	526	520
h-index	10	10
i10-index	11	11

Citations per year

Year	Citations
2019	2
2020	9
2021	13
2022	39
2023	296
2024	159

Public access

Based on funding mandates
2 articles available
0 articles not available

Co-authors

Neel Nanda	Research Engineer, Google DeepMind	Verified email at deepmind.com
Anca D Dragan	Assistant Professor at UC Berkeley // Director, AI Safety and Alignment, Google DeepMind	Verified email at berkeley.edu
Jacob Steinhardt	Stanford University	Verified email at cs.stanford.edu
Richard Ngo	OpenAI	Verified email at openai.com
Dylan Hadfield-Menell	Massachusetts Institute of Technology	Verified email at csail.mit.edu
Siddhartha Srinivasa	Professor, University of Washington	Verified email at cs.washington.edu
Andrew Critch	UC Berkeley, Department of Electrical Engineering and Computer Sciences	Verified email at eecs.berkeley.edu
Adam Scherlis		Verified email at scherlis.com
Daniel M. Ziegler	Redwood Research	Verified email at rdwrs.com
Noa Nabeshima	UC Santa Barbara	Verified email at ucsb.edu
Ben Weinstein-Raun	Palisade Research, AI Impacts	Verified email at benwr.net
Tim Bauman	Surge AI	Verified email at surgehq.ai
Rachel Freedman	UC Berkeley	Verified email at berkeley.edu
Rohin Shah	Research Scientist, Google DeepMind	Verified email at deepmind.com
Stuart Russell	Professor of Computer Science, University of California, Berkeley	Verified email at cs.berkeley.edu
Michael Dennis	Google DeepMind	Verified email at cs.berkeley.edu
Pedro Freire	UK Office of Communications	Verified email at aston.ac.uk
Dmitrii Krasheninnikov	University of Cambridge	Verified email at cam.ac.uk
Daniel S. Brown	Assistant Professor, Robotics Center and School of Computing, University of Utah	Verified email at cs.utah.edu
Euan McLean	FAR AI	Verified email at far.ai

Lawrence Chan

PhD Student, UC Berkeley

Verified email at berkeley.edu

AI Alignment, Interpretability, Reward Learning


Title	Cited by	Year
Progress measures for grokking via mechanistic interpretability N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt ICLR 2023, 2023	159	2023
The alignment problem from a deep learning perspective R Ngo, L Chan, S Mindermann arXiv preprint arXiv:2209.00626, 2022	97	2022
The assistive multi-armed bandit L Chan, D Hadfield-Menell, S Srinivasa, A Dragan 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019	47	2019
A toy model of universality: Reverse engineering how networks learn group operations B Chughtai, L Chan, N Nanda ICML 2023, 2023	43	2023
Adversarial Training for High-Stakes Reliability DM Ziegler, S Nix, L Chan, T Bauman, P Schmidt-Nielsen, T Lin, ... NeurIPS 2022, 2022	36	2022
Causal Scrubbing: a method for rigorously testing interpretability hypotheses L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ... https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a …, 2022	28	2022
Benefits of assistance over reward learning R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ...	22	2020
Human irrationality: both bad and good for reward inference L Chan, A Critch, A Dragan arXiv preprint arXiv:2111.06956, 2021	19	2021
Optimal cost design for model predictive control A Jain, L Chan, DS Brown, AD Dragan Learning for Dynamics and Control, 1205-1217, 2021	17	2021
Evaluating Language-Model Agents on Realistic Autonomous Tasks M Kinniment, LJK Sato, H Du, B Goodrich, M Hasin, L Chan, LH Miles, ... https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf, 2023	13	2023
Causal scrubbing, a method for rigorously testing interpretability hypotheses. AI Alignment Forum, 2022 L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ...	10
The alignment problem from a deep learning perspective. arXiv R Ngo, L Chan, S Mindermann URL: http://arxiv.org/abs/2209.00626, 2023	8	2023
Progress measures for grokking via mechanistic interpretability, January 2023 N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt arXiv preprint arXiv:2301.05217	7
Progress measures for grokking via mechanistic interpretability, 2023 N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt URL https://arxiv.org/abs/2301.05217	5
Neural networks learn representation theory: Reverse engineering how networks perform group operations B Chughtai, L Chan, N Nanda ICLR 2023 Workshop on Physics for Machine Learning, 2023	4	2023
Language models are better than humans at next-token prediction B Shlegeris, F Roger, L Chan, E McLean arXiv preprint arXiv:2212.11281, 2022	4	2022
A study on autonomous hole machining process analysis by reverse engineering of NC programs X Yan, L Chan, K Yamazaki, J Liu, M Kubota, Y Amano SAE transactions, 1045-1051, 1999	4	1999
The Alignment Problem from a Deep Learning Perspective: A Position Paper R Ngo, L Chan, S Mindermann The Twelfth International Conference on Learning Representations, 2023	1	2023
The impacts of known and unknown demonstrator irrationality on reward inference L Chan, A Critch, A Dragan	1	2020
Autonomous machining process analyzer LC Chan University of California, Davis, 1998	1	1998
