Lawrence Chan
PhD Student, UC Berkeley
Verified email at berkeley.edu
Title
Cited by
Year
Progress measures for grokking via mechanistic interpretability
N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt
ICLR 2023, 2023
Cited by: 159, Year: 2023
The alignment problem from a deep learning perspective
R Ngo, L Chan, S Mindermann
arXiv preprint arXiv:2209.00626, 2022
Cited by: 97, Year: 2022
The assistive multi-armed bandit
L Chan, D Hadfield-Menell, S Srinivasa, A Dragan
2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019
Cited by: 47, Year: 2019
A toy model of universality: Reverse engineering how networks learn group operations
B Chughtai, L Chan, N Nanda
ICML 2023, 2023
Cited by: 43, Year: 2023
Adversarial Training for High-Stakes Reliability
DM Ziegler, S Nix, L Chan, T Bauman, P Schmidt-Nielsen, T Lin, ...
NeurIPS 2022, 2022
Cited by: 36, Year: 2022
Causal Scrubbing: a method for rigorously testing interpretability hypotheses
L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ...
https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a …, 2022
Cited by: 28, Year: 2022
Benefits of assistance over reward learning
R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ...
Cited by: 22, Year: 2020
Human irrationality: both bad and good for reward inference
L Chan, A Critch, A Dragan
arXiv preprint arXiv:2111.06956, 2021
Cited by: 19, Year: 2021
Optimal cost design for model predictive control
A Jain, L Chan, DS Brown, AD Dragan
Learning for Dynamics and Control, 1205-1217, 2021
Cited by: 17, Year: 2021
Evaluating Language-Model Agents on Realistic Autonomous Tasks
M Kinniment, LJK Sato, H Du, B Goodrich, M Hasin, L Chan, LH Miles, ...
https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf, 2023
Cited by: 13, Year: 2023
Causal scrubbing, a method for rigorously testing interpretability hypotheses. AI Alignment Forum, 2022
L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ...
Cited by: 10
The alignment problem from a deep learning perspective. arXiv
R Ngo, L Chan, S Mindermann
URL: http://arxiv.org/abs/2209.00626, 2023
Cited by: 8, Year: 2023
Progress measures for grokking via mechanistic interpretability, January 2023
N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt
arXiv preprint arXiv:2301.05217
Cited by: 7
Progress measures for grokking via mechanistic interpretability, 2023
N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt
URL: https://arxiv.org/abs/2301.05217
Cited by: 5
Neural networks learn representation theory: Reverse engineering how networks perform group operations
B Chughtai, L Chan, N Nanda
ICLR 2023 Workshop on Physics for Machine Learning, 2023
Cited by: 4, Year: 2023
Language models are better than humans at next-token prediction
B Shlegeris, F Roger, L Chan, E McLean
arXiv preprint arXiv:2212.11281, 2022
Cited by: 4, Year: 2022
A study on autonomous hole machining process analysis by reverse engineering of NC programs
X Yan, L Chan, K Yamazaki, J Liu, M Kubota, Y Amano
SAE transactions, 1045-1051, 1999
Cited by: 4, Year: 1999
The Alignment Problem from a Deep Learning Perspective: A Position Paper
R Ngo, L Chan, S Mindermann
The Twelfth International Conference on Learning Representations, 2023
Cited by: 1, Year: 2023
The impacts of known and unknown demonstrator irrationality on reward inference
L Chan, A Critch, A Dragan
Cited by: 1, Year: 2020
Autonomous machining process analyzer
LC Chan
University of California, Davis, 1998
Cited by: 1, Year: 1998