Lauro Langosco
Verified email at cam.ac.uk
Title
Cited by
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 247; Year: 2023
Goal Misgeneralization in Deep Reinforcement Learning
L Langosco, J Koch, L Sharkey, J Pfau, L Orseau, D Krueger
ICML 2022, 9, 2022
Cited by: 89; Year: 2022
Harms from Increasingly Agentic Algorithmic Systems
A Chan, R Salganik, A Markelius, C Pang, N Rajkumar, D Krasheninnikov, ...
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023
Cited by: 60*; Year: 2023
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
arXiv preprint arXiv:2404.09932, 2024
Cited by: 30; Year: 2024
Unifying Grokking and Double Descent
X Davies, L Langosco, D Krueger
ML Safety Workshop NeurIPS 2022, 2023
Cited by: 20; Year: 2023
Neural Variational Gradient Descent
L Langosco di Langosco, V Fortuin, H Strathmann
ICML Workshop on Uncertainty & Robustness in Deep Learning, 2021
Cited by: 17*; Year: 2021
Detecting Backdoors with Meta-Models
L Langosco, N Alex, W Baker, D Quarel, H Bradley, D Krueger
NeurIPS 2023 Workshop on Backdoors in Deep Learning-The Good, the Bad, and …, 2023
Cited by: 2; Year: 2023
Training Equilibria in Reinforcement Learning
L Langosco, D Krueger, A Gleave
Deep Reinforcement Learning Workshop NeurIPS 2022, 2022
Year: 2022