Tengyang Xie
Verified email at cs.wisc.edu - Homepage
Title | Cited by | Year
Bellman-consistent pessimism for offline reinforcement learning
T Xie, CA Cheng, N Jiang, P Mineiro, A Agarwal
Advances in neural information processing systems 34, 6683-6694, 2021
258 · 2021
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling
T Xie, Y Ma, YX Wang
Advances in Neural Information Processing Systems, 9665-9675, 2019
175 · 2019
Policy finetuning: Bridging sample-efficient offline and online reinforcement learning
T Xie, N Jiang, H Wang, C Xiong, Y Bai
Advances in neural information processing systems 34, 27395-27407, 2021
159 · 2021
Batch value-function approximation with only realizability
T Xie, N Jiang
International Conference on Machine Learning, 11404-11413, 2021
116 · 2021
Adversarially trained actor critic for offline reinforcement learning
CA Cheng, T Xie, N Jiang, A Agarwal
International Conference on Machine Learning, 3852-3878, 2022
112 · 2022
Provably efficient Q-learning with low switching cost
Y Bai, T Xie, N Jiang, YX Wang
Advances in Neural Information Processing Systems, 8004-8013, 2019
101 · 2019
Q* Approximation Schemes for Batch Reinforcement Learning: A Theoretical Comparison
T Xie, N Jiang
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence …, 2020
99 · 2020
Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency
M Uehara, M Imaizumi, N Jiang, N Kallus, W Sun, T Xie
arXiv preprint arXiv:2102.02981, 2021
62 · 2021
The role of coverage in online reinforcement learning
T Xie, DJ Foster, Y Bai, N Jiang, SM Kakade
arXiv preprint arXiv:2210.04157, 2022
52 · 2022
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
T Xie, B Liu, Y Xu, M Ghavamzadeh, Y Chow, D Lyu, D Yoon
Advances in Neural Information Processing Systems, 1073-1083, 2018
36 · 2018
A variant of the Wang-Foster-Kakade lower bound for the discounted setting
P Amortila, N Jiang, T Xie
arXiv preprint arXiv:2011.01075, 2020
23 · 2020
Direct Nash optimization: Teaching language models to self-improve with general preferences
C Rosset, CA Cheng, A Mitra, M Santacroce, A Awadallah, T Xie
arXiv preprint arXiv:2404.03715, 2024
19 · 2024
Adversarial model for offline reinforcement learning
M Bhardwaj, T Xie, B Boots, N Jiang, CA Cheng
Advances in Neural Information Processing Systems 36, 2024
19 · 2024
Preference fine-tuning of LLMs should leverage suboptimal, on-policy data
F Tajwar, A Singh, A Sharma, R Rafailov, J Schneider, T Xie, S Ermon, ...
arXiv preprint arXiv:2404.14367, 2024
10 · 2024
Interaction-Grounded Learning
T Xie, J Langford, P Mineiro, I Momennejad
International Conference on Machine Learning, 11414-11423, 2021
10 · 2021
ARMOR: A model-based framework for improving arbitrary baseline policies with offline data
T Xie, M Bhardwaj, N Jiang, CA Cheng
arXiv preprint arXiv:2211.04538, 2022
8 · 2022
Interaction-grounded learning with action-inclusive feedback
T Xie, A Saran, DJ Foster, L Molu, I Momennejad, N Jiang, P Mineiro, ...
Advances in Neural Information Processing Systems 35, 12529-12541, 2022
5 · 2022
Harnessing density ratios for online reinforcement learning
P Amortila, DJ Foster, N Jiang, A Sekhari, T Xie
arXiv preprint arXiv:2401.09681, 2024
4 · 2024
Privacy preserving off-policy evaluation
T Xie, PS Thomas, G Miklau
arXiv preprint arXiv:1902.00174, 2019
4 · 2019
Marginalized Off-Policy Evaluation for Reinforcement Learning
T Xie, YX Wang, Y Ma
NeurIPS 2018 Workshop on Causal Learning, 2018
3 · 2018
Articles 1–20