| Publication | Cited by | Year |
|---|---|---|
| Stochastic shortest path: Minimax, parameter-free and towards horizon-free regret. J Tarbouriech, R Zhou, SS Du, M Pirotta, M Valko, A Lazaric. Advances in Neural Information Processing Systems 34, 6843-6855, 2021 | 31 | 2021 |
| Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes. R Zhou, R Wang, SS Du. International Conference on Machine Learning, 42698-42723, 2023 | 7* | 2023 |
| Sharp variance-dependent bounds in reinforcement learning: Best of both worlds in stochastic and deterministic environments. R Zhou, Z Zhang, SS Du. International Conference on Machine Learning, 42878-42914, 2023 | 7 | 2023 |
| Understanding curriculum learning in policy optimization for solving combinatorial optimization problems. R Zhou, Y Tian, Y Wu, SS Du. arXiv preprint arXiv:2202.05423, 2022 | 4* | 2022 |
| Free from Bellman completeness: Trajectory stitching via model-based return-conditioned supervised learning. Z Zhou, C Zhu, R Zhou, Q Cui, A Gupta, SS Du. arXiv preprint arXiv:2310.19308, 2023 | 1 | 2023 |
| Reflect-RL: Two-Player Online RL Fine-Tuning for LMs. R Zhou, SS Du, B Li. arXiv preprint arXiv:2402.12621, 2024 | | 2024 |