Title | Authors | Venue | Cited by | Year
WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents | S Yao, H Chen, J Yang, K Narasimhan | NeurIPS 2022 | 136 | 2022
SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | CE Jimenez, J Yang, A Wettig, S Yao, K Pei, O Press, K Narasimhan | ICLR 2024 | 37 | 2023
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback | J Yang, A Prabhakar, K Narasimhan, S Yao | NeurIPS 2023 (Datasets & Benchmarks) | 31 | 2023
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag | J Yang, A Prabhakar, S Yao, K Pei, KR Narasimhan | Multi-Agent Security Workshop @ NeurIPS 2023 | 3 | 2023
Quartz: A Framework for Engineering Secure Smart Contracts | J Kolb, J Yang, RH Katz, DE Culler | EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS … | 3 | 2020
DevBench: A Comprehensive Benchmark for Software Development | B Li, W Wu, Z Tang, L Shi, J Yang, J Li, S Yao, C Qian, B Hui, Q Zhang, ... | arXiv preprint arXiv:2403.08604 | | 2024
Referral Augmentation for Zero-Shot Information Retrieval | M Tang, S Yao, J Yang, K Narasimhan | arXiv preprint arXiv:2305.15098 | | 2023
Learning Language through Interactions with the Digital World | JB Yang | Princeton University | | 2023
Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment | J Yang, H Chen, KR Narasimhan | Second Workshop on Language and Reinforcement Learning @ NeurIPS 2022 | | 2022