Jailbreaking black box large language models in twenty queries P Chao, A Robey, E Dobriban, H Hassani, GJ Pappas, E Wong arXiv preprint arXiv:2310.08419, 2023 | 195 | 2023 |
Adversarial prompting for black box foundation models N Maus, P Chao, E Wong, J Gardner arXiv preprint arXiv:2302.04237 1 (2), 2023 | 75* | 2023 |
JailbreakBench: An open robustness benchmark for jailbreaking large language models P Chao, E Debenedetti, A Robey, M Andriushchenko, F Croce, V Sehwag, ... arXiv preprint arXiv:2404.01318, 2024 | 25 | 2024 |
Interventional and counterfactual inference with diffusion models P Chao, P Blöbaum, SP Kasiviswanathan arXiv preprint arXiv:2302.00860, 2023 | 12 | 2023 |
A safe harbor for AI evaluation and red teaming S Longpre, S Kapoor, K Klyman, A Ramaswami, R Bommasani, ... arXiv preprint arXiv:2403.04893, 2024 | 9 | 2024 |
Different definitions of conic sections in hyperbolic geometry P Chao, J Rosenberg Involve, a Journal of Mathematics 11 (5), 753-768, 2018 | 7 | 2018 |
AdaPT-GMM: Powerful and robust covariate-assisted multiple testing P Chao, W Fithian arXiv preprint arXiv:2106.15812, 2021 | 6 | 2021 |
Jailbreaking black box large language models in twenty queries P Chao, A Robey, E Dobriban, H Hassani, GJ Pappas, E Wong arXiv preprint arXiv:2310.08419, 2023 | 5 | 2023 |
Generative models for pose transfer P Chao, A Li, G Swamy arXiv preprint arXiv:1806.09070, 2018 | 3 | 2018 |
Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory P Chao, E Dobriban arXiv preprint arXiv:2308.01853, 2023 | 1 | 2023 |
Watermarking Language Models with Error Correcting Codes P Chao, E Dobriban, H Hassani arXiv preprint arXiv:2406.10281, 2024 | | 2024 |
Adversarial Robustness for Estimation and Alignment P Chao University of Pennsylvania, 2024 | | 2024 |
Position: A Safe Harbor for AI Evaluation and Red Teaming S Longpre, S Kapoor, K Klyman, A Ramaswami, R Bommasani, ... Forty-first International Conference on Machine Learning, 2024 | | 2024 |