Javier Rando - Google Scholar

Cited by

	All	Since 2019
Citations	338	338
h-index	7	7
i10-index	6	6

Citations per year: 2020: 3; 2021: 3; 2022: 6; 2023: 131; 2024: 194

Co-authors

Florian Tramèr, Assistant Professor of Computer Science, ETH Zurich. Verified email at inf.ethz.ch
Daniel Paleka, ETH Zurich. Verified email at inf.ethz.ch
Stephen Casper, PhD student, MIT. Verified email at mit.edu
Nitish Joshi, New York University. Verified email at nyu.edu
He He, New York University. Verified email at cs.nyu.edu
Fernando Perez-Cruz, Chief Data Scientist, Swiss Data Science Center and Professor T. at ETHZ--CS. Verified email at sdsc.ethz.ch

Javier Rando

Other names: Javier Rando Ramirez

Verified email at ai.ethz.ch

Interests: Artificial Intelligence, Language Models, Safety, Security, Privacy


Title	Cited by	Year

Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023	171	2023

Red-Teaming the Stable Diffusion Safety Filter
J Rando, D Paleka, D Lindner, L Heim, F Tramèr
ML Safety Workshop - NeurIPS 2022, 2022	75	2022

Scalable and transferable black-box jailbreaks for language models via persona modulation
R Shah, S Pour, A Tagade, S Casper, J Rando
arXiv preprint arXiv:2311.03348, 2023	27	2023

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
E Mosca, S Agarwal, J Rando-Ramirez, G Groh
ACL 2022, 2022	20	2022

Universal jailbreak backdoors from poisoned human feedback
J Rando, F Tramèr
arXiv preprint arXiv:2311.14455, 2023	16	2023

Uneven coverage of natural disasters in Wikipedia: The case of flood
V Lorini, J Rando, D Saez-Trumper, C Castillo
ISCRAM 2020, 2020	11	2020

Personas as a Way to Model Truthfulness in Language Models
N Joshi, J Rando, A Saparov, N Kim, H He
arXiv preprint arXiv:2310.18168, 2023	7	2023

PassGPT: Password Modeling and (Guided) Generation with Large Language Models
J Rando, F Perez-Cruz, B Hitaj
European Symposium on Research in Computer Security, 164-183, 2023	5	2023

Foundational challenges in assuring alignment and safety of large language models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
arXiv preprint arXiv:2404.09932, 2024	4	2024

Attributions toward artificial agents in a modified Moral Turing Test
E Aharoni, S Fernandes, DJ Brady, C Alexander, M Criner, K Queen, ...
Scientific Reports 14 (1), 8458, 2024	1	2024

Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO
J Rando, N Naimi, T Baumann, M Mathys
AdvML Frontiers Workshop (ICML 2022), 2022	1	2022

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs
J Rando, F Croce, K Mitka, S Shabalin, M Andriushchenko, N Flammarion, ...
arXiv preprint arXiv:2404.14461, 2024		2024
