Noam Shazeer

Cited by

	All	Since 2019
Citations	160779	155931
h-index	58	53
i10-index	94	78

Citations per year
Year	Citations
2017	643
2018	2301
2019	6818
2020	13291
2021	23049
2022	36093
2023	56534
2024	20066

Public access (based on funding mandates)
Available	1 article
Not available	0 articles

Noam Shazeer

Character.ai

Verified email at character.ai

Deep Learning


Title	Cited by	Year
Attention is all you need A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... Advances in neural information processing systems 30, 2017	119216	2017
Exploring the limits of transfer learning with a unified text-to-text transformer C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... Journal of machine learning research 21 (140), 1-67, 2020	14765	2020
PaLM: Scaling language modeling with Pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... Journal of Machine Learning Research 24 (240), 1-113, 2023	3397	2023
Scheduled sampling for sequence prediction with recurrent neural networks S Bengio, O Vinyals, N Jaitly, N Shazeer Advances in neural information processing systems 28, 2015	2214	2015
Image transformer N Parmar, A Vaswani, J Uszkoreit, L Kaiser, N Shazeer, A Ku, D Tran International conference on machine learning, 4055-4064, 2018	1808	2018
Outrageously large neural networks: The sparsely-gated mixture-of-experts layer N Shazeer, A Mirhoseini, K Maziarz, A Davis, Q Le, G Hinton, J Dean arXiv preprint arXiv:1701.06538, 2017	1791	2017
Exploring the limits of language modeling R Jozefowicz, O Vinyals, M Schuster, N Shazeer, Y Wu arXiv preprint arXiv:1602.02410, 2016	1355	2016
Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity W Fedus, B Zoph, N Shazeer Journal of Machine Learning Research 23 (120), 1-39, 2022	1289	2022
LaMDA: Language models for dialog applications R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... arXiv preprint arXiv:2201.08239, 2022	1092	2022
Attention is all you need. arXiv 2017 A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... arXiv preprint arXiv:1706.03762 3762, 2023	1033	2023
Generating Wikipedia by summarizing long sequences PJ Liu, M Saleh, E Pot, B Goodrich, R Sepassi, L Kaiser, N Shazeer arXiv preprint arXiv:1801.10198, 2018	909	2018
Adafactor: Adaptive learning rates with sublinear memory cost N Shazeer, M Stern International Conference on Machine Learning, 4596-4604, 2018	776	2018
End-to-end text-dependent speaker verification G Heigold, I Moreno, S Bengio, N Shazeer 2016 IEEE International Conference on Acoustics, Speech and Signal …, 2016	752	2016
Gomez Aidan N., Kaiser Łukasz, and Polosukhin Illia. 2017 V Ashish, S Noam, P Niki, U Jakob, J Llion Attention is all you need. In Advances in neural information processing …, 2017	728	2017
How much knowledge can you pack into the parameters of a language model? A Roberts, C Raffel, N Shazeer arXiv preprint arXiv:2002.08910, 2020	696	2020
GShard: Scaling giant models with conditional computation and automatic sharding D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... arXiv preprint arXiv:2006.16668, 2020	686	2020
Tensor2Tensor for neural machine translation A Vaswani, S Bengio, E Brevdo, F Chollet, AN Gomez, S Gouws, L Jones, ... arXiv preprint arXiv:1803.07416, 2018	611	2018
Attention is all you need (2017) A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, AN Gomez, ... arXiv preprint arXiv:1706.03762, 2019	505	2019
Serving content-relevant advertisements with client-side device support D Anderson, P Buchheit, JA Dean, GR Harik, CL Gonsalves, N Shazeer, ... US Patent 8,086,559, 2011	402	2011
One model to learn them all L Kaiser, AN Gomez, N Shazeer, A Vaswani, N Parmar, L Jones, ... arXiv preprint arXiv:1706.05137, 2017	381	2017
