Nitish Shirish Keskar
Salesforce Research
Verified email at salesforce.com - Homepage
Title / Cited by / Year
On large-batch training for deep learning: Generalization gap and sharp minima
NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang
arXiv preprint arXiv:1609.04836, 2016
Cited by 443 · 2016
Regularizing and optimizing LSTM language models
S Merity, NS Keskar, R Socher
arXiv preprint arXiv:1708.02182, 2017
Cited by 182 · 2017
Improving generalization performance by switching from Adam to SGD
NS Keskar, R Socher
arXiv preprint arXiv:1712.07628, 2017
Cited by 41 · 2017
The natural language decathlon: Multitask learning as question answering
B McCann, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1806.08730, 2018
Cited by 30 · 2018
An analysis of neural language modeling at multiple scales
S Merity, NS Keskar, R Socher
arXiv preprint arXiv:1803.08240, 2018
Cited by 28 · 2018
Balancing communication and computation in distributed optimization
A Berahas, R Bollapragada, NS Keskar, E Wei
IEEE Transactions on Automatic Control, 2018
Cited by 16 · 2018
Weighted transformer network for machine translation
K Ahmed, NS Keskar, R Socher
arXiv preprint arXiv:1711.02132, 2017
Cited by 16 · 2017
adaQN: An adaptive quasi-Newton algorithm for training RNNs
NS Keskar, AS Berahas
Joint European Conference on Machine Learning and Knowledge Discovery in …, 2016
Cited by 15 · 2016
A second-order method for convex ℓ1-regularized optimization with active-set prediction
N Keskar, J Nocedal, F Öztoprak, A Waechter
Optimization Methods and Software 31 (3), 605-621, 2016
Cited by 12 · 2016
A nonmonotone learning rate strategy for SGD training of deep neural networks
NS Keskar, G Saon
2015 IEEE International Conference on Acoustics, Speech and Signal …, 2015
Cited by 8 · 2015
A limited-memory quasi-Newton algorithm for bound-constrained non-smooth optimization
N Keskar, A Wächter
Optimization Methods and Software 34 (1), 150-171, 2019
Cited by 7 · 2019
Identifying Generalization Properties in Neural Networks
H Wang, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1809.07402, 2018
Cited by 6 · 2018
Using mode connectivity for loss landscape analysis
A Gotmare, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1806.06977, 2018
Cited by 2 · 2018
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
A Gotmare, NS Keskar, C Xiong, R Socher
arXiv preprint arXiv:1810.13243, 2018
Cited by 1 · 2018
Sequence-to-sequence prediction using a neural network model
NS Keskar, K Ahmed, R Socher
US Patent App. 15/884,125, 2019
2019
Unifying Question Answering and Text Classification via Span Extraction
NS Keskar, B McCann, C Xiong, R Socher
arXiv preprint arXiv:1904.09286, 2019
2019
Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering
V Zhong, C Xiong, NS Keskar, R Socher
arXiv preprint arXiv:1901.00603, 2019
2019
Scalable Language Modeling: WikiText-103 on a Single GPU in 12 hours
S Merity, NS Keskar, J Bradbury, R Socher
2018
Second-Order Methods for Stochastic and Nonsmooth Optimization
NS Keskar
Northwestern University, 2017
2017
Articles 1–19