Dhiraj Kalamkar
Other names: Dhiraj D Kalamkar
Verified email at intel.com
Title
Cited by
Year
A study of BFLOAT16 for deep learning training
D Kalamkar, D Mudigere, N Mellempudi, D Das, K Banerjee, S Avancha, ...
arXiv preprint arXiv:1905.12322, 2019
Cited by 365 (2019)
Distributed deep learning using synchronous stochastic gradient descent
D Das, S Avancha, D Mudigere, K Vaidynathan, S Sridharan, D Kalamkar, ...
arXiv preprint arXiv:1602.06709, 2016
Cited by 213 (2016)
Mixed precision training of convolutional neural networks using integer operations
D Das, N Mellempudi, D Mudigere, D Kalamkar, S Avancha, K Banerjee, ...
arXiv preprint arXiv:1802.00930, 2018
Cited by 206 (2018)
Anatomy of high-performance deep learning convolutions on SIMD architectures
E Georganas, S Avancha, K Banerjee, D Kalamkar, G Henry, H Pabst, ...
SC18: International Conference for High Performance Computing, Networking …, 2018
Cited by 139 (2018)
Performing power management in a multicore processor
VW Lee, ET Grochowski, D Kim, Y Bai, S Li, NK Mellempudi, ...
US Patent 10,234,930, 2019
Cited by 128 (2019)
DistGNN: Scalable distributed training for large-scale graph neural networks
V Md, S Misra, G Ma, R Mohanty, E Georganas, A Heinecke, D Kalamkar, ...
Proceedings of the International Conference for High Performance Computing …, 2021
Cited by 127 (2021)
Optimization of geometric multigrid for emerging multi-and manycore processors
S Williams, DD Kalamkar, A Singh, AM Deshpande, B Van Straalen, ...
SC'12: Proceedings of the International Conference on High Performance …, 2012
Cited by 95 (2012)
Lattice QCD on Intel® Xeon Phi™ Coprocessors
B Joo, DD Kalamkar, K Vaidyanathan, M Smelyanskiy, K Pamnany, ...
Supercomputing: 28th International Supercomputing Conference, ISC 2013 …, 2013
Cited by 88 (2013)
Abstraction layers for scalable distributed machine learning
DD Kalamkar, K Vaidyanathan, S Sridharan, D Das
US Patent 11,094,029, 2021
Cited by 70 (2021)
Efficient shared-memory implementation of high-performance conjugate gradient benchmark and its application to unstructured matrices
J Park, M Smelyanskiy, K Vaidyanathan, A Heinecke, DD Kalamkar, X Liu, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by 69 (2014)
Enabling efficient multithreaded MPI communication through a library-based implementation of MPI endpoints
S Sridharan, J Dinan, DD Kalamkar
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by 57 (2014)
Optimizing deep learning recommender systems training on CPU cluster architectures
D Kalamkar, E Georganas, S Srinivasan, J Chen, M Shiryaev, A Heinecke
SC20: International Conference for High Performance Computing, Networking …, 2020
Cited by 55 (2020)
Improving concurrency and asynchrony in multithreaded MPI applications using software offloading
K Vaidyanathan, DD Kalamkar, K Pamnany, JR Hammond, P Balaji, ...
Proceedings of the International Conference for High Performance Computing …, 2015
Cited by 54 (2015)
Lattice QCD with domain decomposition on Intel® Xeon Phi co-processors
S Heybrock, B Joó, DD Kalamkar, M Smelyanskiy, K Vaidyanathan, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by 50 (2014)
Optimizing Wilson-Dirac Operator and Linear Solvers for Intel® KNL
B Joó, DD Kalamkar, T Kurth, K Vaidyanathan, A Walden
High Performance Computing: ISC High Performance 2016 International …, 2016
Cited by 38 (2016)
On scale-out deep learning training for cloud and HPC
S Sridharan, K Vaidyanathan, D Kalamkar, D Das, ME Smorkalov, ...
arXiv preprint arXiv:1801.08030, 2018
Cited by 35 (2018)
Harnessing deep learning via a single building block
E Georganas, K Banerjee, D Kalamkar, S Avancha, A Venkat, ...
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020
Cited by 25 (2020)
Performing power management in a multicore processor
VW Lee, D Kim, Y Bai, S Ji, S Li, DD Kalamkar, NK Mellempudi
US Patent 9,910,481, 2018
Cited by 23 (2018)
Tensor processing primitives: A programming abstraction for efficiency and portability in deep learning workloads
E Georganas, D Kalamkar, S Avancha, M Adelman, C Anderson, A Breuer, ...
Proceedings of the International Conference for High Performance Computing …, 2021
Cited by 22 (2021)
Wilson Dslash kernel from lattice QCD optimization
B Joó, M Smelyanskiy, DD Kalamkar, K Vaidyanathan
Thomas Jefferson National Accelerator Facility (TJNAF), Newport News, VA …, 2015
Cited by 20 (2015)