Mikhail Smelyanskiy
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
VW Lee, C Kim, J Chhugani, M Deisher, D Kim, AD Nguyen, N Satish, ...
ACM SIGARCH computer architecture news 38 (3), 451-460, 2010
On large-batch training for deep learning: Generalization gap and sharp minima
NS Keskar, D Mudigere, J Nocedal, M Smelyanskiy, PTP Tang
arXiv preprint arXiv:1609.04836, 2016
Proving program termination
B Cook, A Podelski, A Rybalchenko
Communications of the ACM 54 (5), 88-98, 2011
Efficient sparse matrix-vector multiplication on x86-based many-core processors
X Liu, M Smelyanskiy, E Chow, P Dubey
Proceedings of the 27th international ACM conference on International …, 2013
Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor
A Heinecke, K Vaidyanathan, M Smelyanskiy, A Kobotov, R Dubtsov, ...
2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013
Exploring SIMD for molecular dynamics, using Intel® Xeon® processors and Intel® Xeon Phi coprocessors
SJ Pennycook, CJ Hughes, M Smelyanskiy, SA Jarvis
2013 IEEE 27th International Symposium on Parallel and Distributed …, 2013
Can traditional programming bridge the ninja performance gap for parallel computing applications?
N Satish, C Kim, J Chhugani, H Saito, R Krishnaiyer, M Smelyanskiy, ...
2012 39th Annual International Symposium on Computer Architecture (ISCA …, 2012
Convergence of recognition, mining, and synthesis workloads and its implications
YK Chen, J Chhugani, P Dubey, CJ Hughes, D Kim, S Kumar, VW Lee, ...
Proceedings of the IEEE 96 (5), 790-807, 2008
Mapping high-fidelity volume rendering for medical imaging to CPU, GPU and many-core architectures
M Smelyanskiy, D Holmes, J Chhugani, A Larson, DM Carmean, ...
IEEE transactions on visualization and computer graphics 15 (6), 1563-1570, 2009
Anatomy of high-performance many-threaded matrix multiplication
TM Smith, R Van De Geijn, M Smelyanskiy, JR Hammond, FG Van Zee
2014 IEEE 28th International Parallel and Distributed Processing Symposium …, 2014
Optimization of geometric multigrid for emerging multi- and manycore processors
S Williams, DD Kalamkar, A Singh, AM Deshpande, B Van Straalen, ...
Proceedings of the International Conference on High Performance Computing …, 2012
Stack value file: Custom microarchitecture for the stack
HHS Lee, M Smelyanskiy, CJ Newburn, GS Tyson
Proceedings HPCA Seventh International Symposium on High-Performance …, 2001
Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers
A Heinecke, A Breuer, S Rettenberger, M Bader, AA Gabriel, C Pelties, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
The BLIS framework: Experiments in portability
FGV Zee, TM Smith, B Marker, TM Low, RA Geijn, FD Igual, ...
ACM Transactions on Mathematical Software (TOMS) 42 (2), 12, 2016
Applied machine learning at Facebook: A datacenter infrastructure perspective
K Hazelwood, S Bird, D Brooks, S Chintala, U Diril, D Dzhulgakov, ...
2018 IEEE International Symposium on High Performance Computer Architecture …, 2018
Scheduling and partitioning tasks via architecture-aware feedback information
A Ozgur, G Buehrer, A Nguyen, D Kim, V Lee, M Smelyanskiy, YK Chen
US Patent App. 11/300,809, 2007
Atomic vector operations on chip multiprocessors
S Kumar, D Kim, M Smelyanskiy, YK Chen, J Chhugani, CJ Hughes, ...
ACM SIGARCH Computer Architecture News 36 (3), 441-452, 2008
Lattice QCD on Intel® Xeon Phi™ Coprocessors
B Joo, DD Kalamkar, K Vaidyanathan, M Smelyanskiy, K Pamnany, ...
International Supercomputing Conference, 40-54, 2013
An algorithm for the fast solution of symmetric linear complementarity problems
JL Morales, J Nocedal, M Smelyanskiy
Numerische Mathematik 111 (2), 251-266, 2008
Efficient shared-memory implementation of high-performance conjugate gradient benchmark and its application to unstructured matrices
J Park, M Smelyanskiy, K Vaidyanathan, A Heinecke, DD Kalamkar, X Liu, ...
Proceedings of the International Conference for High Performance Computing …, 2014