Daichi Mukunoki
RIKEN Center for Computational Science
Verified email at riken.jp
Title / Cited by / Year
Matrix engines for high performance computing: A paragon of performance or grasping at straws?
J Domke, E Vatai, A Drozd, P Chen, Y Oyama, L Zhang, S Salaria, ...
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021
Cited by 33 (2021)
Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures
D Mukunoki, T Ogita, K Ozaki
Parallel Processing and Applied Mathematics: 13th International Conference …, 2020
Cited by 25 (2020)
DGEMM using tensor cores, and its accurate and reproducible versions
D Mukunoki, K Ozaki, T Ogita, T Imamura
International Conference on High Performance Computing, 230-248, 2020
Cited by 24 (2020)
Optimization of sparse matrix-vector multiplication for CRS format on NVIDIA Kepler architecture GPUs
D Mukunoki, D Takahashi
Computational Science and Its Applications–ICCSA 2013: 13th International …, 2013
Cited by 24 (2013)
Implementation and evaluation of triple precision BLAS subroutines on GPUs
D Mukunoki, D Takahashi
2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012
Cited by 20 (2012)
Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs
D Mukunoki, T Ogita
Journal of Computational and Applied Mathematics 372, 112701, 2020
Cited by 16 (2020)
Implementation and evaluation of quadruple precision BLAS functions on GPUs
D Mukunoki, D Takahashi
Applied Parallel and Scientific Computing: 10th International Conference …, 2012
Cited by 16 (2012)
Fast implementation of general matrix-vector multiplication (GEMV) on Kepler GPUs
D Mukunoki, T Imamura, D Takahashi
2015 23rd Euromicro International Conference on Parallel, Distributed, and …, 2015
Cited by 15 (2015)
Using quadruple precision arithmetic to accelerate Krylov subspace methods on GPUs
D Mukunoki, D Takahashi
Parallel Processing and Applied Mathematics: 10th International Conference …, 2014
Cited by 14 (2014)
Reduced-precision floating-point formats on GPUs for high performance and energy efficient computation
D Mukunoki, T Imamura
2016 IEEE International Conference on Cluster Computing (CLUSTER), 144-145, 2016
Cited by 11 (2016)
Automatic thread-block size adjustment for memory-bound BLAS kernels on GPUs
D Mukunoki, T Imamura, D Takahashi
2016 IEEE 10th International Symposium on Embedded Multicore/Many-core …, 2016
Cited by 8 (2016)
Conjugate gradient solvers with high accuracy and bit-wise reproducibility between CPU and GPU using Ozaki scheme
D Mukunoki, K Ozaki, T Ogita, R Iakymchuk
The International Conference on High Performance Computing in Asia-Pacific …, 2021
Cited by 7 (2021)
Accurate matrix multiplication on binary128 format accelerated by Ozaki scheme
D Mukunoki, K Ozaki, T Ogita, T Imamura
Proceedings of the 50th International Conference on Parallel Processing, 1-11, 2021
Cited by 6 (2021)
Performance comparison of double, triple and quadruple precision real and complex BLAS subroutines on GPUs
D Mukunoki, D Takahashi
Proceedings of the ATIP/A* CRC Workshop on Accelerator Technologies for High …, 2012
Cited by 6 (2012)
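Implementation and performance evaluation of triple- and quadruple-precision floating-point operations on GPUs (in Japanese)
D Mukunoki, D Takahashi
IPSJ Transactions on Advanced Computing Systems (ACS) 6 (1), 66-77, 2013
Cited by 5 (2013)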
Minimal-precision computing for high-performance, energy-efficient, and reliable computations
D Mukunoki, T Imamura, Y Tan, A Koshiba, J Huthmann, K Sano, ...
France-Japan-Germany trilateral workshop: Convergence of HPC and Data …, 2019
Cited by 4 (2019)
Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer
D Mukunoki, T Imamura
Parallel Processing and Applied Mathematics: 12th International Conference …, 2018
Cited by 4 (2018)
Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor
D Mukunoki, M Kawai, T Imamura
2023 IEEE 16th International Symposium on Embedded Multicore/Many-core …, 2023
Cited by 3 (2023)
Can we avoid rounding-error estimation in HPC codes and still get trustful results?
F Jézéquel, S Graillat, D Mukunoki, T Imamura, R Iakymchuk
Cited by 3 (2020)
Can we avoid rounding-error estimation in HPC codes and still get trustworthy results?
F Jézéquel, S Graillat, D Mukunoki, T Imamura, R Iakymchuk
Software Verification: 12th International Conference, VSTTE 2020, and 13th …, 2020
Cited by 3 (2020)