Sehoon Kim
Title
Cited by
Year
A survey of quantization methods for efficient neural network inference
A Gholami, S Kim, Z Dong, Z Yao, MW Mahoney, K Keutzer
Low-Power Computer Vision, 291-326, 2022
Cited by: 1252; Year: 2022
I-BERT: Integer-only BERT quantization
S Kim, A Gholami, Z Yao, MW Mahoney, K Keutzer
International Conference on Machine Learning, 5506-5518, 2021
Cited by: 354; Year: 2021
SqueezeLLM: Dense-and-Sparse Quantization
S Kim, C Hooper, A Gholami, Z Dong, X Li, S Shen, MW Mahoney, ...
ICML 2024, 2023
Cited by: 140; Year: 2023
Learned Token Pruning for Transformers
S Kim, S Shen, D Thorsley, A Gholami, W Kwon, J Hassoun, K Keutzer
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and …, 2022
Cited by: 139; Year: 2022
A Fast Post-Training Pruning Framework for Transformers
W Kwon, S Kim, MW Mahoney, J Hassoun, K Keutzer, A Gholami
Advances in Neural Information Processing Systems 35, 2022
Cited by: 122; Year: 2022
Squeezeformer: An efficient transformer for automatic speech recognition
S Kim, A Gholami, A Shaw, N Lee, K Mangalam, J Malik, MW Mahoney, ...
Advances in Neural Information Processing Systems 35, 2022
Cited by: 119; Year: 2022
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
C Hooper, S Kim, H Mohammadzadeh, MW Mahoney, YS Shao, ...
NeurIPS 2024, 2024
Cited by: 84*; Year: 2024
Full Stack Optimization of Transformer Inference: a Survey
S Kim, C Hooper, T Wattanawong, M Kang, R Yan, H Genc, G Dinh, ...
arXiv preprint arXiv:2302.14017, 2023
Cited by: 82; Year: 2023
Speculative decoding with big little decoder
S Kim, K Mangalam, S Moon, J Malik, MW Mahoney, A Gholami, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by: 77*; Year: 2024
Hessian-aware pruning and optimal neural implant
S Yu, Z Yao, A Gholami, Z Dong, S Kim, MW Mahoney, K Keutzer
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2022
Cited by: 65; Year: 2022
Applications and techniques for fast machine learning in science
AMC Deiana, N Tran, J Agar, M Blott, G Di Guglielmo, J Duarte, P Harris, ...
Frontiers in Big Data 5, 787421, 2022
Cited by: 64; Year: 2022
AI and memory wall
A Gholami, Z Yao, S Kim, C Hooper, MW Mahoney, K Keutzer
IEEE Micro, 2024
Cited by: 57; Year: 2024
An LLM Compiler for Parallel Function Calling
S Kim, S Moon, R Tabrizi, N Lee, MW Mahoney, K Keutzer, A Gholami
ICML 2024, 2023
Cited by: 32; Year: 2023
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
N Lee, T Wattanawong, S Kim, K Mangalam, S Shen, G Anumanchipalli, ...
ACL 2024, 2024
Cited by: 25; Year: 2024
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition
S Kim, A Gholami, Z Yao, N Lee, P Wang, A Nrusimha, B Zhai, T Gao, ...
ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022
Cited by: 23; Year: 2022
SPEED: Speculative Pipelined Execution for Efficient Decoding
C Hooper, S Kim, H Mohammadzadeh, H Genc, K Keutzer, A Gholami, ...
arXiv preprint arXiv:2310.12072, 2023
Cited by: 18; Year: 2023
WindTunnel: towards differentiable ML pipelines beyond a single model
GI Yu, S Amizadeh, S Kim, A Pagnoni, C Zhang, BG Chun, M Weimer, ...
Proceedings of the VLDB Endowment 15 (1), 11-20, 2021
Cited by: 16*; Year: 2021
Memory-Efficient Hardware Performance Counters with Approximate-Counting Algorithms
J Xu, S Kim, B Nikolic, YS Shao
2021 IEEE International Symposium on Performance Analysis of Systems and …, 2021
Cited by: 6; Year: 2021
TinyAgent: Function Calling at the Edge
LE Erdogan, N Lee, S Jha, S Kim, R Tabrizi, S Moon, C Hooper, ...
EMNLP 2024 (Demo), 2024
Cited by: 5; Year: 2024
Terra: Imperative-Symbolic Co-Execution of Imperative Deep Learning Programs
T Kim, E Jeong, GW Kim, Y Koo, S Kim, G Yu, BG Chun
Advances in Neural Information Processing Systems 34, 1468-1480, 2021
Cited by: 5; Year: 2021