Puyuan Peng
Research Intern, Meta; PhD student, The University of Texas at Austin
Verified email at utexas.edu - Homepage
Title · Cited by · Year
MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
A Baade, P Peng, D Harwath
Interspeech 2022, 2022
102 · 2022
Word discovery in visually grounded, self-supervised speech models
P Peng, D Harwath
Interspeech 2022, 2022
46 · 2022
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
P Peng, B Yan, S Watanabe, D Harwath
Interspeech 2023, 2023
40 · 2023
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
P Peng, PY Huang, SW Li, A Mohamed, D Harwath
arXiv preprint arXiv:2403.16973, 2024
38 · 2024
Fast-Slow Transformer for Visually Grounding Speech
P Peng, D Harwath
ICASSP 2022, 2022
36 · 2022
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling
P Peng, D Harwath
AAAI 2022 SAS Workshop, 2022
30 · 2022
A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings
P Peng, H Kamper, K Livescu
NeurIPS 2020 SAS Workshop, 2020
18 · 2020
Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
P Peng, SW Li, O Räsänen, A Mohamed, D Harwath
Interspeech 2023, 2023
11* · 2023
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Z Zheng, P Peng, Z Ma, X Chen, E Choi, D Harwath
arXiv preprint arXiv:2402.01591, 2024
9 · 2024
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Y Tseng, L Berry*, YT Chen*, I Chiu*, HH Lin*, M Liu*, P Peng*, YJ Shih*, ...
preprint, 2023
8 · 2023
Zero-shot Video Moment Retrieval With Off-the-Shelf Models
A Diwan*, P Peng*, RJ Mooney (* denotes equal contribution)
NeurIPS 2022 TL4NLP, 2022
5 · 2022
Audio-Visual Neural Syntax Acquisition
CIJ Lai*, F Shi*, P Peng*, Y Kim, K Gimpel, S Chang, YS Chuang, S Bhati, ...
ASRU 2023, 2023
4 · 2023
SpeechCLIP+: Self-Supervised Multi-Task Representation Learning for Speech via CLIP and Speech-Image Data
HF Wang, YJ Shih, HJ Chang, L Berry, P Peng, H Lee, HM Wang, ...
arXiv preprint arXiv:2402.06959, 2024
2 · 2024
Style-Transfer Based Speech and Audio-Visual Scene Understanding for Robot Action Sequence Acquisition from Videos
C Hori, P Peng, D Harwath, X Liu, K Ota, S Jain, R Corcodel, D Jha, ...
Interspeech 2023, 2023
2 · 2023
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
C Huang, WC Chen, S Yang, AT Liu, CA Li, YX Lin, WC Tseng, A Diwan, ...
arXiv preprint arXiv:2411.05361, 2024
1 · 2024
SyllableLM: Learning Coarse Semantic Units for Speech Language Models
A Baade, P Peng, D Harwath
arXiv preprint arXiv:2410.04029, 2024
1 · 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
C Chen*, P Peng*, A Baid, Z Xue, WN Hsu, D Harwath, K Grauman
arXiv preprint arXiv:2406.09272, 2024
1 · 2024
Neural Codec Language Models for Disentangled and Textless Voice Conversion
A Baade, P Peng, D Harwath
Proc. Interspeech 2024, 182-186, 2024
1 · 2024
Textless Phrase Structure Induction from Visually-Grounded Speech
CI Lai, F Shi, P Peng, Y Kim, K Gimpel, S Chang, YS Chuang, S Bhati, ...
1 · 2023
Integrating Self-Supervised Speech Model with Pseudo Word-Level Targets from Visually-Grounded Speech Model
HC Fang, NX Ye, YJ Shih, P Peng, HF Wang, L Berry, H Lee, D Harwath
arXiv preprint arXiv:2402.05819, 2024
· 2024