NaturalSpeech: End-to-end text-to-speech synthesis with human-level quality X Tan, J Chen, H Liu, J Cong, C Zhang, Y Liu, X Wang, Y Leng, Y Yi, L He, ... IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | 135 | 2024 |
VISinger: Variational inference with adversarial learning for end-to-end singing voice synthesis Y Zhang, J Cong, H Xue, L Xie, P Zhu, M Bi ICASSP 2022, 2022 | 65 | 2022 |
Data efficient voice cloning from noisy samples with domain adversarial training J Cong, S Yang, L Xie, G Yu, G Wan INTERSPEECH 2020, 2020 | 32 | 2020 |
Controllable Context-aware Conversational Speech Synthesis J Cong, S Yang, N Hu, G Li, L Xie, D Su INTERSPEECH 2021, 2021 | 30 | 2021 |
Glow-WaveGAN: Learning speech representations from GAN-based variational auto-encoder for high fidelity flow-based speech synthesis J Cong, S Yang, L Xie, D Su INTERSPEECH 2021, 2021 | 28 | 2021 |
Glow-WaveGAN 2: high-quality zero-shot text-to-speech synthesis and any-to-any voice conversion Y Lei, S Yang, J Cong, L Xie, D Su INTERSPEECH 2022, 2022 | 16 | 2022 |
DSPGAN: a GAN-based universal vocoder for high-fidelity TTS by time-frequency domain supervision from DSP K Song, Y Zhang, Y Lei, J Cong, H Li, L Xie, G He, J Bai ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 9 | 2023 |
AdaVITS: Tiny VITS for low computing resource speaker adaptation K Song, H Xue, X Wang, J Cong, Y Zhang, L Xie, B Yang, X Zhang, D Su 2022 13th International Symposium on Chinese Spoken Language Processing …, 2022 | 6 | 2022 |
DiCLET-TTS: Diffusion model based cross-lingual emotion transfer for text-to-speech—A study between English and Mandarin T Li, C Hu, J Cong, X Zhu, J Li, Q Tian, Y Wang, L Xie IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 | 5 | 2023 |
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models P Anastassiou, J Chen, J Chen, Y Chen, Z Chen, Z Chen, J Cong, L Deng, ... arXiv preprint arXiv:2406.02430, 2024 | 3 | 2024 |
Robust MelGAN: A robust universal neural vocoder for high-fidelity TTS K Song, J Cong, X Wang, Y Zhang, L Xie, N Jiang, H Wu 2022 13th International Symposium on Chinese Spoken Language Processing …, 2022 | 2 | 2022 |
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning T Li, Z Wang, X Zhu, J Cong, Q Tian, Y Wang, L Xie arXiv preprint arXiv:2310.04004, 2023 | | 2023 |