01Zhangbw/Awesome-Expressive-speech-synthesis

Awesome Expressive speech synthesis

To Start

Hi there! This is a summary of expressive speech synthesis papers. It may also include some papers on song and audio generation.

If you are interested in this project, feel free to star⭐ the repository or offer advice👏 (open a pull request or email📧 me)!

Latest update: 16 January, 2025

Expressive speech synthesis

| Title | Date | Venue |
| --- | --- | --- |
| Speech Synthesis along Perceptual Voice Quality Dimensions | 15 January, 2025 | ICASSP 2025 |
| Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis | 11 January, 2025 | Information Fusion 2025 |
| PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control | 10 January, 2025 | arXiv |
| DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions | 7 January, 2025 | ICASSP 2025 |
| FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles | 1 January, 2025 | arXiv |
| Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis | 24 December, 2024 | ICASSP 2025 |
| Simi-SFX: A similarity-based conditioning method for controllable sound effect synthesis | 24 December, 2024 | arXiv |
| FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation | 22 December, 2024 | arXiv |
| Hierarchical Control of Emotion Rendering in Speech Synthesis | 16 December, 2024 | Submitted to IEEE Transactions |
| AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation | 13 December, 2024 | Submitted and under review at IEEE Transactions on Affective Computing |
| CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder | 13 December, 2024 | AAAI 2025 |
| EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations | 12 December, 2024 | arXiv |
| Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis | 9 November, 2024 | arXiv |
| Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis | 24 October, 2024 | arXiv |
| Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement | 22 October, 2024 | arXiv |
| Continuous Speech Synthesis using per-token Latent Diffusion | 21 October, 2024 | arXiv |
| DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | 17 October, 2024 | arXiv |
| DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | 17 October, 2024 | ICASSP 2024 |
| SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model | 16 October, 2024 | ICASSP 2024 |
| MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech | 4 October, 2024 | EMNLP 2024 Findings |
| EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | 30 September, 2024 | EMNLP 2024 Main |
| EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis | 27 September, 2024 | arXiv |
| Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech | 24 September, 2024 | ECCV Workshop ABAW (Affective Behavior Analysis in-the-wild) 7 (to appear) |
| ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning | 19 September, 2024 | arXiv |
| What happens to diffusion model likelihood when your model is conditional? | 10 September, 2024 | arXiv |
| VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling | 28 August, 2024 | ACM Multimedia 2024 |
| SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description | 24 August, 2024 | ACM Multimedia 2024 |
| Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music | 26 August, 2024 | International Society for Music Information Retrieval (ISMIR) 2024 |
| Generative Expressive Conversational Speech Synthesis | 31 July, 2024 | ACM Multimedia 2024 |
| Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings | 19 July, 2024 | INTERSPEECH 2024 |
| MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis | 18 July, 2024 | INTERSPEECH 2024 |
| DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | 27 June, 2024 | Preprint |
| A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons | 26 June, 2024 | arXiv |
| GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis | 15 June, 2024 | arXiv |
| VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation | 12 June, 2024 | SLT 2024 |
| TokSing: Singing Voice Synthesis based on Discrete Tokens | 12 June, 2024 | INTERSPEECH 2024 |
| Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation | 12 June, 2024 | arXiv |
| Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling | 9 June, 2024 | INTERSPEECH 2024 |
| Text-aware and Context-aware Expressive Audiobook Speech Synthesis | 9 June, 2024 | INTERSPEECH 2024 |
| Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study | 7 June, 2024 | arXiv |
| Style Mixture of Experts for Expressive Text-To-Speech Synthesis | 5 June, 2024 | NeurIPS 2024 Workshop |
| RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis | 27 May, 2024 | 8th APWeb-WAIM International Joint Conference on Web and Big Data |
| Expressivity and Speech Synthesis | 30 April, 2024 | arXiv |
| MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis | 28 April, 2024 | arXiv |
| Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | 17 April, 2024 | LREC-COLING 2024 |
| Fine-Grained Quantitative Emotion Editing for Speech Generation | 4 March, 2024 | IEEE APSIPA ASC 2024 |
| Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting | 24 January, 2024 | ICASSP 2024 |
| StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis | 19 December, 2023 | ICASSP 2024 |
| Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling | 19 December, 2023 | AAAI 2024 |
| MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis | 17 December, 2023 | AAAI 2024 |
| SECap: Speech Emotion Captioning with Large Language Model | 23 December, 2023 | AAAI 2024 |
| FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models | 13 December, 2023 | CVPR 2024 |
