Awesome Expressive speech synthesis

To Start

Hi there! This is a curated list of papers on expressive speech synthesis. It also includes some papers on song and audio generation.

If you are interested in this project, feel free to star⭐ it or share suggestions👏 (open a pull request or email📧 me)!

Latest update: 16 January, 2025

Expressive speech synthesis

Title	Date	Venue
Speech Synthesis along Perceptual Voice Quality Dimensions	15 January, 2025	ICASSP 2025
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis	11 January, 2025	Information Fusion 2025
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control	10 January, 2025	ARXIV
DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions	7 January, 2025	ICASSP 2025
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles	1 January, 2025	ARXIV
Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis	24 December, 2024	ICASSP 2025
Simi-SFX: A similarity-based conditioning method for controllable sound effect synthesis	24 December, 2024	ARXIV
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation	22 December, 2024	ARXIV
Hierarchical Control of Emotion Rendering in Speech Synthesis	16 December, 2024	Submitted to IEEE Transactions
AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation	13 December, 2024	Under review at IEEE Transactions on Affective Computing
CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder	13 December, 2024	AAAI 2025
EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations	12 December, 2024	ARXIV
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis	9 November, 2024	ARXIV
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis	24 October, 2024	ARXIV
Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement	22 October, 2024	ARXIV
Continuous Speech Synthesis using per-token Latent Diffusion	21 October, 2024	ARXIV
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech	17 October, 2024	ARXIV
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis	17 October, 2024	ICASSP 2024
SiFiSinger: A High-Fidelity End-to-End Singing Voice Synthesizer based on Source-filter Model	16 October, 2024	ICASSP 2024
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech	4 October, 2024	EMNLP 2024 Findings
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control	30 September, 2024	EMNLP 2024 Main
EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis	27 September, 2024	ARXIV
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech	24 September, 2024	ECCV Workshop ABAW (Affective Behavior Analysis in-the-wild) 7 (to appear)
ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning	19 September, 2024	ARXIV
What happens to diffusion model likelihood when your model is conditional?	10 September, 2024	ARXIV
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling	28 August, 2024	ACM Multimedia 2024
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description	24 August, 2024	ACM Multimedia 2024
Hierarchical Generative Modeling of Melodic Vocal Contours in Hindustani Classical Music	26 August, 2024	International Society for Music Information Retrieval (ISMIR) 2024
Generative Expressive Conversational Speech Synthesis	31 July, 2024	ACM MM 2024
Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings	19 July, 2024	INTERSPEECH 2024
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis	18 July, 2024	INTERSPEECH 2024
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability	27 June, 2024	Preprint
A Study on Synthesizing Expressive Violin Performances: Approaches and Comparisons	26 June, 2024	ARXIV
GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis	15 June, 2024	ARXIV
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation	12 June, 2024	SLT 2024
TokSing: Singing Voice Synthesis based on Discrete Tokens	12 June, 2024	Interspeech 2024
Emotional Conversation: Empowering Talking Faces with Cohesive Expression, Gaze and Pose Generation	12 June, 2024	ARXIV
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling	9 June, 2024	INTERSPEECH 2024
Text-aware and Context-aware Expressive Audiobook Speech Synthesis	9 June, 2024	INTERSPEECH 2024
Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study	7 June, 2024	ARXIV
Style Mixture of Experts for Expressive Text-To-Speech Synthesis	5 June, 2024	NeurIPS 2024 Workshop
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis	27 May, 2024	8th APWeb-WAIM International Joint Conference on Web and Big Data
Expressivity and Speech Synthesis	30 April, 2024	ARXIV
MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis	28 April, 2024	ARXIV
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness	17 April, 2024	LREC-COLING 2024
Fine-Grained Quantitative Emotion Editing for Speech Generation	4 March, 2024	IEEE APSIPA ASC 2024
Expressive Acoustic Guitar Sound Synthesis with an Instrument-Specific Input Representation and Diffusion Outpainting	24 January, 2024	ICASSP 2024
StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis	19 December, 2023	ICASSP 2024
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling	19 December, 2023	AAAI 2024
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis	17 December, 2023	AAAI 2024
SECap: Speech Emotion Captioning with Large Language Model	23 December, 2023	AAAI 2024
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models	13 December, 2023	CVPR 2024

01Zhangbw/Awesome-Expressive-speech-synthesis