Invisible Watermarking For Audio Generation Diffusion Models

The overall framework encompasses the watermarking diffusion training and sampling processes. First, we convert the audio data into mel-spectrogram format and feed it into the watermarking diffusion model, which learns the feature space and is saved as model checkpoints. When we input a noise image into these model checkpoints, we obtain three distinct generations, depending on whether different triggers are presented with the input. This project builds on previous work; thanks to all contributors.

Required packages

# conda
conda install --file requirement.txt

# pip
pip install -r requirement.txt

Prepare Dataset

download raw audio dataset

python utils/prepare_sc.py

mel-spectrogram conversion (the following command automatically sets up the dataset for training)

python utils/audio_conversion.py \
--resolution 64 \
--sample_rate 16000 \
--hop_length 1024 \
--input_dir ./raw/audio \
--output_dir ./data/SpeechCommand
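
As a rough guide to how these flags interact, the sketch below works out how much audio a single spectrogram image would cover. It assumes, without confirmation from the repository, that `--resolution` is the number of time frames per image and that consecutive frames are `--hop_length` samples apart. The function name `slice_duration` is hypothetical.

```python
def slice_duration(resolution, hop_length, sample_rate):
    """Return (samples, seconds) spanned by one spectrogram image.

    Assumption: the image has `resolution` time frames and consecutive
    frames start `hop_length` samples apart.
    """
    samples = resolution * hop_length
    return samples, samples / sample_rate


# Values taken from the conversion command above.
samples, seconds = slice_duration(resolution=64, hop_length=1024, sample_rate=16000)
print(f"{samples} samples, about {seconds:.3f} s per image")
```

If that assumption holds, clips shorter than this span would have to be padded or concatenated during conversion; check utils/audio_conversion.py for the actual behavior.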

directory tree (shows the expected project structure)

watermark-audio-diffusion/
├── configs/
├── ...
├── main.py
├── vanilla.py
├── data/
│   ├── SpeechCommand/
│   │   ├── val/
│   │   ├── test/
│   │   ├── train/
│   │   │   ├── class_1
│   │   │   ├── class_2
│   │   │   └── ...
│   ├── out_class/
│   │   ├── test/
│   │   └── train/
├── raw/
│   ├── audio/
│   ├── npy/
│   ├── speech_command_v2/
│   └── .gz

⍾ Train

1) In-Distribution Watermark

# (blend) the dataset name must match the name of its directory under ./data
python main.py --dataset SpeechCommand --config sc_64.yml --ni --gamma 0.6 --target_label 6

# (patch) --miu_path is the path to your trigger image
python main.py --dataset SpeechCommand --config sc_64.yml --ni --gamma 0.1 --trigger_type patch --miu_path './images/white.png' --patch_size 3
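
To make the difference between the two trigger types concrete, here is a minimal sketch in plain Python. It assumes that a blend trigger is mixed into the whole noise input with a weight controlled by gamma, and that a patch trigger overwrites a small square of side `patch_size` in one corner. The exact formulas, the role of `--gamma`, and the patch position used by `main.py` may differ. The helper names `blend_trigger` and `patch_trigger` are hypothetical.

```python
import random


def blend_trigger(noise, trigger, gamma):
    """Mix the trigger into every pixel: gamma * noise + (1 - gamma) * trigger.

    Assumption: this weighting is illustrative, not taken from main.py.
    """
    return [
        [gamma * n + (1.0 - gamma) * t for n, t in zip(noise_row, trigger_row)]
        for noise_row, trigger_row in zip(noise, trigger)
    ]


def patch_trigger(noise, patch_value, patch_size):
    """Overwrite the bottom-right patch_size x patch_size corner.

    Assumption: the corner position is chosen only for illustration.
    """
    out = [row[:] for row in noise]  # copy so the input stays untouched
    height = len(out)
    for i in range(height - patch_size, height):
        width = len(out[i])
        for j in range(width - patch_size, width):
            out[i][j] = patch_value
    return out


random.seed(0)
size = 8
noise = [[random.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
trigger = [[1.0] * size for _ in range(size)]  # stand-in for a trigger image

blended = blend_trigger(noise, trigger, gamma=0.6)
patched = patch_trigger(noise, patch_value=1.0, patch_size=3)
```

In this sketch the blend changes every pixel slightly, while the patch changes only nine pixels and leaves the rest of the noise as it was.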

2) Out-of-Distribution Watermark

# (blend) the dataset name must be out_class; put the out-of-distribution class inside data/out_class/ (see the directory tree)
python main.py --dataset out_class --config sc_64.yml --ni --gamma 0.6 --watermark d2dout 

3) Instance-Specific Watermark

# (blend) the --watermark argument specifies the watermarking type (d2din, d2dout, d2i)
python main.py --dataset SpeechCommand --config sc_64.yml --ni --gamma 0.6 --watermark d2i

(optional) Vanilla Diffusion Model

python vanilla.py --doc vanilla_sc64 --config sc_64.yml --ni 

⍾ Sample | Generation

DDPM Schedule

# (blend)
python main.py --dataset SpeechCommand --config sc_64.yml --ni --sample --sample_type ddpm_noisy --fid --timesteps 1000 --eta 1 --gamma 0.6 --watermark d2din

DDIM Schedule

# (blend)
python main.py --dataset SpeechCommand --config sc_64.yml --ni --sample --fid --timesteps 100 --eta 0 --gamma 0.6 --skip_type 'quad' --watermark d2din
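
The `--timesteps` and `--skip_type` flags determine which of the training timesteps are visited during sampling. The sketch below follows the convention used in the public DDIM reference implementation. Whether this repository uses the same formula, and the value of 1000 training timesteps, are assumptions. `timestep_sequence` is a hypothetical name.

```python
import math


def timestep_sequence(num_train_steps, num_sample_steps, skip_type="uniform"):
    """Pick the subset of training timesteps to visit while sampling.

    "uniform" spaces the steps evenly; "quad" spaces them quadratically,
    so more steps fall near t = 0. Requires num_sample_steps >= 2 for "quad".
    """
    if skip_type == "uniform":
        skip = num_train_steps // num_sample_steps
        return list(range(0, num_train_steps, skip))
    if skip_type == "quad":
        top = math.sqrt(num_train_steps * 0.8)
        return [
            int((top * i / (num_sample_steps - 1)) ** 2)
            for i in range(num_sample_steps)
        ]
    raise ValueError(f"unknown skip_type: {skip_type!r}")


uniform = timestep_sequence(1000, 100, "uniform")
quad = timestep_sequence(1000, 100, "quad")
print(uniform[:5])
print(quad[:5], quad[-1])
```

With the quadratic schedule, early entries can repeat after integer truncation, and the sequence stops near 80% of the training range rather than at the final timestep.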

⍾ Evaluation

Train a classifier with the ResNeXt architecture, used for computing FID and WSR

# train 
python train_speech_commands.py

# test
python test_speech_commands.py

For SNR, PSNR and SSIM, please refer to the eval directory.
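
For orientation, SNR and PSNR are commonly defined as in the sketch below, which compares a reference signal with a distorted copy. This is a generic, standard-library-only illustration and not the code in the eval directory; the function names and the `peak` default of 1.0 are assumptions. SSIM is left out because it relies on windowed local statistics.

```python
import math


def snr_db(reference, distorted):
    """Signal-to-noise ratio in dB: signal energy over error energy."""
    signal = sum(r * r for r in reference)
    noise = sum((r - d) ** 2 for r, d in zip(reference, distorted))
    if noise == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(signal / noise)


def psnr_db(reference, distorted, peak=1.0):
    """Peak signal-to-noise ratio in dB: peak^2 over mean squared error."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.5]
distorted = [0.1, 0.5, 0.9, 0.5]
print(f"SNR:  {snr_db(reference, distorted):.2f} dB")
print(f"PSNR: {psnr_db(reference, distorted):.2f} dB")
```

Higher values mean the distorted signal is closer to the reference in both cases.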

⍾ Citation

@article{cao2023invisible,
  title={Invisible Watermarking for Audio Generation Diffusion Models},
  author={Cao, Xirong and Li, Xiang and Jadav, Divyesh and Wu, Yanzhao and Chen, Zhehui and Zeng, Chen and Wei, Wenqi},
  journal={arXiv preprint arXiv:2309.13166},
  year={2023}
}

🙏 Appreciation

The code is based on Trojan Diffusion (TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets, arXiv).