diff --git a/blogs/artificial-intelligence/MusicGen/LICENSE.txt b/blogs/artificial-intelligence/MusicGen/LICENSE.txt new file mode 100644 index 0000000..cbcee8a --- /dev/null +++ b/blogs/artificial-intelligence/MusicGen/LICENSE.txt @@ -0,0 +1,373 @@ +Copyright (c) 2024 Advanced Micro Devices, Inc. + +=========================================================================== + +All files in this directory exclusive of files in src and data folders +are governed by the following terms: + +Files in data/ folder and its subdirectories are governed by +the following terms: + +Creative Commons Attribution 4.0 International Public License + +By exercising the Licensed Rights (defined below), You accept and agree +to be bound by the terms and conditions of this Creative Commons +Attribution 4.0 International Public License ("Public License"). To the +extent this Public License may be interpreted as a contract, You are +granted the Licensed Rights in consideration of Your acceptance of +these terms and conditions, and the Licensor grants You such rights in +consideration of benefits the Licensor receives from making the +Licensed Material available under these terms and conditions. + + +Section 1 -- Definitions. + + a. Adapted Material means material subject to Copyright and Similar + Rights that is derived from or based upon the Licensed Material + and in which the Licensed Material is translated, altered, + arranged, transformed, or otherwise modified in a manner requiring + permission under the Copyright and Similar Rights held by the + Licensor. For purposes of this Public License, where the Licensed + Material is a musical work, performance, or sound recording, + Adapted Material is always produced where the Licensed Material is + synched in timed relation with a moving image. + + b. 
Adapter's License means the license You apply to Your Copyright + and Similar Rights in Your contributions to Adapted Material in + accordance with the terms and conditions of this Public License. + + c. Copyright and Similar Rights means copyright and/or similar rights + closely related to copyright including, without limitation, + performance, broadcast, sound recording, and Sui Generis Database + Rights, without regard to how the rights are labeled or + categorized. For purposes of this Public License, the rights + specified in Section 2(b)(1)-(2) are not Copyright and Similar + Rights. + + d. Effective Technological Measures means those measures that, in the + absence of proper authority, may not be circumvented under laws + fulfilling obligations under Article 11 of the WIPO Copyright + Treaty adopted on December 20, 1996, and/or similar international + agreements. + + e. Exceptions and Limitations means fair use, fair dealing, and/or + any other exception or limitation to Copyright and Similar Rights + that applies to Your use of the Licensed Material. + + f. Licensed Material means the artistic or literary work, database, + or other material to which the Licensor applied this Public + License. + + g. Licensed Rights means the rights granted to You subject to the + terms and conditions of this Public License, which are limited to + all Copyright and Similar Rights that apply to Your use of the + Licensed Material and that the Licensor has authority to license. + + h. Licensor means the individual(s) or entity(ies) granting rights + under this Public License. + + i. 
Share means to provide material to the public by any means or + process that requires permission under the Licensed Rights, such + as reproduction, public display, public performance, distribution, + dissemination, communication, or importation, and to make material + available to the public including in ways that members of the + public may access the material from a place and at a time + individually chosen by them. + + j. Sui Generis Database Rights means rights other than copyright + resulting from Directive 96/9/EC of the European Parliament and of + the Council of 11 March 1996 on the legal protection of databases, + as amended and/or succeeded, as well as other essentially + equivalent rights anywhere in the world. + + k. You means the individual or entity exercising the Licensed Rights + under this Public License. Your has a corresponding meaning. + + +Section 2 -- Scope. + + a. License grant. + + 1. Subject to the terms and conditions of this Public License, + the Licensor hereby grants You a worldwide, royalty-free, + non-sublicensable, non-exclusive, irrevocable license to + exercise the Licensed Rights in the Licensed Material to: + + a. reproduce and Share the Licensed Material, in whole or + in part; and + + b. produce, reproduce, and Share Adapted Material. + + 2. Exceptions and Limitations. For the avoidance of doubt, where + Exceptions and Limitations apply to Your use, this Public + License does not apply, and You do not need to comply with + its terms and conditions. + + 3. Term. The term of this Public License is specified in Section + 6(a). + + 4. Media and formats; technical modifications allowed. The + Licensor authorizes You to exercise the Licensed Rights in + all media and formats whether now known or hereafter created, + and to make technical modifications necessary to do so. 
The + Licensor waives and/or agrees not to assert any right or + authority to forbid You from making technical modifications + necessary to exercise the Licensed Rights, including + technical modifications necessary to circumvent Effective + Technological Measures. For purposes of this Public License, + simply making modifications authorized by this Section 2(a) + (4) never produces Adapted Material. + + 5. Downstream recipients. + + a. Offer from the Licensor -- Licensed Material. Every + recipient of the Licensed Material automatically + receives an offer from the Licensor to exercise the + Licensed Rights under the terms and conditions of this + Public License. + + b. No downstream restrictions. You may not offer or impose + any additional or different terms or conditions on, or + apply any Effective Technological Measures to, the + Licensed Material if doing so restricts exercise of the + Licensed Rights by any recipient of the Licensed + Material. + + 6. No endorsement. Nothing in this Public License constitutes or + may be construed as permission to assert or imply that You + are, or that Your use of the Licensed Material is, connected + with, or sponsored, endorsed, or granted official status by, + the Licensor or others designated to receive attribution as + provided in Section 3(a)(1)(A)(i). + + b. Other rights. + + 1. Moral rights, such as the right of integrity, are not + licensed under this Public License, nor are publicity, + privacy, and/or other similar personality rights; however, to + the extent possible, the Licensor waives and/or agrees not to + assert any such rights held by the Licensor to the limited + extent necessary to allow You to exercise the Licensed + Rights, but not otherwise. + + 2. Patent and trademark rights are not licensed under this + Public License. + + 3. 
To the extent possible, the Licensor waives any right to + collect royalties from You for the exercise of the Licensed + Rights, whether directly or through a collecting society + under any voluntary or waivable statutory or compulsory + licensing scheme. In all other cases the Licensor expressly + reserves any right to collect such royalties. + + +Section 3 -- License Conditions. + +Your exercise of the Licensed Rights is expressly made subject to the +following conditions. + + a. Attribution. + + 1. If You Share the Licensed Material (including in modified + form), You must: + + a. retain the following if it is supplied by the Licensor + with the Licensed Material: + + i. identification of the creator(s) of the Licensed + Material and any others designated to receive + attribution, in any reasonable manner requested by + the Licensor (including by pseudonym if + designated); + + ii. a copyright notice; + + iii. a notice that refers to this Public License; + + iv. a notice that refers to the disclaimer of + warranties; + + v. a URI or hyperlink to the Licensed Material to the + extent reasonably practicable; + + b. indicate if You modified the Licensed Material and + retain an indication of any previous modifications; and + + c. indicate the Licensed Material is licensed under this + Public License, and include the text of, or the URI or + hyperlink to, this Public License. + + 2. You may satisfy the conditions in Section 3(a)(1) in any + reasonable manner based on the medium, means, and context in + which You Share the Licensed Material. For example, it may be + reasonable to satisfy the conditions by providing a URI or + hyperlink to a resource that includes the required + information. + + 3. If requested by the Licensor, You must remove any of the + information required by Section 3(a)(1)(A) to the extent + reasonably practicable. + + 4. 
If You Share Adapted Material You produce, the Adapter's + License You apply must not prevent recipients of the Adapted + Material from complying with this Public License. + + +Section 4 -- Sui Generis Database Rights. + +Where the Licensed Rights include Sui Generis Database Rights that +apply to Your use of the Licensed Material: + + a. for the avoidance of doubt, Section 2(a)(1) grants You the right + to extract, reuse, reproduce, and Share all or a substantial + portion of the contents of the database; + + b. if You include all or a substantial portion of the database + contents in a database in which You have Sui Generis Database + Rights, then the database in which You have Sui Generis Database + Rights (but not its individual contents) is Adapted Material; and + + c. You must comply with the conditions in Section 3(a) if You Share + all or a substantial portion of the contents of the database. + +For the avoidance of doubt, this Section 4 supplements and does not +replace Your obligations under this Public License where the Licensed +Rights include other Copyright and Similar Rights. + + +Section 5 -- Disclaimer of Warranties and Limitation of Liability. + + a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE + EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS + AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF + ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS, + IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION, + WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR + PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS, + ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT + KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT + ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU. + + b. 
TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE + TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION, + NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT, + INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES, + COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR + USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN + ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR + DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR + IN PART, THIS LIMITATION MAY NOT APPLY TO YOU. + + c. The disclaimer of warranties and limitation of liability provided + above shall be interpreted in a manner that, to the extent + possible, most closely approximates an absolute disclaimer and + waiver of all liability. + + +Section 6 -- Term and Termination. + + a. This Public License applies for the term of the Copyright and + Similar Rights licensed here. However, if You fail to comply with + this Public License, then Your rights under this Public License + terminate automatically. + + b. Where Your right to use the Licensed Material has terminated under + Section 6(a), it reinstates: + + 1. automatically as of the date the violation is cured, provided + it is cured within 30 days of Your discovery of the + violation; or + + 2. upon express reinstatement by the Licensor. + + For the avoidance of doubt, this Section 6(b) does not affect any + right the Licensor may have to seek remedies for Your violations + of this Public License. + + c. For the avoidance of doubt, the Licensor may also offer the + Licensed Material under separate terms or conditions or stop + distributing the Licensed Material at any time; however, doing so + will not terminate this Public License. + + d. Sections 1, 5, 6, 7, and 8 survive termination of this Public + License. + + +Section 7 -- Other Terms and Conditions. + + a. 
The Licensor shall not be bound by any additional or different + terms or conditions communicated by You unless expressly agreed. + + b. Any arrangements, understandings, or agreements regarding the + Licensed Material not stated herein are separate from and + independent of the terms and conditions of this Public License. + + +Section 8 -- Interpretation. + + a. For the avoidance of doubt, this Public License does not, and + shall not be interpreted to, reduce, limit, restrict, or impose + conditions on any use of the Licensed Material that could lawfully + be made without permission under this Public License. + + b. To the extent possible, if any provision of this Public License is + deemed unenforceable, it shall be automatically reformed to the + minimum extent necessary to make it enforceable. If the provision + cannot be reformed, it shall be severed from this Public License + without affecting the enforceability of the remaining terms and + conditions. + + c. No term or condition of this Public License will be waived and no + failure to comply consented to unless expressly agreed to by the + Licensor. + + d. Nothing in this Public License constitutes or may be interpreted + as a limitation upon, or waiver of, any privileges and immunities + that apply to the Licensor or You, including from the legal + processes of any jurisdiction or authority. + + +======================================================================= + +Creative Commons is not a party to its public +licenses. Notwithstanding, Creative Commons may elect to apply one of +its public licenses to material it publishes and in those instances +will be considered the “Licensor.” The text of the Creative Commons +public licenses is dedicated to the public domain under the CC0 Public +Domain Dedication. 
Except for the limited purpose of indicating that +material is shared under a Creative Commons public license or as +otherwise permitted by the Creative Commons policies published at +creativecommons.org/policies, Creative Commons does not authorize the +use of the trademark "Creative Commons" or any other trademark or logo +of Creative Commons without its prior written consent including, +without limitation, in connection with any unauthorized modifications +to any of its public licenses or any other arrangements, +understandings, or agreements concerning use of licensed material. For +the avoidance of doubt, this paragraph does not form part of the +public licenses. + +Creative Commons may be contacted at creativecommons.org. + +=========================================================================== + +Files in src/ and data/ folders and its subdirectories are governed by +the following terms: + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+ diff --git a/blogs/artificial-intelligence/MusicGen/README.md b/blogs/artificial-intelligence/MusicGen/README.md new file mode 100644 index 0000000..154cb88 --- /dev/null +++ b/blogs/artificial-intelligence/MusicGen/README.md @@ -0,0 +1,227 @@ +--- +blogpost: true +date: 8 March 2024 +author: Phillip Dang +tags: PyTorch, AI/ML, Tuning +category: Applications & models +language: English +--- +
+# Music Generation With MusicGen on an AMD GPU
+
+MusicGen is an autoregressive, transformer-based model that predicts the next segment of a piece of
+music based on previous segments. This is similar to how language models predict the next
+token.
+
+MusicGen can generate music from the following inputs:
+
+* No input (i.e., unconditional generation)
+* A text description (i.e., text-conditional generation)
+* An input music sequence (i.e., melody-conditional generation)
+
+For a deeper dive into the inner workings of MusicGen, refer to
+[Simple and Controllable Music Generation](https://arxiv.org/abs/2306.05284).
+
+In this blog, we demonstrate how to run inference with MusicGen on AMD GPUs using
+ROCm. We use [this model from Hugging Face](https://huggingface.co/spaces/facebook/MusicGen)
+with each of the three input types listed above.
+
+## Prerequisites
+
+To follow along with this blog, you need at least one AMD GPU and the
+following software:
+
+* [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/tutorial/quick-start.html)
+* [PyTorch](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/3rd-party/pytorch-install.html)
+* Linux OS
+
+To check your hardware and ensure that your system recognizes your GPU, run:
+
+```bash
+rocm-smi --showproductname
+```
+
+Your output should look like this:
+
+```text
+================= ROCm System Management Interface ================
+========================= Product Info ============================
+GPU[0] : Card series: Instinct MI210
+GPU[0] : Card model: 0x0c34
+GPU[0] : Card vendor: Advanced Micro Devices, Inc.
[AMD/ATI]
+GPU[0] : Card SKU: D67301
+===================================================================
+===================== End of ROCm SMI Log =========================
+```
+
+To make sure PyTorch recognizes your GPU, run:
+
+```python
+import torch
+print(f"number of GPUs: {torch.cuda.device_count()}")
+print([torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())])
+```
+
+Your output should look similar to this:
+
+```text
+number of GPUs: 1
+['AMD Radeon Graphics']
+```
+
+Once you've confirmed that your system recognizes your device(s), you're ready to install the required
+libraries and generate some music.
+
+In this blog, we use the `facebook/musicgen-small` variant.
+
+### Libraries
+
+You can use MusicGen through the Hugging Face Transformers library. To install it, together with SciPy for saving WAV files, run:
+
+```bash
+pip install -q "transformers>=4.31.0" scipy
+```
+
+## MusicGen with Hugging Face
+
+MusicGen is available in the Hugging Face Transformers library from version 4.31.0 onward. In this section, we follow [Hugging Face's demo](https://huggingface.co/docs/transformers/model_doc/musicgen) and generate music in each of the three modes described in the introduction.
+
+### Unconditional generation
+
+Let's start by generating music without any input.
+
+```python
+from transformers import MusicgenForConditionalGeneration
+
+# Initialize the model and move it to the GPU
+device = "cuda"
+model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small").to(device)
+
+# Build the empty (unconditional) inputs for one sample
+unconditional_inputs = model.get_unconditional_inputs(num_samples=1)
+
+# Generate audio
+audio_values = model.generate(**unconditional_inputs, do_sample=True, max_new_tokens=256)
+```
+
+You can either listen to the audio directly in your notebook or save it as a WAV file using
+**SciPy**.
+
+* To listen in your notebook, run:
+
+  ```python
+  from IPython.display import Audio
+
+  sampling_rate = model.config.audio_encoder.sampling_rate
+
+  # Listen to the audio sample
+  Audio(audio_values[0].cpu(), rate=sampling_rate)
+  ```
+
+* To save the audio, run:
+
+  ```python
+  import os
+  import scipy.io.wavfile
+
+  sampling_rate = model.config.audio_encoder.sampling_rate
+
+  # Make sure the output folder exists before writing the WAV file
+  os.makedirs("audio", exist_ok=True)
+  scipy.io.wavfile.write("audio/unconditional.wav", rate=sampling_rate, data=audio_values[0, 0].cpu().numpy())
+  ```
+
+### Text-conditional generation
+
+Next, let's generate music conditioned on a text input. This process has three steps:
+
+1. Text descriptions are passed through a text encoder model to obtain a sequence of hidden-state
+   representations.
+2. The MusicGen decoder predicts audio tokens, or audio codes, conditioned on these hidden states.
+3. Audio tokens are decoded using an audio compression model, such as EnCodec, to recover the
+   audio waveform.
+
+To see this in action, run:
+
+```python
+from transformers import AutoProcessor, MusicgenForConditionalGeneration
+
+# Initialize the processor and model
+processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
+model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")
+
+# Move the model to the GPU
+device = "cuda"
+model = model.to(device)
+
+# Text descriptions for the model (one audio sample per description)
+input_text = ["epic movie theme", "sad jazz"]
+
+# Tokenize the text and move the inputs to the GPU
+inputs = processor(
+    text=input_text,
+    padding=True,
+    return_tensors="pt",
+).to(device)
+
+# Generate audio
+audio_values_from_text = model.generate(**inputs, max_new_tokens=512)
+
+print(audio_values_from_text.shape)
+```
+
+```text
+torch.Size([2, 1, 325760])
+```
+
+The output is a three-dimensional PyTorch tensor of shape `(batch_size, num_channels, sequence_length)`.
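To relate this shape to playback time, divide `sequence_length` by the sampling rate; to pick `max_new_tokens` for a target duration, multiply the duration by the audio encoder's frame rate. The sketch below hard-codes 32 kHz and 50 frames per second, which we assume match the EnCodec model used by `facebook/musicgen-small`; with a loaded model, read `model.config.audio_encoder.sampling_rate` and `model.config.audio_encoder.frame_rate` instead. The helper names are hypothetical.

```python
import math

# Assumed audio settings for facebook/musicgen-small (EnCodec at 32 kHz, 50 frames/s).
# With a loaded model, prefer model.config.audio_encoder.sampling_rate and
# model.config.audio_encoder.frame_rate over these hard-coded values.
SAMPLING_RATE = 32_000
FRAME_RATE = 50


def samples_to_seconds(num_samples: int, sampling_rate: int = SAMPLING_RATE) -> float:
    """Convert a waveform length in samples to seconds."""
    return num_samples / sampling_rate


def tokens_to_seconds(num_tokens: int, frame_rate: int = FRAME_RATE) -> float:
    """Approximate audio duration produced by `num_tokens` generation steps."""
    return num_tokens / frame_rate


def seconds_to_max_new_tokens(seconds: float, frame_rate: int = FRAME_RATE) -> int:
    """Smallest number of generation steps that covers `seconds` of audio."""
    return math.ceil(seconds * frame_rate)


# Shape printed by the text-conditional example
batch_size, num_channels, sequence_length = (2, 1, 325760)

print(f"{batch_size} clips, {num_channels} channel(s), "
      f"{samples_to_seconds(sequence_length):.2f} s each")
print(f"max_new_tokens=512 -> about {tokens_to_seconds(512):.2f} s")
print(f"for 8 s of audio, use max_new_tokens={seconds_to_max_new_tokens(8)}")
```

Under these assumptions, each generated clip is about 10.18 s long, slightly shorter than the 10.24 s that 512 tokens would suggest, so treat the token-based estimate as approximate.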
+As with unconditional generation, you can listen to the generated audio with IPython's `Audio` class. Because a notebook cell only renders the value of its last expression, wrap each clip in `display` to hear both:
+
+```python
+from IPython.display import Audio, display
+
+sampling_rate = model.config.audio_encoder.sampling_rate
+
+# Listen to the first audio sample, from the input text "epic movie theme"
+display(Audio(audio_values_from_text[0].cpu(), rate=sampling_rate))
+
+# Listen to the second audio sample, from the input text "sad jazz"
+display(Audio(audio_values_from_text[1].cpu(), rate=sampling_rate))
+```
+
+We saved our versions of these two WAV files as `audio/conditional1.wav` and
+`audio/conditional2.wav` in [this GitHub folder](https://github.com/ROCm/rocm-blogs/tree/release/blogs/artificial-intelligence/MusicGen/audio), so you can listen to them without having to run the code.
+
+### Audio-prompted generation
+
+You can also generate music by providing a melody and a text description to guide the generative
+process. Let's take the first half of the sample we previously generated from the text description
+"sad jazz" and use it as our audio prompt:
+
+```python
+# Take the first half of the "sad jazz" sample (batch index 1, channel 0) as a NumPy array
+sample = audio_values_from_text[1][0].cpu().numpy()
+sample = sample[: len(sample) // 2]
+
+# Use it, together with the text description, as the model input
+inputs = processor(
+    audio=sample,
+    sampling_rate=sampling_rate,
+    text=["sad jazz"],
+    padding=True,
+    return_tensors="pt",
+).to(device)
+audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)
+```
+
+You can listen to the audio using:
+
+```python
+Audio(audio_values[0].cpu(), rate=sampling_rate)
+```
+
+We saved this under `audio/audio_prompted.wav` in [this GitHub folder](https://github.com/ROCm/rocm-blogs/tree/release/blogs/artificial-intelligence/MusicGen/audio).
+
+While we only used the small model in this blog, we encourage you to explore the medium and
+large models. We also encourage you to experiment with fine-tuning the model using your own custom audio
+dataset.
diff --git a/blogs/artificial-intelligence/MusicGen/audio/audio_prompted.wav b/blogs/artificial-intelligence/MusicGen/audio/audio_prompted.wav new file mode 100644 index 0000000..7d30852 Binary files /dev/null and b/blogs/artificial-intelligence/MusicGen/audio/audio_prompted.wav differ diff --git a/blogs/artificial-intelligence/MusicGen/audio/conditional1.wav b/blogs/artificial-intelligence/MusicGen/audio/conditional1.wav new file mode 100644 index 0000000..e0c05f2 Binary files /dev/null and b/blogs/artificial-intelligence/MusicGen/audio/conditional1.wav differ diff --git a/blogs/artificial-intelligence/MusicGen/audio/conditional2.wav b/blogs/artificial-intelligence/MusicGen/audio/conditional2.wav new file mode 100644 index 0000000..a6c8ad1 Binary files /dev/null and b/blogs/artificial-intelligence/MusicGen/audio/conditional2.wav differ diff --git a/blogs/artificial-intelligence/MusicGen/audio/unconditional.wav b/blogs/artificial-intelligence/MusicGen/audio/unconditional.wav new file mode 100644 index 0000000..66e5a18 Binary files /dev/null and b/blogs/artificial-intelligence/MusicGen/audio/unconditional.wav differ diff --git a/blogs/artificial-intelligence/MusicGen/src/musicgen.ipynb b/blogs/artificial-intelligence/MusicGen/src/musicgen.ipynb new file mode 100644 index 0000000..bebb55b --- /dev/null +++ b/blogs/artificial-intelligence/MusicGen/src/musicgen.ipynb @@ -0,0 +1,157 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Huggingface Unconditional generation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import MusicgenForConditionalGeneration\n", + "\n", + "# initialize model and model's input\n", + "model = MusicgenForConditionalGeneration.from_pretrained(\"facebook/musicgen-small\")\n", + "unconditional_inputs = model.get_unconditional_inputs(num_samples=1)\n", + "\n", + "# generate audio\n", + "audio_values = 
model.generate(**unconditional_inputs, do_sample=True, max_new_tokens=256)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import Audio\n", + "\n", + "sampling_rate = model.config.audio_encoder.sampling_rate\n", + "\n", + "# listen to our audio sample\n", + "Audio(audio_values[0].cpu(), rate=sampling_rate)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Huggingface Text-conditional generation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from transformers import AutoProcessor, MusicgenForConditionalGeneration\n", + "\n", + "# initialize model\n", + "processor = AutoProcessor.from_pretrained(\"facebook/musicgen-small\")\n", + "model = MusicgenForConditionalGeneration.from_pretrained(\"facebook/musicgen-small\")\n", + "\n", + "# set device to GPU\n", + "device = 'cuda'\n", + "model = model.to(device)\n", + "\n", + "# our text description for the model\n", + "input_text = [\"epic movie theme\", \"sad jazz\"]\n", + "\n", + "# create input\n", + "inputs = processor(\n", + " text=input_text,\n", + " padding=True,\n", + " return_tensors=\"pt\",\n", + ").to(device)\n", + "\n", + "# generate audio\n", + "audio_values_from_text = model.generate(**inputs, max_new_tokens=512)\n", + "\n", + "print(audio_values_from_text.shape)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from IPython.display import Audio, display\n", + "\n", + "sampling_rate = model.config.audio_encoder.sampling_rate\n", + "\n", + "# listen to our first audio sample from input text \"epic movie theme\"\n", + "display(Audio(audio_values_from_text[0].cpu(), rate=sampling_rate))\n", + "\n", + "# listen to our second audio sample from input text \"sad jazz\"\n", + "display(Audio(audio_values_from_text[1].cpu(), rate=sampling_rate))" + ] + }, + { + "cell_type": "markdown", +
"metadata": {}, + "source": [ + "### Huggingface Audio-prompted generation" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# take the first half of the generated audio\n", + "sample = audio_values_from_text[1][0].cpu().numpy()\n", + "sample = sample[: len(sample) // 2]\n", + "\n", + "# use it as input\n", + "inputs = processor(\n", + " audio=sample,\n", + " sampling_rate=sampling_rate,\n", + " text=[\"sad jazz\"],\n", + " padding=True,\n", + " return_tensors=\"pt\",\n", + ").to(device)\n", + "audio_values = model.generate(**inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "Audio(audio_values[0].cpu(), rate=sampling_rate)" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.9" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/blogs/index.md b/blogs/index.md index 288485c..cb5456d 100644 --- a/blogs/index.md +++ b/blogs/index.md @@ -59,8 +59,6 @@ Performance benchmarking across various AMD GPUs and cache size limitations ::: -::::{grid} 3 -:margin: 1 ::::