This repository contains a simple PyTorch implementation of a GPT (Generative Pre-trained Transformer) model. GPT models are designed for natural language processing tasks such as text generation. In this project, we build a simplified GPT, train it on a small dataset, and generate text from a given prompt.
## Table of Contents

- Introduction
- Features
- Installation
- Usage
- Model Architecture
- Training the Model
- Generating Text
- Contributing
- License
## Introduction

GPT is a transformer-based model for generating human-like text, with applications across NLP tasks such as text completion, translation, and summarization. This implementation demonstrates the basic structure and functionality of a GPT model.
## Features

- Custom dataset handling for text inputs
- Simplified GPT architecture with transformer blocks
- Training loop with loss calculation and optimization
- Text generation from a trained model
## Installation

To run this code, you need Python 3.6 or higher and the following libraries:
- torch
- transformers
You can install these libraries using pip:
```bash
pip install torch transformers
```
## Usage

- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/simple-gpt.git
  cd simple-gpt
  ```

- Run the script:

  ```bash
  python understanding-gpt.py
  ```
## Model Architecture

The model consists of the following components (a minimal sketch follows this list):

- `SimpleDataset`: a custom dataset class that handles text inputs and tokenization.
- `GPTBlock`: a single transformer block combining multi-head self-attention and a feed-forward neural network.
- `SimpleGPT`: the main GPT model class, which stacks multiple `GPTBlock`s on top of token and position embeddings.
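For orientation, here is a minimal sketch of what these three components might look like. The exact signatures and hyperparameters (`embed_dim`, `num_heads`, `ff_dim`, and so on) are assumptions for illustration; the real definitions live in `understanding-gpt.py`:

```python
import torch
import torch.nn as nn
from torch.utils.data import Dataset

class SimpleDataset(Dataset):
    """Tokenizes raw text and returns (input_ids, attention_mask) pairs."""
    def __init__(self, texts, tokenizer, max_length=64):  # assumed signature
        self.enc = tokenizer(texts, truncation=True, padding='max_length',
                             max_length=max_length, return_tensors='pt')

    def __len__(self):
        return self.enc['input_ids'].size(0)

    def __getitem__(self, idx):
        return self.enc['input_ids'][idx], self.enc['attention_mask'][idx]

class GPTBlock(nn.Module):
    """One transformer block: masked multi-head self-attention + feed-forward net."""
    def __init__(self, embed_dim, num_heads, ff_dim, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads,
                                          dropout=dropout, batch_first=True)
        self.ln1 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(nn.Linear(embed_dim, ff_dim), nn.GELU(),
                                nn.Linear(ff_dim, embed_dim))
        self.ln2 = nn.LayerNorm(embed_dim)

    def forward(self, x, key_padding_mask=None):
        # Causal mask: position i may only attend to positions <= i
        seq_len = x.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                       device=x.device), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=causal,
                                key_padding_mask=key_padding_mask)
        x = self.ln1(x + attn_out)    # residual connection + layer norm
        x = self.ln2(x + self.ff(x))  # feed-forward with residual
        return x

class SimpleGPT(nn.Module):
    """Token + position embeddings, a stack of GPTBlocks, and a vocab-sized head."""
    def __init__(self, vocab_size, embed_dim=256, num_heads=4, ff_dim=1024,
                 num_layers=4, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, embed_dim)
        self.pos_emb = nn.Embedding(max_len, embed_dim)
        self.blocks = nn.ModuleList([GPTBlock(embed_dim, num_heads, ff_dim)
                                     for _ in range(num_layers)])
        self.lm_head = nn.Linear(embed_dim, vocab_size)

    def forward(self, input_ids, attention_mask=None):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        x = self.token_emb(input_ids) + self.pos_emb(positions)
        # attention_mask uses 1 for real tokens, 0 for padding
        pad_mask = (attention_mask == 0) if attention_mask is not None else None
        for block in self.blocks:
            x = block(x, key_padding_mask=pad_mask)
        return self.lm_head(x)  # logits over the vocabulary at every position
```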
## Training the Model

The `train` function handles the training process: the forward pass, loss calculation, backpropagation, and optimization. Note the one-position shift between logits and labels, which teaches the model to predict the next token at every position.
```python
def train(model, dataloader, optimizer, criterion, epochs=5, device='cuda'):
    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for input_ids, attention_mask in dataloader:
            input_ids, attention_mask = input_ids.to(device), attention_mask.to(device)
            optimizer.zero_grad()
            outputs = model(input_ids, attention_mask)
            # Shift so each position's logits are scored against the *next* token
            shift_logits = outputs[..., :-1, :].contiguous()
            shift_labels = input_ids[..., 1:].contiguous()
            loss = criterion(shift_logits.view(-1, shift_logits.size(-1)),
                             shift_labels.view(-1))
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"Epoch {epoch + 1}/{epochs}, Loss: {total_loss / len(dataloader)}")
```
## Generating Text

The `generate_text` function generates text from a trained model given a prompt. It decodes greedily: at each step it appends the single most likely next token, stopping at the end-of-sequence token or after `max_length` new tokens.
```python
def generate_text(model, tokenizer, prompt, max_length=50, device='cuda'):
    model.eval()
    input_ids = tokenizer.encode(prompt, return_tensors='pt').to(device)
    generated = input_ids
    with torch.no_grad():  # no gradients needed at inference time
        for _ in range(max_length):
            outputs = model(generated)
            next_token_logits = outputs[:, -1, :]  # logits for the last position
            # Greedy decoding: pick the most likely token (assumes batch size 1)
            next_token = torch.argmax(next_token_logits, dim=-1).unsqueeze(0)
            generated = torch.cat((generated, next_token), dim=1)
            if next_token.item() == tokenizer.eos_token_id:
                break
    generated_text = tokenizer.decode(generated[0], skip_special_tokens=True)
    return generated_text
```
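A hypothetical call, reusing the model and tokenizer from the training example:

```python
print(generate_text(model, tokenizer, "Once upon a time", device=device))
```

Because decoding is greedy, the output is deterministic for a given prompt. For more varied text, the argmax line could be swapped for temperature sampling, e.g.:

```python
# Hypothetical drop-in replacement for the argmax line in generate_text
probs = torch.softmax(next_token_logits / 0.8, dim=-1)  # temperature 0.8
next_token = torch.multinomial(probs, num_samples=1)    # shape [1, 1]
```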
## Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements or bug fixes.
## License

This project is licensed under the MIT License. See the LICENSE file for details.