
[Issue] Triton Compilation Error in Unsloth Fine-Tuning Script on Kernel 5.4.0 #1336

Open
gityeop opened this issue Nov 25, 2024 · 8 comments
Labels: fixed - pending confirmation (Fixed, waiting for confirmation from poster)

Comments


gityeop commented Nov 25, 2024

Description

When trying to run the Unsloth fine-tuning script, I encounter a Triton compilation error originating in ReduceOpToLLVM.cpp.

Error Message

python /data/ephemeral/home/unsloth_example.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2024.11.9: Fast Llama patching. Transformers = 4.46.3
   \\   /|    GPU: Tesla V100-SXM2-32GB. Max memory: 31.739 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.4.1+cu121. CUDA = 7.0. CUDA Toolkit = 12.1.
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.28.post1. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth 2024.11.9 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 210,289 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 60
 "-____-"     Number of trainable parameters = 41,943,040
  0%|                                               | 0/60 [00:00<?, ?it/s]
python: /project/lib/Conversion/TritonGPUToLLVM/ReduceOpToLLVM.cpp:31: virtual mlir::LogicalResult {anonymous}::ReduceOpConversion::matchAndRewrite(mlir::triton::ReduceOp, mlir::ConvertOpToLLVMPattern<mlir::triton::ReduceOp>::OpAdaptor, mlir::ConversionPatternRewriter&) const: Assertion `helper.isSupportedLayout() && "Unexpected srcLayout in ReduceOpConversion"' failed.
Aborted (core dumped)

System Information

  • OS Kernel: 5.4.0-99-generic
  • GPU: Tesla V100-SXM2-32GB (see the capability check after this list)
  • CUDA Version: 12.2
  • Driver Version: 535.161.08
  • GPU Memory: 32GB
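
A point of comparison (my addition, not part of the original report): the V100s failing in this thread are compute capability 7.0 (sm_70), while the T4s reported later as working are 7.5 (sm_75). A minimal sketch to confirm what PyTorch sees:

# Sketch (not from the original report): confirm GPU architecture and toolchain.
# A V100 should report (7, 0), i.e. sm_70; a T4 reports (7, 5).
import torch

print(torch.cuda.get_device_name(0))        # e.g. Tesla V100-SXM2-32GB
print(torch.cuda.get_device_capability(0))  # e.g. (7, 0)
print(torch.__version__, torch.version.cuda)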

Code

from unsloth import FastLanguageModel 
from unsloth import is_bfloat16_supported
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling internally, so choose any!
# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/mistral-7b-v0.3-bnb-4bit",      # New Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/llama-3-8b-bnb-4bit",           # Llama-3 15 trillion tokens model 2x faster!
    "unsloth/llama-3-8b-Instruct-bnb-4bit",
    "unsloth/llama-3-70b-bnb-4bit",
    "unsloth/Phi-3-mini-4k-instruct",        # Phi-3 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/mistral-7b-bnb-4bit",
    "unsloth/gemma-7b-bnb-4bit",             # Gemma 2.2x faster!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    tokenizer = tokenizer,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates

Additional Context

  • The error occurs as soon as training starts (the progress bar reaches 0/60 before the abort)
  • Kernel version (5.4.0) is below the recommended minimum of 5.5.0 (see the sketch after this list)
  • Using Unsloth's pre-quantized 4-bit model
  • Attempting to run on a single GPU setup
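
A minimal sketch (my addition) of the version comparison behind the kernel warning quoted above:

# Sketch: compare the running kernel to the 5.5.0 minimum from the warning.
# platform.release() returns e.g. "5.4.0-99-generic" on this system.
import platform

kernel = platform.release().split("-")[0]
minimum = "5.5.0"
ok = tuple(map(int, kernel.split("."))) >= tuple(map(int, minimum.split(".")))
print(f"kernel {kernel} >= {minimum}: {ok}")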

Steps to Reproduce

  1. Set up conda environment with PyTorch and CUDA
  2. Install Unsloth
  3. Run the example script for fine-tuning
  4. Error occurs as soon as training begins

Questions

  1. Is this error related to the kernel version being below the recommended minimum (5.4.0 < 5.5.0)?
  2. Are there any specific version requirements or compatibility issues with Triton that need to be addressed?
  3. Are there any workarounds available for systems that cannot upgrade their kernel version?
@chengju-zhou

Same issue on V100, but it works fine on T4.

@ergosumdre

I also have a V100 and I'm getting this error too.


LiaoPan commented Nov 26, 2024

I also encountered the same error on V100.


LiaoPan commented Nov 26, 2024

I also encountered the same error on V100.

Temporary solution:

Perhaps change the version of triton, though it will raise some warnings:
$ pip install triton==2.3.0

Waiting for the final solution.
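
If you try this pin, a quick check (my addition, not from the original comment) that the downgrade actually took effect in the active environment:

# Sketch: confirm which triton version gets imported after the pin.
import triton

print(triton.__version__)  # expect "2.3.0" after the downgrade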

@hykilpikonna

> I also encountered the same error on V100.
>
> Temporary solution:
>
> Perhaps change the version of triton, though it will raise some warnings:
> $ pip install triton==2.3.0
>
> Waiting for the final solution.

Which torch version did you use? It seems that torch 2.5.1 isn't compatible:

unsloth_env/lib/python3.11/site-packages/torch/_inductor/codecache.py", line 195, in get_system
    from triton.compiler.compiler import triton_key
ImportError: cannot import name 'triton_key' from 'triton.compiler.compiler' (unsloth_env/lib/python3.11/site-packages/triton/compiler/compiler.py)
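
This ImportError is the usual symptom of a torch/triton version mismatch: each torch wheel pins the triton release its inductor backend expects, so pinning triton independently of torch can break this import. A hedged sketch to compare the installed pair:

# Sketch: print the installed torch and triton distributions side by side;
# they should be a pairing that torch's inductor backend supports.
from importlib.metadata import version

print("torch  :", version("torch"))
print("triton :", version("triton"))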


LiaoPan commented Nov 26, 2024

@hykilpikonna pytorch '2.4.0+cu121'

@danielhanchen
Contributor

Apologies everyone! @LiaoPan @hykilpikonna @ergosumdre @gityeop @chengju-zhou I added a flag to disable some other kernels - I'm unsure if it worked though.

Torch 2.5 and torch 2.4 should now be supported - sadly Colab got rid of V100s so I can't test them - I'm assuming a specific kernel from Apple's Cut Cross Entropy package is the one causing the issues.

Please try updating Unsloth without dependencies and see if that works:

pip uninstall unsloth unsloth-zoo
pip install --upgrade --no-cache-dir --no-deps unsloth unsloth-zoo
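
A quick check (my addition; assumes the distribution names match the pip command above) that the upgrade took effect:

# Sketch: print the installed unsloth / unsloth-zoo versions after upgrading.
from importlib.metadata import version

print("unsloth     :", version("unsloth"))
print("unsloth-zoo :", version("unsloth-zoo"))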

@danielhanchen
Contributor

By the way, to get Torch 2.4, simply run wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python - to get the optimal installation command.

danielhanchen added the fixed - pending confirmation label Nov 26, 2024