Issue training with Qwen 2.5 7B #1333

Open
Vital1162 opened this issue Nov 24, 2024 · 4 comments

Comments

@Vital1162

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|im_start|>user\n",
    response_part = "<|im_start|>assistant\n",
)

trainer_stats = trainer.train()
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 3,902 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 487
 "-____-"     Number of trainable parameters = 647,313,408
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/triton/language/core.py in wrapper(*args, **kwargs)
     34                              "(`_builder` argument must be provided outside of JIT functions.)")
---> 35         return fn(*args, **kwargs)
     36 

34 frames
AssertionError: First input (fp32) and second input (fp16) must have the same dtype!

The above exception was the direct cause of the following exception:

CompilationError                          Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py in make_ir(self, options, codegen_fns, context)
    111 
    112     def make_ir(self, options, codegen_fns, context):
--> 113         return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns)
    114 
    115     def parse_options(self):

CompilationError: at 60:16:

    accum = tl.zeros((BLOCK_B, BLOCK_V), dtype=tl.float32)
    for d in range(0, tl.cdiv(D, BLOCK_D)):
        # Load the next block of A and B, generate a mask by checking the K dimension.
        # If it is out of bounds, set it to 0.
        if EVEN_D:
            e = tl.load(e_ptrs)
            c = tl.load(c_ptrs)
        else:
            e = tl.load(e_ptrs, mask=offs_d[None, :] < D - d * BLOCK_D, other=0.0)
            c = tl.load(c_ptrs, mask=offs_d[:, None] < D - d * BLOCK_D, other=0.0)
        accum = tl.dot(e, c, accum, input_precision=DOT_PRECISION)
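
The assertion is raised when tl.dot receives one fp32 and one fp16 operand, which suggests mixed parameter dtypes are reaching the fused kernel. A quick way to see which dtypes the loaded model actually contains (a minimal diagnostic sketch, not part of Unsloth, assuming the model object created earlier in the notebook):

import collections

# Diagnostic only: group the model's parameters by dtype to see whether
# fp32 weights are mixed in with the fp16/bf16 ones the kernel expects.
by_dtype = collections.defaultdict(list)
for name, param in model.named_parameters():
    by_dtype[param.dtype].append(name)

for dtype, names in by_dtype.items():
    print(dtype, f"{len(names)} tensors, e.g. {names[:2]}")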
@AleNunezArroyo

Same here with the continued pretraining notebook using llama-3.

@bharris47

Downgrading triton to triton==2.3.1 helped me here.
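
If you pin it in Colab with pip install triton==2.3.1, restart the runtime so the downgraded package is the one that actually gets imported; a minimal check:

import triton

# Verify the downgraded Triton is in use after the runtime restart.
print(triton.__version__)  # expect "2.3.1" if the pin took effect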

@khoi03

khoi03 commented Nov 25, 2024

I'm encountering the same problem with the continued pretraining notebook using llama-3 and qwen 2.5.

@danielhanchen
Contributor

Apologies everyone - I added a new library from Apple which reduces memory usage for cross entropy. It seems I'll have to disable it by default and allow it as an opt-in switch instead.
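
While waiting for that switch, here is a purely illustrative sketch of what the dtype assertion means (plain PyTorch, not Unsloth's or Apple's actual kernel): the fused loss effectively multiplies hidden states by the lm_head weight, and like an ordinary matmul it refuses operands of different dtypes.

import torch

# Illustration only: an fp32 "hidden states" tensor against an fp16 "lm_head"
# weight fails the same way the Triton kernel above does.
hidden = torch.randn(4, 8, dtype=torch.float32)
lm_head = torch.randn(8, 16, dtype=torch.float16)

try:
    hidden @ lm_head
except RuntimeError as err:
    print(err)  # prints a dtype-mismatch error

logits = hidden @ lm_head.float()  # casting to a common dtype resolves it
print(logits.dtype)  # torch.float32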
