Issue training with Qwen 2.5 7B #1333

Open
Vital1162 opened this issue Nov 24, 2024 · 4 comments

Comments

@Vital1162

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    data_collator = DataCollatorForSeq2Seq(tokenizer = tokenizer),
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 60,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|im_start|>user\n",
    response_part = "<|im_start|>assistant\n",
)

trainer_stats = trainer.train()
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 3,902 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 487
 "-____-"     Number of trainable parameters = 647,313,408
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/triton/language/core.py in wrapper(*args, **kwargs)
     34                              "(`_builder` argument must be provided outside of JIT functions.)")
---> 35         return fn(*args, **kwargs)
     36 

34 frames
AssertionError: First input (fp32) and second input (fp16) must have the same dtype!

The above exception was the direct cause of the following exception:

CompilationError                          Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py in make_ir(self, options, codegen_fns, context)
    111 
    112     def make_ir(self, options, codegen_fns, context):
--> 113         return ast_to_ttir(self.fn, self, context=context, options=options, codegen_fns=codegen_fns)
    114 
    115     def parse_options(self):

CompilationError: at 60:16:

    accum = tl.zeros((BLOCK_B, BLOCK_V), dtype=tl.float32)
    for d in range(0, tl.cdiv(D, BLOCK_D)):
        # Load the next block of A and B, generate a mask by checking the K dimension.
        # If it is out of bounds, set it to 0.
        if EVEN_D:
            e = tl.load(e_ptrs)
            c = tl.load(c_ptrs)
        else:
            e = tl.load(e_ptrs, mask=offs_d[None, :] < D - d * BLOCK_D, other=0.0)
            c = tl.load(c_ptrs, mask=offs_d[:, None] < D - d * BLOCK_D, other=0.0)
        accum = tl.dot(e, c, accum, input_precision=DOT_PRECISION)
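
The assertion is raised when tl.dot receives one fp32 and one fp16 operand, which suggests mixed parameter dtypes are reaching the fused kernel. A quick way to see which dtypes the loaded model actually contains (a minimal diagnostic sketch, not part of Unsloth, assuming the model object created earlier in the notebook):

import collections

# Diagnostic only: group the model's parameters by dtype to see whether
# fp32 weights are mixed in with the fp16/bf16 ones the kernel expects.
by_dtype = collections.defaultdict(list)
for name, param in model.named_parameters():
    by_dtype[param.dtype].append(name)

for dtype, names in by_dtype.items():
    print(dtype, f"{len(names)} tensors, e.g. {names[:2]}")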
@AleNunezArroyo

Same here with the continued pretraining notebook using llama-3.

@bharris47

Downgrading triton to triton==2.3.1 helped me here.
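
If you pin it in Colab with pip install triton==2.3.1, restart the runtime so the downgraded package is the one that actually gets imported; a minimal check:

import triton

# Verify the downgraded Triton is in use after the runtime restart.
print(triton.__version__)  # expect "2.3.1" if the pin took effect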

@khoi03

khoi03 commented Nov 25, 2024

I'm encountering the same problem with the continued pretraining notebook using llama-3 and qwen 2.5.

@danielhanchen
Contributor

Apologies everyone - I added a new library from Apple which reduces memory usage for cross entropy. It seems I'll have to disable it by default and allow it as an opt-in switch instead.
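
While waiting for that switch, here is a purely illustrative sketch of what the dtype assertion means (plain PyTorch, not Unsloth's or Apple's actual kernel): the fused loss effectively multiplies hidden states by the lm_head weight, and like an ordinary matmul it refuses operands of different dtypes.

import torch

# Illustration only: an fp32 "hidden states" tensor against an fp16 "lm_head"
# weight fails the same way the Triton kernel above does.
hidden = torch.randn(4, 8, dtype=torch.float32)
lm_head = torch.randn(8, 16, dtype=torch.float16)

try:
    hidden @ lm_head
except RuntimeError as err:
    print(err)  # prints a dtype-mismatch error

logits = hidden @ lm_head.float()  # casting to a common dtype resolves it
print(logits.dtype)  # torch.float32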
