hardcode to use slow tokenizer && update the performance on examples dataset #11

Merged · 5 commits · Oct 3, 2024
3 changes: 3 additions & 0 deletions aria/model/processing_aria.py

@@ -228,9 +228,12 @@ def from_pretrained(
             image_processor_path,
             **cls._extract_kwargs(AriaVisionProcessor.from_pretrained, **kwargs),
         )
+        if "use_fast" in kwargs:
+            kwargs.pop("use_fast")
         try:
             tokenizer = AutoTokenizer.from_pretrained(
                 tokenizer_path,
+                use_fast=False,
                 **cls._extract_kwargs(AutoTokenizer.from_pretrained, **kwargs),
             )
             chat_template = tokenizer.chat_template
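The change does what the PR title says: it hardcodes the slow tokenizer. Any caller-supplied `use_fast` is popped from `kwargs` before `AutoTokenizer.from_pretrained` is called with an explicit `use_fast=False`; without the pop, a caller passing `use_fast=True` would trigger `TypeError: got multiple values for keyword argument 'use_fast'`. A standalone sketch of the same logic (the function name is invented here, and the real code additionally filters `kwargs` through `cls._extract_kwargs`):

```python
from transformers import AutoTokenizer

def load_slow_tokenizer(tokenizer_path: str, **kwargs):
    # Mirror the patched logic: drop any caller-supplied use_fast so it
    # cannot be passed twice and conflict with the hardcoded value below.
    kwargs.pop("use_fast", None)
    # Always request the slow (pure-Python) tokenizer, never the Rust-backed one.
    return AutoTokenizer.from_pretrained(tokenizer_path, use_fast=False, **kwargs)
```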
Binary file modified assets/nextqa_loss_lora.png
Binary file modified assets/nlvr2_loss_490_lora.png
Binary file modified assets/nlvr2_loss_980_lora.png
Binary file modified assets/refcoco_loss_lora.png
8 changes: 4 additions & 4 deletions examples/nextqa/README.md

@@ -14,13 +14,13 @@ unzip NExTVideo.zip
 # Training Configuration and Commands
 
 ## LoRA
-The LoRA training configuration is shown in [config_lora.yaml](../../examples/nextqa/config_lora.yaml). Please modify the paths to your Aria model, Aria tokenizer and nextqa dataset. This setting can run well on an A100 80GB using a 4k sequence length, which the longer visual context requires. We set `max_image_size` to 490 for video datasets.
+The LoRA training configuration is shown in [config_lora.yaml](../../examples/nextqa/config_lora.yaml). Please modify the paths to your Aria model, Aria tokenizer and nextqa dataset. This setting can run well on a single A100 80GB using a 4k sequence length, which the longer visual context requires. We set `max_image_size` to 490 for video datasets.
 
 > *Note:* In this configuration, we add LoRA to all modules in the LLM of Aria, but not to the vit or projector. If you want to add LoRA to the vit/projector, adjust `freeze_vit` or `freeze_projector`. You can also adjust `lora_target_modules` to choose sub-modules of the LLM blocks, and `freeze_llm_layers` to set the layers where you don't want LoRA.
 
-Command (on two 80GB A100s):
+Command (on a single 80GB A100):
 ```bash
-accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes 2 aria/train.py --config examples/nextqa/config_lora.yaml --output_dir [YOUR_OUT_DIR]
+CUDA_VISIBLE_DEVICES=0 python aria/train.py --config examples/nextqa/config_lora.yaml --output_dir [YOUR_OUT_DIR]
 ```
 
 ## Full Params

@@ -43,7 +43,7 @@ CUDA_VISIBLE_DEVICES=0 python examples/nextqa/evaluation.py \
 The `Accuracy`:
 | Aria | LoRA SFT | Full Params SFT |
 |:----:|:--------:|:---------------:|
-|76.02 | 79.08 | 81.42 |
+|78.14 | 80.80 | 81.42 |
 
 ## Loss Curve
 These are the loss curves of `LoRA SFT` and `Full Params SFT`:
3 changes: 2 additions & 1 deletion examples/nextqa/evaluation.py

@@ -69,6 +69,7 @@ def load_model_and_tokenizer(args):
     model = AriaForConditionalGeneration.from_pretrained(
         args.base_model_path, device_map="auto", torch_dtype=torch.bfloat16
     ).eval()
+    model.pad_token_id = tokenizer.pad_token_id
 
     if args.peft_model_path:
         peft_config = PeftConfig.from_pretrained(args.peft_model_path)

@@ -134,7 +135,7 @@ def collate_fn(batch, processor, tokenizer):
         padding="longest",
         max_image_size=args.image_size,
     )
-    return inputs, batch, messages
+    return inputs, batch, texts
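The `pad_token_id` line matters for batched inference: with `padding="longest"`, shorter prompts in a batch carry pad tokens, and decoding needs to know which id they use. The same line is added to the nlvr2 and refcoco evaluation scripts below. A sketch under assumptions, where `model` and `tokenizer` come from `load_model_and_tokenizer(args)` as in the diff above, and the Aria modeling code is assumed to read the `model.pad_token_id` attribute during generation (stock transformers models consult `generation_config.pad_token_id` instead):

```python
import torch

prompts = ["What happens in the video?", "Describe the very first frame in detail."]
inputs = tokenizer(prompts, padding="longest", return_tensors="pt")

# Copy the pad id onto the model once at load time, instead of threading
# it through every generate() call; batched decoding then uses the same
# pad id the tokenizer used when padding the inputs.
model.pad_token_id = tokenizer.pad_token_id

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```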
12 changes: 6 additions & 6 deletions examples/nlvr2/README.md

@@ -18,18 +18,18 @@ unzip train.part3.zip
 # Training Configuration and Commands
 
 ## LoRA
-The LoRA training configuration is shown in [config_lora.yaml](../../examples/nlvr2/config_lora.yaml). Please modify the paths to your Aria model, Aria tokenizer and nlvr2 dataset. This setting can run well on an A100 (80GB) with a 2k input sequence length. You can specify the `max_image_size` (e.g., 980 or 490) on the command line.
+The LoRA training configuration is shown in [config_lora.yaml](../../examples/nlvr2/config_lora.yaml). Please modify the paths to your Aria model, Aria tokenizer and nlvr2 dataset. This setting can run well on a single A100 (80GB) with a 2k input sequence length. You can specify the `max_image_size` (e.g., 980 or 490) on the command line.
 
 > *Note:* In this configuration, we add LoRA to all modules in the LLM of Aria, but not to the vit or projector. If you want to add LoRA to the vit/projector, adjust `freeze_vit` or `freeze_projector`. You can also adjust `lora_target_modules` to choose sub-modules of the LLM blocks, and `freeze_llm_layers` to set the layers where you don't want LoRA.
 
-Command (on two 80GB A100s):
+Command (on a single 80GB A100):
 ```bash
-accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes 2 aria/train.py --config examples/nlvr2/config_lora.yaml --max_image_size 980 --output_dir [YOUR_OUT_DIR]
+CUDA_VISIBLE_DEVICES=0 python aria/train.py --config examples/nlvr2/config_lora.yaml --max_image_size 980 --output_dir [YOUR_OUT_DIR]
 ```
 
 You can change the `max_image_size` to 490:
 ```bash
-accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes 2 aria/train.py --config examples/nlvr2/config_lora.yaml --max_image_size 490 --output_dir [YOUR_OUT_DIR]
+CUDA_VISIBLE_DEVICES=0 python aria/train.py --config examples/nlvr2/config_lora.yaml --max_image_size 490 --output_dir [YOUR_OUT_DIR]
 ```
 
 ## Full Params

@@ -57,8 +57,8 @@ CUDA_VISIBLE_DEVICES=0 python examples/nlvr2/evaluation.py \
 The `Accuracy`:
 | `max_image_size` | Aria | LoRA SFT | Full Params SFT |
 |:----------------:|:----:|:--------:|:---------------:|
-|490 |88.09 | 91.27 | 92.24 |
-|980 |88.08 | 91.50 | 92.33 |
+|490 |86.56 | 91.32 | 92.24 |
+|980 |87.03 | 91.61 | 92.33 |
 
 # Loss Curve
 These are the loss curves of `LoRA Finetuning` (left) and `Full Params Finetuning` (right) with 490 and 980 `max_image_size`:
1 change: 1 addition & 0 deletions examples/nlvr2/evaluation.py

@@ -69,6 +69,7 @@ def load_model_and_tokenizer(args):
     model = AriaForConditionalGeneration.from_pretrained(
         args.base_model_path, device_map="auto", torch_dtype=torch.bfloat16
     ).eval()
+    model.pad_token_id = tokenizer.pad_token_id
 
     if args.peft_model_path:
         peft_config = PeftConfig.from_pretrained(args.peft_model_path)
8 changes: 4 additions & 4 deletions examples/refcoco/README.md

@@ -13,13 +13,13 @@ unzip images.zip
 # Training Configuration and Commands
 
 ## LoRA
-The LoRA training configuration is shown in [config_lora.yaml](../../examples/refcoco/config_lora.yaml). Please modify the paths to your Aria model, Aria tokenizer and refcoco dataset. This setting can run well on an A100 (80GB) with a 2k input sequence length. `max_image_size` is set to **980**.
+The LoRA training configuration is shown in [config_lora.yaml](../../examples/refcoco/config_lora.yaml). Please modify the paths to your Aria model, Aria tokenizer and refcoco dataset. This setting can run well on a single A100 (80GB) with a 2k input sequence length. `max_image_size` is set to **980**.
 
 > *Note:* In this configuration, we add LoRA to all modules in the LLM of Aria, but not to the vit or projector. If you want to add LoRA to the vit/projector, adjust `freeze_vit` or `freeze_projector`. You can also adjust `lora_target_modules` to choose sub-modules of the LLM blocks, and `freeze_llm_layers` to set the layers where you don't want LoRA.
 
-Command (on two 80GB A100s):
+Command (on a single 80GB A100):
 ```bash
-accelerate launch --config_file recipes/accelerate_configs/zero2.yaml --num_processes 2 aria/train.py --config examples/refcoco/config_lora.yaml --output_dir [YOUR_OUT_DIR]
+CUDA_VISIBLE_DEVICES=0 python aria/train.py --config examples/refcoco/config_lora.yaml --output_dir [YOUR_OUT_DIR]
 ```
 
 ## Full Params

@@ -58,7 +58,7 @@ CUDA_VISIBLE_DEVICES=0 python examples/refcoco/evaluation.py \
 The `Precision@1`:
 | Aria | LoRA SFT | Full Params SFT |
 |:----:|:--------:|:---------------:|
-|41.77 | 88.92 | 88.85 |
+|2.27 | 88.68 | 88.85 |
 
 # Loss Curve
 These are the loss curves of `LoRA Finetuning` (left) and `Full Params Finetuning` (right):
3 changes: 2 additions & 1 deletion examples/refcoco/evaluation.py

@@ -71,6 +71,7 @@ def load_model_and_tokenizer(args):
     model = AriaForConditionalGeneration.from_pretrained(
         args.base_model_path, device_map="auto", torch_dtype=torch.bfloat16
     ).eval()
+    model.pad_token_id = tokenizer.pad_token_id
 
     if args.peft_model_path:
         peft_config = PeftConfig.from_pretrained(args.peft_model_path)

@@ -128,7 +129,7 @@ def collate_fn(batch, processor, tokenizer):
         padding="longest",
         max_image_size=args.image_size,
    )
-    return inputs, batch, messages
+    return inputs, batch, texts
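Both `collate_fn` hunks (here and in the nextqa script) make the same one-word fix: the function now returns `texts`, the rendered prompt strings that were actually fed to the processor, rather than `messages`, the pre-rendering chat structure. A minimal sketch of the corrected shape; `build_prompt` and the `"image"` field are hypothetical stand-ins for the dataset-specific code in the real scripts:

```python
def collate_fn(batch, processor, tokenizer, image_size=980):
    # Hypothetical prompt construction; the real scripts build `texts`
    # from each example's chat messages via the tokenizer's chat template.
    texts = [build_prompt(item, tokenizer) for item in batch]
    images = [item["image"] for item in batch]
    inputs = processor(
        text=texts,
        images=images,
        return_tensors="pt",
        padding="longest",
        max_image_size=image_size,
    )
    # Return the strings that were actually tokenized, so downstream
    # evaluation can align predictions with the padded inputs.
    return inputs, batch, texts
```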