Fix Legion prebuild workflow (2) #1208

Merged
merged 4 commits into from
Oct 24, 2023

Conversation

goliaro (Collaborator) commented Oct 23, 2023

Description of changes:

Related Issues:

Linked Issues:

  • Issue #

Issues closed by this PR:

  • Closes #


goliaro merged commit d1da022 into inference Oct 24, 2023
46 of 47 checks passed
goliaro added a commit that referenced this pull request Nov 2, 2023
* Fix Legion prebuild workflow (2) (#1208)

* fix

* fix

* fix

* fix

* Fix Legion prebuild workflow (3) (#1210)

* fix hip error

* use CUBLAS_COMPUTE_FAST_16F for full-precision gemm

---------

Co-authored-by: Zhihao Jia
goliaro added a commit that referenced this pull request Sep 4, 2024
* .

* .

* Update the default cublas behavior when CUDA_VERSION is not specified

* fix bugs in IncMHA peft_bwd kernel

* uncomment softmaxbackward

* add layernorm to align test

* add peft test scripts

* fix import

* fix

* add code to convert peft models

* add script to download peft for c++, fix bug

* fix

* add script to fine-tune models

* implement loading lora configs/weights from file

* remove peft_bwd assertion failure in embedding

* fix download script

* add peft dependencies in dockerfile

* fix softmax backward

* fix bc print indentation

* Temporarily Revert "Update the default cublas behavior when CUDA_VERSION is not specified"

This reverts commit 4ee710a.

* Fix cublas default (#1220)

* Fix Legion prebuild workflow (2) (#1208)

* fix

* fix

* fix

* fix

* Fix Legion prebuild workflow (3) (#1210)

* fix hip error

* use CUBLAS_COMPUTE_FAST_16F for full-precision gemm

---------

Co-authored-by: Zhihao Jia

* fix bugs, work on align opt-lora

* update scripts

* add code to output peft tensors in hf

* update, fixes

* linting

* fix printing of tensors for numpy

* update save_inference_tensors_to_file

* linting

* update

* fix issue with save_inference_tensors_to_file

* fix layer names for save_inference_tensors_to_file

* fix peft

* fix bwd bugs

* linting

* fixes

* fix

* fix

* fix

* add bc fields for peft training

* linting

* fix

* remove ptr check

* fix

* implement save_operators for bwd

* fix bug

* implement save tensors for bwd

* .

* bug fix

* fix

* align linear

* fix

* bwd kernel updates

* undo use of CUBLAS_COMPUTE_32F_FAST_16F for now

* only send dataset entry once

* update peft test scripts

* loss

* .

* update generate/request api to take both inference and fine-tuning prompts

* linting

* alignment fixes in lora & linear layer

* alignment fix

* diagonal

* fix

* alignment fix ssm

* sigmoid-silu-multi now fully aligned

* rms norm kernel updates

* fix

* in-place residual rms

* bug fix and linting

* align backward of o_proj, attn_heads, qk_prods_softmax, and v_proj with huggingface

* cleanup

* finished all alignment fixes in attention backward kernel

* fix

* Update inc_multihead_self_attention.cu

* Update inc_multihead_self_attention.cu

* use grad to store peft in/output (#1241)

* use grad to store peft in/output

* format

* .

* format

* enable peft request

* several hacks for performance measurement; some of the changes should be reverted

* Update sigmoid_silu_multi.cu

* RoPE backward

* PEFT bug fixes and alignment (#1269)

* Revert "several hacks for performance measurement; some of the changes should be reverted"

This reverts commit b9c3926.

* backup

* backup

* updates

* update

* backup

* backup

* backup

* fix

* cleanup

* linting

* Fuse bias + relu in OPT (#1271)

* fuse bias and relu in opt

* fix

* fix

* fix

* fix

* Peft alignment & debugging tools (#1288)

* Revert "several hacks for performance measurement; some of the changes should be reverted"

This reverts commit b9c3926.

* backup

* backup

* updates

* update

* backup

* backup

* backup

* fix

* cleanup

* fix

* fix

* fix

* update

* simplify tensor names

* fix

* fixes and updates

* fixes

* fix

* cleanup

* .

* restore softmax

* cleanup

* update alignment scripts

* newline

* fix legion aliasing error

* fix warnings

* fix

* fix pipeline parallelism

* fix tp issue in combine op

* fix lora weight loading with tensor parallelism

* fixes, implement Combine::peft_bwd_task

* fix

* replicate peft bwd

* fixes

* fix

* fix combine and fwd-bwd pass dependencies

* fix replicate bwd

* fix

* let user control amount of peft memory

* only run peft_bwd if peft is enabled

* fix rms norm inference region reqs

* fix in-place fusion (part 1)

* fix inplace fusion (part 2)

* fix

* disable automatic inplace rms norm for now

* fix inf fusion inplace

* fix rest input grads for peft without inplace residuals

* fix

* fix

* fix residual rms

* fix

* fix

* enable inf debugging in fusion bwd

* hack to silence warning in fused bwd

* fix

* fix

* fix build

* fix

* fix

* add draft peft test

* Peft python interface (#1306)

* update script

* less model renaming

* fix

* fix

* fix

* backup

* .

* update

* .

* fixes

* fix

* fix build

* fix

* fix

* fix issues for downloading peft model

* solved issues for download peft model

* added printouts for debugging

* fix

* fix seg fault

* add test, separate peft script in cpp

* fix

* fixes

* fix

* update peft python interface

* update

* update

* update

* updates

* fix

* fixes

* fix

* fixes

---------

Co-authored-by: april-yyt

* fix

* update

* fix

* fix to support prompts larger than max tokens per batch

* fixes to support benchmarking of finetuning throughput

* many upgrades and updates related to finetuning

* add ttft statistics

* add warmup phase

* add benchmarking code

* Add scripts for evaluation with Microsoft Azure trace (#1363)

* Add scripts for evaluation

* Add absolute request rate value

* Fix script for target arrival rate

* Fix cpp req rate benchmark

* update to use new dataset

* Fix infinite loop

* update

* add data

---------

Co-authored-by: Remi Delacourt
Co-authored-by: Gabriele Oliaro

* fix

* fix

* add peft tests to ci

* shellcheck

* fix

* fix python requirements

* fix

* fix

* update ci test

* update alignment doc

* fix cross entropy loss bug

* update alignment test

* update test

* add llama peft alignment test to ci

* Fix values for unused params in incr_decoding

* Add PEFTModelID NO_ID singleton instead of None

* Fix PEFTModelID::NO_ID reference

* reduce logging

* fix

* fix

* Add peft demo

* Add readme for demo

* fix alignment issue

* Peft optimizer (#1290)

* add optimizer config, only allocate weights for training

* sgd 1

* sgd 2

* update

* fix

* linting

* .

* .

* fix

* fix allreduce bug

* update

* update

* add optimizer hook in hf

* update

* update script

* .

* fix

* fwd

* bwd

* start grads

* fix gradient misalignment!

* update

* Add support for llama3

* various fixes

---------

Co-authored-by: Remi Delacourt

* Optimizers python interface (#1441)

* python interface for optimizer

* update lora linear config to support python interface

* update python interface

* finished lora python interface

* fix

* fix

* update

* update

* more fixes

* fix

* initialize lora weights where needed

* Add notebook

* Update demo to use dataset

* Fix'

* Save weights after end of finetuning (#1446)

* support accumulation of gradients without update

* add code to save peft weights

* fix

* save configs

* cleanup

* Fully use notebook for demo

* Parameterize generation and finetuning configs

* Comment out inference for now

* fix bug in lora inference only mode

* fix

* Add finetuning or inference only flags

* fix

* fix

* fix

* PEFT model upload (#1450)

* upload test

* fix

* Make demo_class.py executable

* fix

* add base_model_name_or_path

* fix

* fix

* support llama-3 tokenizer

* print output tokens when not benchmarking

* Use Llama3 in demo_class

* Use Llama3 in demo

* fix data loading for llama-3

* Add download models to demo

* return/print loss at each finetuning step

* fix

* Adjust demo parameters

* Fix for finetuning

* pass finetuning losses to python interface

* Update demo

* Fix upload

* Refactor demo

* rename demo_class to demo

* fix

* remove epoch from loss print

* Finish demo

* fix test

* rocm fixes

* more rocm fixes

* fix rocm build

* docker fix

* fix inference test

* fix workflow

* fix makefile

* fix peft test

* fix all-reduce issue with lora for TP scenario

* fix bwd lm head

* fixes

* more fixes

* update

* fix alignment up to input ln

* finished aligning all backward (tp>1)

* align all peft

* fix

* fix broken link

* formatting

* fix

* update

* Revert "update"

This reverts commit 90b2c87.

* update

* fix hip build

* fix gpu ci

* fix gpu ci

* update default gpu ci version to 12.0

* update ci to 12.0

* fix

* fix

* update

* fix

* fix

* update

* fix

* add cleanup

* downgrade to cuda=11.8

---------

Co-authored-by: Gabriele Oliaro
Co-authored-by: xinhaoc
Co-authored-by: Xinhao Cheng
Co-authored-by: april-yyt
Co-authored-by: Remi
Co-authored-by: Remi Delacourt
Co-authored-by: Rémi Delacourt