Roberta embedding #7969
Closed
Commits
639 commits
919bf88 BART e2e test runs but does not pass (afeldman-nm)
753bab0 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
125e5dc Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
597526a removed extra line (afeldman-nm)
a178b7a changed nested if/else to elif/else in xformers mask computation code (afeldman-nm)
06c7f75 reorganized helper functions that were only being used for testing in… (afeldman-nm)
47c9f39 removed attention_type (afeldman-nm)
2f0b05b typing and formatting (afeldman-nm)
d23c284 typing and formatting; fixed escape sequences in comments (afeldman-nm)
1a6e5a3 moved make_tensor_with_pad() helper function back to vllm.utils (afeldman-nm)
e2a46e3 formatting (afeldman-nm)
d43141f merge; a lot of formatting fixes to bart code but not fully passing (afeldman-nm)
5169a2a removed unnecessary positions arguments from BART encoder, decoder fo… (afeldman-nm)
4400d77 some reformatting (afeldman-nm)
e61385d fixed bug caused by overzealous refactoring (afeldman-nm)
41e31e8 BART with new explanatory comments & passing formatting tests (afeldman-nm)
ba4e2c1 Removed unnecessary position arguments from BART routine; formatting (afeldman-nm)
4dabe19 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
a5c28fc Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runn… (afeldman-nm)
7ca0d7a Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
c24697f Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
75756b9 removed redundant elif (afeldman-nm)
bcccc34 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
c8f8d59 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
a501849 reverted unnecessarily vllm/utils.py changes (afeldman-nm)
83d474e Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
64981b5 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
8d36458 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
5ff9c76 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
2828aa7 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
65e47db Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
44c6270 manually merged BART code in from previous modelrunner attempt, it wo… (afeldman-nm)
b085795 Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2 (afeldman-nm)
ba09fbc refactored where a number of constants are stored, primarily constant… (afeldman-nm)
2f0eb9b Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
d81662c Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
22d013c Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2 (afeldman-nm)
13f5b50 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
5dbebbc Update vllm/attention/backends/torch_sdpa.py (afeldman-nm)
07df0e1 Update vllm/attention/layer.py (afeldman-nm)
7e0bc57 Merge branch 'main' into infra_enc_dec_cross_attn_reviews (afeldman-nm)
e837a73 Merge branch 'infra_enc_dec_cross_attn_reviews' into infra_enc_dec_cr… (afeldman-nm)
7ce9a51 merged in first pieces of woosuk feedback & latest main; formatting (afeldman-nm)
9ae6728 fixed specific point-changes requested by woosuk (afeldman-nm)
a1bf652 test_encoder_decoder_attn.py cleanup (afeldman-nm)
4f27946 tests/kernels/utils.py cleanup (afeldman-nm)
5ee30fe vllm/attention/backends/abstract.py cleanup (afeldman-nm)
45fc9f7 vllm/attention/backends/blocksparse_attn.py cleanup (afeldman-nm)
097aff2 vllm/attention/backends/flash_attn.py cleanup (afeldman-nm)
d8a692b cleaning up a number of backends & backends utils.py (afeldman-nm)
5df73fc xformers backend cleanup (afeldman-nm)
6cd595c formatting (afeldman-nm)
db49d48 Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_model_runner2 (afeldman-nm)
88e284a merge from main (afeldman-nm)
c90140f Merge branch 'main' into infra_enc_dec_model_runner2 (afeldman-nm)
bd14d29 wip scheduler (afeldman-nm)
2c80185 formatting (afeldman-nm)
4c01f13 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
c95adf5 scheduler supports encoder-/cross-attention & passes existing schedul… (afeldman-nm)
d1343aa scheduler test passes (afeldman-nm)
b4a461d formatting (afeldman-nm)
6a71f8f formatting (afeldman-nm)
fe7786c Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_mod… (laishzh)
9a63f51 wip model runner (afeldman-nm)
f649944 Merge branch 'main' into infra_enc_dec_model_runner (afeldman-nm)
685604c wip modelrunner (afeldman-nm)
9c898f5 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
196f30c enc/dec decoder test working, sans sampling check (afeldman-nm)
c5ceb23 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
9ce2da4 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
447a5c7 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
3d5bb88 EncoderDecoderModelInput correctly handles encoder token/position fields (afeldman-nm)
db5539a format (afeldman-nm)
760355b bart test skipped on CPU version of vllm (afeldman-nm)
590a240 Formatting (afeldman-nm)
8b8d981 refactored AttentionType and related imports; skip BART test definiti… (afeldman-nm)
ff940f7 formatting (afeldman-nm)
64d7198 wip (afeldman-nm)
0cca164 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
94c083c Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_… (afeldman-nm)
83c5c43 prompt type checks (afeldman-nm)
10ed714 Format (afeldman-nm)
78d3d3c modified LLM.generate() error message (afeldman-nm)
6c95380 wip engine is_encoder_decoder() setting (afeldman-nm)
304caed formatting (afeldman-nm)
7b0803b formatting? (afeldman-nm)
5525511 Sequence may be constructed with encoder/decoder LLMInput configurations (afeldman-nm)
dd4031c wip but having wllm.commit_id error (afeldman-nm)
8dccaa5 correctly constructing enc/dec sequences (afeldman-nm)
336a77d formatting (afeldman-nm)
46397c7 wip (afeldman-nm)
f85997b Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
251f899 wip (afeldman-nm)
9141347 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
ddaf0ad wip (afeldman-nm)
54ff142 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
92d9f48 conftest: encoder/decoder example prompts (afeldman-nm)
c5846ac Hfrunner greedy logprobs limit (afeldman-nm)
374880f input preparation now includes encoder-oriented input setup: (afeldman-nm)
796d7a3 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
42ac66b VllmRunner encoder/decoder methods (afeldman-nm)
850a97e bart parallel vocab (afeldman-nm)
3c7e19d zip enc/dec prompts; formatting (afeldman-nm)
e534ffc wip (afeldman-nm)
97d81f0 encoder/decoder input processing; formatting (afeldman-nm)
87ed3b6 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
713d095 incorporated encoder sequence into request-add functionality (afeldman-nm)
aea8d34 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
159c7bc fixed decoder-only bug (afeldman-nm)
16c9aa2 bugfix (afeldman-nm)
03aea18 wip (afeldman-nm)
ef80c85 wip (afeldman-nm)
f8dd4a5 fixed scheduler bug (afeldman-nm)
c2ff615 format (afeldman-nm)
31127fa Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
1c6e06d bugfix (afeldman-nm)
0cc14ab Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
3656dc6 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
aee5f16 fixed sequence bug (afeldman-nm)
ef94623 added examples utils w/ context manager for backend override; applied… (afeldman-nm)
50ad5ff Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
b277180 formatting (afeldman-nm)
cac6283 added encoder/decoder example to examples test (afeldman-nm)
f54f276 wip refactoring (afeldman-nm)
597a07d refactor (afeldman-nm)
9f5a02c RequestOutput & SequenceGroup now include encoder prompt in output, a… (afeldman-nm)
94c904f wip parallel bart but encountering GPU count issue (afeldman-nm)
9da8fb3 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
1f8c52f tweaks to enc/dec example (afeldman-nm)
1808846 formatting (afeldman-nm)
f15eacf wip (afeldman-nm)
6c940f8 modified HF behavior in BART test to be truly greedy (afeldman-nm)
949ac02 formatting (afeldman-nm)
88c058e wip parallelizing BART (afeldman-nm)
31e335f wip activation parallelization (afeldman-nm)
c092ed4 merged in upstream changes; left some formatting issues which I expec… (afeldman-nm)
d7bd617 Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru… (afeldman-nm)
69f0379 wip: (afeldman-nm)
9fdd047 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
584c01e Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_… (afeldman-nm)
7ace684 Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_mod… (laishzh)
41ccf0c wip merge (afeldman-nm)
ffa99b2 additional merge (afeldman-nm)
a22f56c Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
c00e0a8 CommonMetadataBuilder sets block_tables constructor arg of metadata (afeldman-nm)
32967c1 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
a33b501 Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru… (afeldman-nm)
a16cabb equalized some generation/sampling config settings between enc/dec HF… (afeldman-nm)
abbb427 Merge branch 'infra_enc_dec_model_runner' into infra_enc_dec_model_ru… (afeldman-nm)
00198a6 BART MLPs parallelized (afeldman-nm)
fb3227f parallelized BART learned positional embedding (afeldman-nm)
e5bb9de all attention layer output linears are parallelized (afeldman-nm)
74abe22 encoder attention & decoder self-attention parallelized (afeldman-nm)
9bbed43 parallelized LM head (afeldman-nm)
fdf71de parallelized enc/dec cross-attention, using a slight hack (afeldman-nm)
3551b6b fixed bug where underlying Attention was constructed using full head-… (afeldman-nm)
b174c7a bart is parallelized, modulo an unfortunate hack for QKVParallelLinea… (afeldman-nm)
c43a6ed commented out BART TP=4 (afeldman-nm)
a408289 Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_mod… (laishzh)
b90b6b6 upstream merge (afeldman-nm)
14831b0 Merge branch 'infra_enc_dec_model_runner_reviews' into infra_enc_dec_… (afeldman-nm)
427032a Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
c51a168 fixed bug in how conftest was handling HF encoder/decoder outputs; di… (afeldman-nm)
b01937f set up None/empty str tests which are not passing (afeldman-nm)
48a742d Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
b283544 Merge branch 'infra_enc_dec_model_runner_correctness' into infra_enc_… (afeldman-nm)
059273f wip (afeldman-nm)
229847b Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
7e7bbd9 deleted unnecessary dependency (afeldman-nm)
4a6e39e Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
aa01d71 empty-string decoder input is now handled for encoder/decoder (afeldman-nm)
0b29fd2 enc/dec handles empty str and None decoder prompts correctly (afeldman-nm)
dd784b5 typing fix (afeldman-nm)
61d2ad2 fixed bugs in handling non-text formats for individual prompts (afeldman-nm)
f36ffb5 example includes prompt zipper (afeldman-nm)
c493d40 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
be58d8a Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
02114bd _free_seq_group() -> _free_seq_group_cross_attn_blocks() (afeldman-nm)
5a270ff refactoring (afeldman-nm)
ed4a56b formatting (afeldman-nm)
4b5b2cf removed unnecessary argument reordering (afeldman-nm)
d82b273 enc/dec example comments' (afeldman-nm)
0af58ec responses to feedback (afeldman-nm)
bed9bcd Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
47b4eb2 fixed bug caused by upstream refactoring (afeldman-nm)
393515e formatting (afeldman-nm)
fb5a2bc upstream merge (afeldman-nm)
c2cc010 Removed lora from enc/dec model runner (afeldman-nm)
175ea95 Merge branch 'main' into infra_enc_dec_model_runner_reviews (afeldman-nm)
3327e5b removed lora & vision & mm code from enc/dec modelrunner (afeldman-nm)
47c5548 checked out examples/offline_inference.py from main (afeldman-nm)
1bb7ad9 updated RequestOutput docstring (afeldman-nm)
035d90d updated RequestOutput docstring (afeldman-nm)
64685ac Sequence docstring (afeldman-nm)
d1751db removed flashinfer references from enc/dec modelrunner (afeldman-nm)
f0abcc2 format (afeldman-nm)
4bb7fc4 removed chunked prefill logic/docstring text from enc/dec modelrunner (afeldman-nm)
a936faa removed prefix caching from enc/dec modelrunner (afeldman-nm)
59bf8c4 Merge remote-tracking branch 'bert_deps/afeldman-nm/infra_enc_dec_mod… (laishzh)
12a9869 Merge remote-tracking branch 'origin/main' (laishzh)
53c5148 (WIP)feat: EmbeddingModelRunner support encoder model (laishzh)
63fb7a5 WIP: bert embedding (laishzh)
37bcba0 feat: full pipeline (laishzh)
76b47fb chore: recover (laishzh)
aca786e feat: default bos_token_id of encoder model (laishzh)
682c455 feat: recover sequence (laishzh)
872e795 feat: embedding model forward (laishzh)
a0ad0df chore: recover unchanged files (laishzh)
f215884 chore: recover (laishzh)
7657af3 feat: fix lint (laishzh)
91e23d8 feat: fix lint (laishzh)
0b3f55c feat: fix lint (laishzh)
275f49d feat: embedding model prompt (laishzh)
ce9a599 feat: bos_token_id (laishzh)
7e1196d fix: fix hint (laishzh)
b99d783 feat: remove embedding block space manager (laishzh)
b76da51 feat: enc_dec_runner base (laishzh)
e15d0cc Merge branch 'main' into main (laishzh)
8b107a2 feat: fix lint (laishzh)
bfd7ec9 feat: model input (laishzh)
6f006f5 chore: fix lint (laishzh)
37f698b feat: move BertEmbeddingModel to the end of file (laishzh)
d098607 feat: remove embedding_model_block_manager.py (laishzh)
fc1f2b7 chore: fix lint (laishzh)
612cf1a feat: modify test_embedding (laishzh)
7d0ecb9 Add support for Roberta embedding models (maxdebayser)
e351bfd feat: bert embedding implemented, but still have some bugs with mistral, (laishzh)
3ff2d36 feat: some changes on test_embedding.py (laishzh)
776dcbd Merge branch 'main' of https://github.com/vllm-project/vllm (laishzh)
0ea4da1 feat: fix lint (laishzh)
15be7fa feat: fix lint (laishzh)
afd997b Merge branch '5447' into roberta_embedding (maxdebayser)
464a90f Merge branch 'main' into bert (maxdebayser)
30c875e Merge branch 'bert' into roberta_embedding (maxdebayser)
2c8a5b9 Merge branch 'main' into bert (maxdebayser)
08f1781 add head size 32 (maxdebayser)
3fbfdf4 Merge remote-tracking branch 'origin/main' (laishzh)
57bdd60 Merge branch 'upstream_main' into bert (maxdebayser)
a14b4e3 Merge branch 'bert' into roberta_embedding (maxdebayser)
107d9c2 Merge branch 'upstream_main' into bert (maxdebayser)
e7044a6 Merge branch 'bert' into roberta_embedding (maxdebayser)
352d8b2 Merge remote-tracking branch 'maxdebayser/bert' (laishzh)
04b0bc6 feat: revert embedding_block_manager (laishzh)
6440795 Merge branch 'origin/main' (laishzh)
80c1885 feat: update with origin/main (laishzh)
30b0f21 Merge branch 'upstream_main' into bert (maxdebayser)
5793373 Merge branch 'bert' into roberta_embedding (maxdebayser)
935c58d add registry of encoder-only models (maxdebayser)
ddbae13 Merge branch 'upstream_main' into roberta_embedding (maxdebayser)
44a4c04 Merge branch 'upstream_main' into roberta_embedding (maxdebayser)
New file, hunk `@@ -0,0 +1,16 @@`:

```python
from vllm import LLM

# Sample prompts.
prompts = [
    "This is an example sentence.",
    "Another example sentence.",
]

# Create an LLM.
model = LLM(model="bert-base-uncased", enforce_eager=True)
outputs = model.encode(prompts)

# Print the outputs.
for output in outputs:
    print(output.outputs.embedding)  # list of 768 floats
    print(len(output.outputs.embedding))
```
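The example only prints the raw embedding vectors. A common next step is to compare them, for instance with cosine similarity. Below is a minimal sketch in plain Python, assuming each `output.outputs.embedding` is a list of floats as the example's comment states. `cosine_similarity` is an illustrative helper written for this note, not part of vLLM.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compute cosine similarity of a zero vector")
    return dot / (norm_a * norm_b)


# Usage with the example's `outputs` (needs vLLM and the model, so not run here):
# embeddings = [output.outputs.embedding for output in outputs]
# print(cosine_similarity(embeddings[0], embeddings[1]))
```

Values near 1.0 indicate that the two sentences map to similar directions in embedding space; pure Python keeps the sketch dependency-free, though NumPy would be the usual choice for large batches.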
TODO: It's strange that simply adding another head size here makes the code run. This may actually be a silent failure, and the real kernel may still need to be added somewhere else.
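One way to check the silent-failure suspicion is to make any mismatch between an allow-list of head sizes and the kernels actually available fail loudly. The sketch below is hypothetical: `KERNELS`, `SUPPORTED_HEAD_SIZES`, `find_missing_kernels`, and `dispatch` are illustrative names, not vLLM's actual code, and the "kernels" are plain Python stand-ins. Head size 32 is used only because this PR contains an "add head size 32" commit.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for compiled attention kernels, keyed by head size.
KernelFn = Callable[[int], str]


def _kernel_64(num_tokens: int) -> str:
    return f"head_size=64 kernel ran on {num_tokens} tokens"


def _kernel_128(num_tokens: int) -> str:
    return f"head_size=128 kernel ran on {num_tokens} tokens"


KERNELS: Dict[int, KernelFn] = {64: _kernel_64, 128: _kernel_128}

# Hypothetical allow-list edited independently of KERNELS:
# 32 was added here, but no kernel was registered for it.
SUPPORTED_HEAD_SIZES: List[int] = [32, 64, 128]


def find_missing_kernels() -> List[int]:
    """Head sizes that pass the allow-list but have no kernel behind them."""
    return [h for h in SUPPORTED_HEAD_SIZES if h not in KERNELS]


def dispatch(head_size: int, num_tokens: int) -> str:
    """Run the kernel for `head_size`, raising instead of failing silently."""
    if head_size not in SUPPORTED_HEAD_SIZES:
        raise ValueError(f"head size {head_size} is not supported")
    if head_size not in KERNELS:
        raise NotImplementedError(
            f"head size {head_size} is allow-listed but has no kernel")
    return KERNELS[head_size](num_tokens)
```

Calling `find_missing_kernels()` once at startup (or in a unit test) would surface the gap before any request reaches the attention layer, rather than relying on the model happening to run.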