flexflow / flexflow-serve Public


Issues: flexflow/flexflow-serve


51 Open 29 Closed


Issues list

Unable to use ./inference/incr_decoding/incr_decoding on inference branch

#1 opened Nov 26, 2024 by hugolatendresse

How to enable support for larger sequence lengths?

#2 opened Nov 22, 2024 by QAZWSX0827

Performance issue when batch_size is 32

#3 opened Oct 22, 2024 by letheantest

Stuck During Tree-Based Speculative Decoding with OPT Model

#4 opened Oct 16, 2024 by SeungjaeLim

cuIpcGetMemHandle triggered CUDA out of memory when I use flexflow on one gpu

#75 opened Sep 11, 2024 by sjtu-zwh

Error when I use larger batch size for spec-infer

#6 opened Sep 6, 2024 by lhr-30

Issue with C++ inference for model meta-llama/Llama-2-70b-hf

#5 opened Jul 24, 2024 by DDDDDYTS

Request for Graph Pruning Algorithm Code Location in FlexLLM

#7 opened Jun 26, 2024 by zbtrs

Performance Issue (label: enhancement)

#15 opened Apr 20, 2024 by lethean287

Flexflow attempts to register duplicate sharding functors (label: bug)

#17 opened Apr 20, 2024 by rohany

Operators support for updated speculative inference design

#8 opened Apr 20, 2024 by chenzhuofu

Simplify BatchConfig

#9 opened Apr 12, 2024 by zikun-li

Update on request manager

#14 opened Apr 9, 2024 by zwang86

3 tasks

Can we modify the "expansion configuration" parameters, such as "token tree width" and "token tree depth"? (label: question)

#53 opened Mar 27, 2024 by hky011011

Error while importing import flexflow.serve as ff

#18 opened Mar 26, 2024 by goelayu

Modify BeamSearchBatchConfig to Support General Tree Structure.

#13 opened Mar 15, 2024 by zikun-li

FFModel 'null' in compile_inference

#78 opened Mar 15, 2024 by april-yyt

Modify model cache path

#19 opened Mar 12, 2024 by lockshaw

Huggingface Format Convert & Upload PR Issues

#20 opened Mar 1, 2024 by april-yyt

use the cuda graph for specinfer and it's blocked (label: bug)

#46 opened Feb 27, 2024 by lambda7xx

Does FlexFlow support P40?

#45 opened Feb 26, 2024 by wozwdaqian

Using Algorithm 2 in SpecInfer paper, I get wrong outputs. (label: bug)

#10 opened Feb 19, 2024 by dutsc

SpecInfer generate '<pad>'

#12 opened Feb 19, 2024 by dutsc

Add Fused Operator for Mixtral

#11 opened Jan 19, 2024 by jiazhihao

falcon-40b model loading does not work -- file_loader expects incorrect model structure (label: enhancement)

#16 opened Jan 9, 2024 by ktorkkola
