
SpecInfer generate '<pad>' #12

Open
dutsc opened this issue Feb 19, 2024 · 1 comment

dutsc commented Feb 19, 2024

My machine configuration is 4×3090, and my example prompt is: "please introduce Kobe Bryant, who played basketball in NBA". I use three SSMs, all of which are opt-125M. Only when the LLM is opt-13b does the generated text look normal, as shown below:

[screenshot: opt-13b output]

When I use smaller LLMs (opt-6.7b, opt-1.3b), the generated text is all '<pad>'.

[screenshot: opt-6.7b output]
[screenshot: opt-1.3b output]

why is that?

My script is as follows (run from the directory /workspace/FlexFlow/build/). The prompt.json contains "please introduce Kobe Bryant, who played basketball in NBA"; a sketch of the file follows the script.

./inference/spec_infer/spec_infer \
    -ll:gpu 4 \
    -ll:fsize 22000 \
    -ll:zsize 30000 \
    -llm-model /models/opt-13b/ \
    -ssm-model /models/opt-125m/ \
    -ssm-model /models/opt-125m/ \
    -ssm-model /models/opt-125m/ \
    -prompt /workspace/FlexFlow/prompts/prompt.json \
    -tensor-parallelism-degree 4 \
    --fusion > ../sclog/spec_infer.log
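
The prompt file itself is not shown in the issue, but assuming spec_infer expects a JSON array of prompt strings (as in FlexFlow's example prompt files), prompt.json would look something like this:

[
    "please introduce Kobe Bryant, who played basketball in NBA"
]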

Thank you very much for your valuable time.

xinhaoc self-assigned this Feb 20, 2024

xinhaoc (Contributor) commented Feb 24, 2024

@dutsc Hi! We have demonstrated in the latest version of our paper that using a single SSM achieves the best performance.
There is also an assertion in the code to make sure only one SSM is registered. Please make sure you are using the newest code, and let me know if you still get incorrect output.
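
For reference, the same command adjusted to register a single SSM would look like the following (a sketch reusing the flags from the script above, not verified on your setup):

./inference/spec_infer/spec_infer \
    -ll:gpu 4 \
    -ll:fsize 22000 \
    -ll:zsize 30000 \
    -llm-model /models/opt-13b/ \
    -ssm-model /models/opt-125m/ \
    -prompt /workspace/FlexFlow/prompts/prompt.json \
    -tensor-parallelism-degree 4 \
    --fusion > ../sclog/spec_infer.log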

lockshaw transferred this issue from flexflow/flexflow-train Dec 16, 2024