-
May I ask what the appropriate input sequence length range is for running inference with the evo-1-131k-base model?
Replies: 6 comments
-
Prompting with longer sequences requires sharding the model, which is currently not supported. However, you can generate much longer sequences, up to 500k tokens and beyond, on a single 80 GB GPU. If you'd like to test the model with longer prompts, I recommend Together's API.
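For reference, a minimal sketch of such an API call, assuming Together's OpenAI-compatible completions endpoint and that it serves this model name (neither is confirmed in this thread):

```python
import os

import requests

# Assumptions: Together's OpenAI-compatible /v1/completions endpoint,
# an API key in TOGETHER_API_KEY, and the model name below being served.
resp = requests.post(
    "https://api.together.xyz/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "togethercomputer/evo-1-131k-base",
        "prompt": "ATGGCG",  # DNA prompt
        "max_tokens": 256,
        "temperature": 1.0,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```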
-
Could you elaborate on how to generate 500k tokens on a single 80 GB GPU? I got an OOM error on an A100 with a 3 kb sequence. Thank you.
-
@pan-genome we were able to just use the standard HuggingFace sampling API (e.g., loading with AutoModelForCausalLM.from_pretrained() and sampling with model.generate()).
-
Could you provide a working code example? Thank you.
-
Something like:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_config = AutoConfig.from_pretrained(
    'togethercomputer/evo-1-131k-base',
    trust_remote_code=True,
    revision="1.1_fix",
)
model_config.max_seqlen = 500_000  # raise the maximum sequence length

model = AutoModelForCausalLM.from_pretrained(
    'togethercomputer/evo-1-131k-base',
    config=model_config,
    trust_remote_code=True,
    revision="1.1_fix",
)

outputs = model.generate(
    input_ids,            # token IDs produced by the model's tokenizer
    max_new_tokens=500_000,
    do_sample=True,       # needed for temperature/top_k to take effect
    temperature=1.,
    top_k=4,
)
```
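For completeness, `input_ids` is not defined in the snippet above; a hedged sketch of how it might be prepared and the output decoded, assuming the Hub repo ships a tokenizer loadable through remote code:

```python
from transformers import AutoTokenizer

# Assumption: the Hub repo provides a tokenizer via trust_remote_code.
tokenizer = AutoTokenizer.from_pretrained(
    'togethercomputer/evo-1-131k-base',
    trust_remote_code=True,
    revision="1.1_fix",
)
input_ids = tokenizer("ATGGCG", return_tensors="pt").input_ids.to(model.device)

# After generation, decode the sampled tokens back to a DNA string.
print(tokenizer.decode(outputs[0]))
```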
-
While the standard HuggingFace sampling API (e.g., loading with AutoModelForCausalLM.from_pretrained(), sampling with model.generate()) lets us generate much longer sequences, we can still only get embeddings for sequences up to about 3 kb with a similar modification. How can we get embeddings for longer sequences? Could you provide some example code? Thank you.
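For reference, a minimal sketch of the standard transformers pattern for extracting embeddings; whether the remote-code Evo model actually returns hidden states this way (and at what sequence length it runs out of memory) is an assumption, not something confirmed in this thread:

```python
import torch

# Assumption: the model forwards like a typical transformers causal LM
# and exposes per-token hidden states when asked.
with torch.no_grad():
    out = model(input_ids, output_hidden_states=True)

token_embeddings = out.hidden_states[-1]           # (batch, seq_len, hidden_dim)
sequence_embedding = token_embeddings.mean(dim=1)  # mean-pool over tokens
```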