-
May I ask what the appropriate input sequence length range is for running inference with the evo-1-131k-base model?
Replies: 6 comments
-
Prompting with longer sequences requires sharding the model, which is currently not supported. However, you can generate much longer sequences, up to 500k tokens and beyond, on a single 80 GB GPU. If you'd like to test the model with longer prompts, I recommend Together's API.
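For reference, a minimal sketch of such an API call, assuming Together's OpenAI-compatible completions endpoint and that it serves this model name (neither is confirmed in this thread):

```python
import os

import requests

# Assumptions: Together's OpenAI-compatible /v1/completions endpoint,
# an API key in TOGETHER_API_KEY, and the model name below being served.
resp = requests.post(
    "https://api.together.xyz/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "togethercomputer/evo-1-131k-base",
        "prompt": "ATGGCG",  # DNA prompt
        "max_tokens": 256,
        "temperature": 1.0,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```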
-
Could you elaborate on how to generate 500k tokens on a single 80 GB GPU? I got an OOM error on an A100 with a 3 kb sequence. Thank you.
-
@pan-genome we were able to just use the standard HuggingFace sampling API (e.g., loading with AutoModelForCausalLM.from_pretrained() and sampling with model.generate()).
-
Could you provide a working code example? Thank you.
-
Something like:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_config = AutoConfig.from_pretrained(
    'togethercomputer/evo-1-131k-base',
    trust_remote_code=True,
    revision="1.1_fix",
)
model_config.max_seqlen = 500_000  # raise the maximum sequence length

model = AutoModelForCausalLM.from_pretrained(
    'togethercomputer/evo-1-131k-base',
    config=model_config,
    trust_remote_code=True,
    revision="1.1_fix",
)

outputs = model.generate(
    input_ids,            # token IDs produced by the model's tokenizer
    max_new_tokens=500_000,
    do_sample=True,       # needed for temperature/top_k to take effect
    temperature=1.,
    top_k=4,
)
```
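For completeness, `input_ids` is not defined in the snippet above; a hedged sketch of how it might be prepared and the output decoded, assuming the Hub repo ships a tokenizer loadable through remote code:

```python
from transformers import AutoTokenizer

# Assumption: the Hub repo provides a tokenizer via trust_remote_code.
tokenizer = AutoTokenizer.from_pretrained(
    'togethercomputer/evo-1-131k-base',
    trust_remote_code=True,
    revision="1.1_fix",
)
input_ids = tokenizer("ATGGCG", return_tensors="pt").input_ids.to(model.device)

# After generation, decode the sampled tokens back to a DNA string.
print(tokenizer.decode(outputs[0]))
```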
-
While the standard HuggingFace sampling API (e.g., loading with AutoModelForCausalLM.from_pretrained(), sampling with model.generate()) lets us generate much longer sequences, we can still only get embeddings for sequences up to about 3 kb with a similar modification. How can we get embeddings for longer sequences? Could you provide some example code? Thank you.
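For reference, a minimal sketch of the standard transformers pattern for extracting embeddings; whether the remote-code Evo model actually returns hidden states this way (and at what sequence length it runs out of memory) is an assumption, not something confirmed in this thread:

```python
import torch

# Assumption: the model forwards like a typical transformers causal LM
# and exposes per-token hidden states when asked.
with torch.no_grad():
    out = model(input_ids, output_hidden_states=True)

token_embeddings = out.hidden_states[-1]           # (batch, seq_len, hidden_dim)
sequence_embedding = token_embeddings.mean(dim=1)  # mean-pool over tokens
```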