[Chatllama] Support Inference for trained models. #320

PierpaoloSorbellini · 2023-03-31T13:25:27Z

Description

Currently, to run inference with a trained model, the user has to write a small Python script that loads the checkpoint or model saved after training, and that script depends on how the library saved it.

Moreover, several optimizations could be integrated to speed up inference, such as:

CPU offloading
llama.cpp implementation
accelerate / deepspeed distributed inference

TODO

Implement an Inference class to make inference easy and usable from the CLI.
Implement inference with the optimisations available from deepspeed.
Implement inference with the optimisations available from accelerate.
Implement fast LLaMA inference using the llama.cpp library.
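
As a starting point for discussion, here is a rough sketch of what the Inference class and its CLI entry point might look like. Everything in it is an assumption, not existing chatllama code: the names `Inference`, `BACKENDS`, `_echo_backend` and `main` are hypothetical, and the `echo` backend is only a placeholder standing in for real model loading. The idea is that deepspeed, accelerate and llama.cpp would each be registered as a backend loader behind the same interface.

```python
import argparse
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A backend loader takes a checkpoint path and returns a function
# that maps a prompt string to a completion string.
GenerateFn = Callable[[str], str]
BackendLoader = Callable[[str], GenerateFn]


def _echo_backend(checkpoint_path: str) -> GenerateFn:
    """Placeholder backend (hypothetical): stands in for real model loading."""

    def generate(prompt: str) -> str:
        # A real backend would run the model here; this one just echoes.
        return f"[{checkpoint_path}] {prompt}"

    return generate


# Hypothetical registry. Real entries would wrap deepspeed, accelerate
# or llama.cpp loading code behind the same BackendLoader signature.
BACKENDS: Dict[str, BackendLoader] = {"echo": _echo_backend}


@dataclass
class Inference:
    """Loads a checkpoint with the chosen backend and answers prompts."""

    checkpoint_path: str
    backend: str = "echo"

    def __post_init__(self) -> None:
        if self.backend not in BACKENDS:
            raise ValueError(f"unknown backend: {self.backend!r}")
        # Load once at construction so repeated calls reuse the model.
        self._generate = BACKENDS[self.backend](self.checkpoint_path)

    def __call__(self, prompt: str) -> str:
        return self._generate(prompt)


def main(argv: Optional[List[str]] = None) -> int:
    """CLI entry point: <checkpoint_path> <prompt> [--backend NAME]."""
    parser = argparse.ArgumentParser(description="Run inference on a trained model.")
    parser.add_argument("checkpoint_path")
    parser.add_argument("prompt")
    parser.add_argument("--backend", default="echo", choices=sorted(BACKENDS))
    args = parser.parse_args(argv)
    print(Inference(args.checkpoint_path, args.backend)(args.prompt))
    return 0
```

From Python this would be used as `Inference("path/to/checkpoint")("some prompt")`, and a console script could simply call `main()`. Keeping the backends in a registry means each optimisation in the TODO list can be added independently without changing the class or the CLI.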

shrinath-suresh · 2023-04-07T17:36:15Z

@PierpaoloSorbellini The inference section is tagged as WIP. Is there any basic inference code available in chatllama to load the actor_rl model and run a few queries?

PierpaoloSorbellini added the labels "good first issue" (Good for newcomers) and "chatllama" (Issue related to the ChatLLaMA module) · Mar 31, 2023
