[chatllama] Do I need to split the llama model manually? #322

Open
balcklive opened this issue Mar 31, 2023 · 2 comments

@balcklive

I downloaded a LLaMA 7B model. It came with only one model file ending in .pth. But according to the model-loading code in llama_model.py shown below, if I want to train the model on multiple GPUs, I need to split the checkpoint into the same number of files as there are GPUs. May I ask how to do that? Or is there something I misunderstood?

import json
from pathlib import Path
from typing import Tuple

import torch


def load_checkpoints(
    ckpt_dir: str, local_rank: int, world_size: int
) -> Tuple[dict, dict]:
    # One checkpoint shard (.pth file) per model-parallel rank.
    checkpoints = sorted(Path(ckpt_dir).glob("*.pth"))
    assert world_size == len(checkpoints), (
        f"Loading a checkpoint for MP={len(checkpoints)} but world "  # world size means number of GPUs used, right?
        f"size is {world_size}"
    )
    ckpt_path = checkpoints[local_rank]
    print("Loading")
    checkpoint = torch.load(ckpt_path, map_location="cpu")
    with open(Path(ckpt_dir) / "params.json", "r") as f:
        params = json.loads(f.read())
    return checkpoint, params
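
A quick way to see what that assert expects is to count the .pth shards in the checkpoint directory. This is just a sketch; the path below is an example assuming the standard LLaMA download layout, not code from the repo:

from pathlib import Path

ckpt_dir = "llama/7B"  # example path to the downloaded weights
shards = sorted(Path(ckpt_dir).glob("*.pth"))
print(f"{len(shards)} shard(s): {[p.name for p in shards]}")
# load_checkpoints() asserts world_size == len(shards), so the single
# consolidated.00.pth of the 7B release implies one model-parallel process,
# unless the checkpoint is re-sharded first.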


sharlec commented Apr 4, 2023

I wonder about this question as well. I want to serve the 7B model on two servers, but I am not sure what needs to be done to the model architecture.

@PierpaoloSorbellini
Collaborator

Hi @balcklive, you may have to enable FairScale and set the MP (model parallelism) size as stated in the LLaMA documentation.
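
Roughly, this is what the FairScale/MP setup looks like in the reference LLaMA example. A minimal sketch, assuming the script is launched with torchrun (which sets LOCAL_RANK and WORLD_SIZE for each process); the training-script name in the launch command is a placeholder, and the exact chatllama code may differ:

import os
from typing import Tuple

import torch
from fairscale.nn.model_parallel.initialize import initialize_model_parallel


def setup_model_parallel() -> Tuple[int, int]:
    # torchrun provides LOCAL_RANK and WORLD_SIZE to every spawned process.
    local_rank = int(os.environ.get("LOCAL_RANK", -1))
    world_size = int(os.environ.get("WORLD_SIZE", -1))

    torch.distributed.init_process_group("nccl")
    # Model-parallel size must match the number of checkpoint shards.
    initialize_model_parallel(world_size)
    torch.cuda.set_device(local_rank)

    # Use the same seed in all processes so model-parallel ranks stay in sync.
    torch.manual_seed(1)
    return local_rank, world_size


# Example launch: one process for the single-shard 7B checkpoint.
#   torchrun --nproc_per_node 1 your_training_script.py

Larger checkpoints ship with more shards, so --nproc_per_node (and the number of GPUs) has to match their shard count.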
