
PETALS: Distributed Inference and Fine-tuning of Large Language Models Over The Internet #347

Closed
yuiseki opened this issue Dec 16, 2023 · 5 comments


yuiseki commented Dec 16, 2023

Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them inaccessible to most researchers. In this work, we investigate methods for cost-efficient inference and fine-tuning of LLMs, comparing local and distributed strategies. We observe that a large enough model (50B+) can run efficiently even on geodistributed devices in a consumer-grade network. This could allow running LLM efficiently by pooling together idle compute resources of multiple research groups and volunteers. We address two open problems: (1) how to perform inference and fine-tuning reliably if any device can disconnect abruptly and (2) how to partition LLMs between devices with uneven hardware, joining and leaving at will. In order to do that, we develop special fault-tolerant inference algorithms and load-balancing protocols that automatically assign devices to maximize the total system throughput. We showcase these algorithms in Petals - a decentralized system that runs Llama 2 (70B) and BLOOM (176B) over the Internet up to 10x faster than offloading for interactive generation. We evaluate the performance of our system in simulated conditions and a real-world setup spanning two continents.
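The load-balancing idea from the abstract ("automatically assign devices to maximize the total system throughput") can be pictured with a toy greedy policy (a hypothetical sketch, not the paper's actual protocol): a joining server picks the contiguous span of blocks that is currently worst served.

def choose_blocks(block_throughput, span):
    # Hypothetical greedy policy: pick the contiguous span of `span` blocks
    # whose current total throughput is lowest, i.e. the worst-served part
    # of the model pipeline.
    starts = range(len(block_throughput) - span + 1)
    best = min(starts, key=lambda s: sum(block_throughput[s:s + span]))
    return range(best, best + span)

# 8 transformer blocks, the middle ones under-served
load = [5, 5, 1, 1, 1, 1, 5, 5]
print(choose_blocks(load, 4))  # range(2, 6)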

Some LLMs have more than 50B parameters and cannot be used without high-end hardware.
Conventional approaches such as offloading are considered inefficient at this scale.

To address this, the researchers developed PETALS, a distributed inference network.
Put simply, the idea is "an LLM that everyone runs together."

■ Key points of PETALS
① Connects a group of devices over the Internet to run LLM inference and fine-tuning
② Produces correct inference results even when devices fail or the network is unstable
③ Keeps the system running continuously with the help of volunteers

■ Distributed inference algorithm
① Connects unreliable devices distributed around the world
② Delegates the computation of transformer blocks to servers
③ Keeps a cache of activations so inference can be restored after a failure (see the sketch below)
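
To make ③ concrete, the recovery loop can be pictured roughly like this (a minimal toy sketch, not the authors' implementation: Server, forward, and the spare-server lookup are hypothetical stand-ins for Petals' actual routing and attention-cache recovery logic):

import random

# Toy model of Petals-style fault-tolerant pipelined inference.
# Each server holds a contiguous span of transformer blocks; the client
# caches the activations entering each stage so it can re-route past failures.

class Server:
    def __init__(self, name, blocks):
        self.name, self.blocks = name, blocks   # e.g. blocks = range(0, 40)

    def forward(self, hidden):                  # stand-in for a remote call
        if random.random() < 0.1:               # simulate an abrupt disconnect
            raise ConnectionError(self.name)
        return hidden + 1                       # placeholder for block computation

def run_pipeline(hidden, chain, spares):
    cache = {0: hidden}                         # activations entering each stage
    i = 0
    while i < len(chain):
        try:
            cache[i + 1] = chain[i].forward(cache[i])
            i += 1                              # stage succeeded, move on
        except ConnectionError:
            # Swap in a spare serving the same blocks (assumed to exist here)
            # and retry this stage from the cached activations.
            chain[i] = next(s for s in spares if s.blocks == chain[i].blocks)
    return cache[len(chain)]

chain = [Server("a", range(0, 40)), Server("b", range(40, 80))]
spares = [Server("a2", range(0, 40)), Server("b2", range(40, 80))]
print(run_pipeline(0.0, chain, spares))  # completes despite random failures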

■ Experiments and results
① Experiments with Llama 2 (70B) and BLOOM (176B)
② Tested under network latency and server failures
③ Autoregressive generation up to 10x faster than conventional local offloading

The authors argue that this system has the potential to dramatically improve the accessibility of LLMs.

Note, however, that the privacy and security of data passing through the LLM require caution.


yuiseki commented Dec 17, 2023

I found that the Python version of LangChain supports Petals.
https://python.langchain.com/docs/integrations/llms/petals
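
From that page, the integration looks roughly like this (a minimal sketch following the docs as of late 2023; the langchain.llms.Petals import path and the bigscience/bloom-petals model name come from that page and may have moved in later LangChain releases):

from langchain.chains import LLMChain
from langchain.llms import Petals
from langchain.prompts import PromptTemplate

# Wrap a model served on the Petals public swarm as a LangChain LLM
llm = Petals(model_name="bigscience/bloom-petals")

prompt = PromptTemplate(
    template="Question: {question}\n\nAnswer:",
    input_variables=["question"],
)
chain = LLMChain(prompt=prompt, llm=llm)
print(chain.run("What is Petals?"))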


yuiseki commented Dec 23, 2023

Feasibility study on a Raspberry Pi

  • I found that a Petals client does not need a GPU
  • On a Raspberry Pi 4 Model B (8GB), I was able to run Llama 2 (70B) inference over the Petals public swarm without any problems!!

Steps

First install pip and Petals (on Raspberry Pi OS the apt package is python3-pip):

sudo apt install python3-pip -y
pip install git+https://github.com/bigscience-workshop/petals

Then save the following as main.py and run it with python main.py:

from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "petals-team/StableBeluga2"  # This one is fine-tuned Llama 2 (70B)

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("What is petals?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))

[Screenshot: Image from Gyazo]


yuiseki commented Dec 23, 2023

Running a Petals server on Windows 10 with WSL2

  • A machine with a GPU is effectively required to act as a server
    • A CPU-only machine can also serve, but delivers only about 1/100 the performance of one with a GPU
  • Run it as follows
    • If block_indices is not specified, the server tries to use the GPU up to its limit
    • If you serve 10 or more blocks and specify public_name, your name gets listed on https://health.petals.dev/
python3 -m petals.cli.run_server petals-team/StableBeluga2 --block_indices 0:10 --public_name https://yuiseki.net

Satisfied now that my name is on the list

[Screenshot: Image from Gyazo]


hfu commented Dec 24, 2023

Amazing! I think we can jointly explore the potential of PETALS for Smart Maps. I will try to follow your Raspberry Pi experiments soon.


yuiseki commented May 27, 2024

Closing this issue, since I think the goal of sharing this information has been achieved.

@yuiseki yuiseki closed this as completed May 27, 2024