Request Distributor Server

The request distributor service parallelizes requests across multiple api_server services. Users only need to access the proxy URL to reach the different api_server services behind it; the proxy automatically distributes incoming requests among the nodes, achieving load balancing.

Startup

Start the proxy service:

lmdeploy serve proxy --server-name {server_name} --server-port {server_port} --strategy "min_expected_latency"
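For example, to start the proxy listening on all interfaces at port 8000 (the address used in the --proxy-url example below):

lmdeploy serve proxy --server-name 0.0.0.0 --server-port 8000 --strategy "min_expected_latency"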

After a successful startup, the script also prints the URL of the proxy service. Open this URL in a browser to access the Swagger UI. An api_server can then be attached to the proxy directly when it is started, via the --proxy-url option, for example: lmdeploy serve api_server InternLM/internlm2-chat-1_8b --proxy-url http://0.0.0.0:8000. In this way, users can reach the api_server services through the proxy node. The proxy node is used exactly like an api_server; both expose the same OpenAI-compatible interfaces:

  • /v1/models
  • /v1/chat/completions
  • /v1/completions
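
For example, a chat completion request is sent to the proxy exactly as it would be sent to a single api_server. The snippet below is a minimal sketch; the proxy address http://0.0.0.0:8000 and the model name internlm2-chat-1_8b are taken from the startup example above.

# send an OpenAI-style chat completion request through the proxy
import requests
url = 'http://0.0.0.0:8000/v1/chat/completions'
headers = {'accept': 'application/json', 'Content-Type': 'application/json'}
data = {
    "model": "internlm2-chat-1_8b",
    "messages": [{"role": "user", "content": "Hello!"}]
}
response = requests.post(url, headers=headers, json=data)
print(response.text)

The proxy forwards the request to one of the registered api_server nodes according to the configured dispatch strategy and relays the response back.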

Node Management

Through the Swagger UI, we can see multiple APIs. Those related to api_server node management include:

  • /nodes/status
  • /nodes/add
  • /nodes/remove

They are used, respectively, to view all api_server service nodes, add a node, and remove a node.

Node Management through curl

# query all nodes
curl -X 'GET' \
  'http://localhost:8000/nodes/status' \
  -H 'accept: application/json'

# add a new node
curl -X 'POST' \
  'http://localhost:8000/nodes/add' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "url": "http://0.0.0.0:23333"
}'

# delete a node
curl -X 'POST' \
  'http://localhost:8000/nodes/remove?node_url=http://0.0.0.0:23333' \
  -H 'accept: application/json' \
  -d ''

Node Management through Python

# query all nodes
import requests
url = 'http://localhost:8000/nodes/status'
headers = {'accept': 'application/json'}
response = requests.get(url, headers=headers)
print(response.text)

# add a new node
import requests
url = 'http://localhost:8000/nodes/add'
headers = {
    'accept': 'application/json',
    'Content-Type': 'application/json'
}
data = {"url": "http://0.0.0.0:23333"}
response = requests.post(url, headers=headers, json=data)
print(response.text)

# delete a node
import requests
url = 'http://localhost:8000/nodes/remove'
headers = {'accept': 'application/json'}
params = {'node_url': 'http://0.0.0.0:23333'}
response = requests.post(url, headers=headers, params=params)
print(response.text)

Dispatch Strategy

The current distribution strategies of the proxy service are as follows:

  • random: dispatches requests with a probability weighted by the processing ability (request throughput) that the user provides for each api_server node. The greater a node's throughput, the more likely it is to receive a request. Nodes that do not report a throughput are assigned the average throughput of the other nodes.
  • min_expected_latency: for each node, computes the expected time to finish its responses from the number of requests currently waiting on that node and the node's throughput, and allocates the request to the node with the shortest expected latency. Nodes that do not report a throughput are handled as above.
  • min_observed_latency: allocates the request to the node with the shortest average handling time over a number of recent requests.
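
To make the min_expected_latency idea concrete, the sketch below is a simplified approximation, not the proxy's actual implementation; the node URLs, queue lengths, and throughput numbers are made up. Each node's expected completion time is estimated from its pending requests and throughput, and the node with the smallest estimate is chosen.

# simplified illustration of expected-latency scheduling (not the actual proxy code)
def pick_node(nodes):
    """nodes maps node url -> (queued_requests, throughput_rps).
    Nodes without a reported throughput fall back to the average of the others."""
    known = [tp for _, tp in nodes.values() if tp]
    default_tp = sum(known) / len(known) if known else 1.0

    def expected_latency(item):
        queued, tp = item[1]
        # expected time for this node to finish its queue plus the new request
        return (queued + 1) / (tp or default_tp)

    # choose the node with the smallest expected completion time
    return min(nodes.items(), key=expected_latency)[0]

nodes = {
    'http://0.0.0.0:23333': (4, 10.0),  # 4 queued requests, ~10 req/s
    'http://0.0.0.0:23334': (1, None),  # throughput not reported
}
print(pick_node(nodes))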