Text Generate REST API schema #18

Merged (14 commits, Feb 6, 2024)

186 additions and 0 deletions in specification/protocol/generate_rest.yaml
@@ -0,0 +1,186 @@
openapi: 3.1.0
info:
  title: Open Inference API for text generation
  description: Open Inference API for text generation
  version: 1.0.0
components:
  schemas:
    GenerateRequest:
      type: object
      required:
        - text_input
      properties:
        text_input:
          type: string
        parameters:
          $ref: '#/components/schemas/GenerateParameters'
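A request body under this schema is a JSON object with the required `text_input` string and an optional `parameters` object. A minimal sketch (the prompt text and parameter values are illustrative, not part of the spec):

```python
import json

# Hypothetical example payload for GenerateRequest; "text_input" is the only
# required field, "parameters" is optional.
request = {
    "text_input": "What is the capital of France?",
    "parameters": {"temperature": 0.2, "max_tokens": 20},
}

# Minimal validity check against the schema's "required" list:
assert "text_input" in request and isinstance(request["text_input"], str)

body = json.dumps(request)
print(body)
```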
    GenerateParameters:
      type: object
      additionalProperties: {}
      properties:
        temperature:
          type: number
          format: float
          default: null
          minimum: 0
          description: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
        top_p:
          type: number
          format: float
          maximum: 1
          minimum: 0
          description: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
        max_tokens:
          type: integer
          format: int32
          default: 20
          minimum: 0
          maximum: 512
          description: The maximum number of tokens to generate in the completion.
        stop:
          type: array
          items:
            type: string
          maxItems: 5
          description: Up to 5 sequences where the API will stop generating further tokens.
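The bounds in GenerateParameters (`temperature >= 0`, `top_p` in [0, 1], `max_tokens` up to 512, at most 5 `stop` sequences) are what a server would enforce before returning a 422. A sketch of that check; the helper name is ours, not part of the spec:

```python
def validate_parameters(params):
    """Check a GenerateParameters dict against the bounds in the schema.

    A sketch only; a conforming server would reject violations with a 422
    GenerateErrorResponse rather than raise here.
    """
    errors = []
    t = params.get("temperature")
    if t is not None and t < 0:
        errors.append("temperature must be >= 0")
    p = params.get("top_p")
    if p is not None and not (0 <= p <= 1):
        errors.append("top_p must be between 0 and 1")
    m = params.get("max_tokens")
    if m is not None and not (0 <= m <= 512):
        errors.append("max_tokens must be between 0 and 512")
    stop = params.get("stop")
    if stop is not None and len(stop) > 5:
        errors.append("stop allows at most 5 sequences")
    return errors

print(validate_parameters({"temperature": 0.8, "top_p": 0.1}))  # []
print(validate_parameters({"temperature": -1, "stop": ["a"] * 6}))
```

Note that `additionalProperties: {}` means the object is open: runtimes may accept extra, implementation-specific parameters beyond the four listed here.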
    GenerateResponse:
      type: object
      required:
        - text_output
        - model_name
      properties:
        text_output:
          type: string
        model_name:
          type: string
        model_version:
          type: string
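A client consuming a GenerateResponse can rely on `text_output` and `model_name` being present, while `model_version` is optional. A brief sketch (the body values are illustrative):

```python
import json

# Hypothetical GenerateResponse body; model_version may be absent.
raw = '{"text_output": "Paris.", "model_name": "my-llm", "model_version": "1"}'
resp = json.loads(raw)

# Both required fields must be present per the schema.
assert {"text_output", "model_name"} <= resp.keys()
print(resp["text_output"])
```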
    GenerateStreamResponse:
      type: object
      required:
        - text_output
        - model_name
      properties:
        text_output:
          type: string
        model_name:
          type: string
        model_version:
          type: string
        finish_reason:
          type: string

Review comment (Member), on `text_output`:

This is concatenated text output; we might still want to see the token generated for each iteration.

Reply (Contributor Author):

In the NVIDIA implementation, each response returns the cumulative set of tokens.

1st JSON response:

{
  "text_output": "Here is"
}

Subsequent JSON response:

{
  "text_output": "Here is the output for the prompt"
}

Should we add an additional property to display the tokens generated in the current response set?
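Until such a property exists, a client can recover the per-iteration delta itself under the cumulative-output behavior described in the reply above. A sketch (the event payloads are illustrative):

```python
# Each streamed GenerateStreamResponse is assumed to carry the cumulative
# text so far (as in the NVIDIA implementation discussed above); the newly
# generated suffix is the part beyond the previous event's text.
events = [
    {"text_output": "Here is", "model_name": "m"},
    {"text_output": "Here is the output", "model_name": "m"},
    {"text_output": "Here is the output for the prompt", "model_name": "m",
     "finish_reason": "stop"},
]

previous = ""
deltas = []
for event in events:
    cumulative = event["text_output"]
    deltas.append(cumulative[len(previous):])  # newly generated suffix
    previous = cumulative

print(deltas)  # ['Here is', ' the output', ' for the prompt']
```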
    GenerateErrorResponse:
      type: object
      required:
        - error
      properties:
        error:
          type: string
paths:
  /v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/generate:
    post:
      parameters:
        - name: model_name
          required: true
          in: path
          schema:
            type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateRequest'
      responses:
        '200':
          description: generated text
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateResponse'
        '422':
          description: Input validation error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Input validation error
        '424':
          description: Generation Error
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Request failed during generation
        '429':
          description: Model is overloaded
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Model is overloaded
        '500':
          description: Incomplete generation
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Incomplete generation
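The path template expands `${MODEL_NAME}` and, optionally, a `/versions/${MODEL_VERSION}` segment. A small URL-building helper (our own, not part of the spec; the base URL is illustrative):

```python
def generate_url(base, model_name, model_version=None):
    """Expand /v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/generate."""
    path = f"/v2/models/{model_name}"
    if model_version is not None:
        path += f"/versions/{model_version}"
    return base.rstrip("/") + path + "/generate"

print(generate_url("http://localhost:8080", "my-llm"))
# http://localhost:8080/v2/models/my-llm/generate
print(generate_url("http://localhost:8080/", "my-llm", "2"))
# http://localhost:8080/v2/models/my-llm/versions/2/generate
```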

  /v2/models/${MODEL_NAME}[/versions/${MODEL_VERSION}]/generate_stream:
    post:
      parameters:
        - name: model_name
          required: true
          in: path
          schema:
            type: string
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/GenerateRequest'
      responses:
        '200':
          description: generated text stream
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateStreamResponse'
        '422':
          description: Input validation error
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Input validation error
        '424':
          description: Generation Error
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Request failed during generation
        '429':
          description: Model is overloaded
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Model is overloaded
        '500':
          description: Incomplete generation
          content:
            text/event-stream:
              schema:
                $ref: '#/components/schemas/GenerateErrorResponse'
              example:
                error: Incomplete generation
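The streaming endpoint responds with `text/event-stream`, where each event's `data:` line carries one GenerateStreamResponse as JSON. A minimal parser over a captured stream (the payloads are illustrative, not from a real server):

```python
import json

# A captured text/event-stream body; events are separated by blank lines and
# each data line holds one GenerateStreamResponse JSON object.
stream = (
    'data: {"text_output": "Hello", "model_name": "m"}\n'
    '\n'
    'data: {"text_output": "Hello world", "model_name": "m", "finish_reason": "stop"}\n'
    '\n'
)

responses = [
    json.loads(line[len("data: "):])
    for line in stream.splitlines()
    if line.startswith("data: ")
]

print(responses[-1]["finish_reason"])  # stop
```

A production client would use an SSE library rather than hand-parsing, but the event shape is the same.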