docs(intro): add loaders, diagrams, begin plugins, ref #204
iboB committed Dec 9, 2024
1 parent f052885 commit cf4341b
Showing 2 changed files with 51 additions and 19 deletions.
2 changes: 1 addition & 1 deletion doc/iapi.md
@@ -88,7 +88,7 @@ As an example, here's how the whisper.cpp schema looked like at some point durin
"properties": {
"audioBinaryMono": {
"description": "Audio data to transcribe",
"type": "blob"
"type": "binary"
}
},
"required": [ "audioBinaryMono" ]
68 changes: 50 additions & 18 deletions doc/intro.md
@@ -10,32 +10,59 @@ AC Local provides a unified API for doing inference with [multiple models](https

The API defines the following elements:

```mermaid
flowchart LR
Loader --loads--> Model --creates--> Instance
Instance --runs--> Op
Op -.changes state.-> Instance
Op --produces--> Result
```

### Model Loader

A `ModelLoader` is an object which can load a model from a description. The description is a JSON object which contains the model type, assets, and other parameters. The loader is a factory for models.
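
As a rough illustration, a model description might look like this *(pseudo-code sketch only; the field names mirror the C++ example further below, and the `loader` object is hypothetical)*:

```python
# a model description is plain data: the model type, its assets, and other parameters
description = {
    "type": "llama.cpp gguf",                     # which kind of model this is
    "assets": [{"path": "/path/to/model.gguf"}],  # the files (weights) the model needs
}

model = loader.load(description)  # the loader acts as a factory, producing a Model
```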

### Model

A `Model`, in API terms, is an object which represents an AI model (weights, parameters) loaded into memory. Once created, a model is immutable and *does* nothing on its own, but is the means to create an...

### Instance

-The `Instance` is an object associated with a `Model` which can do inference based on the parameters it's created with. The instance holds a private state of its own which is not shared with other instances (what *is* shared is the model). The instance state is not immutable and can change with each subsequent inference operation.
+The `Instance` is an object associated with a `Model` which can do inference based on the parameters it's created with. The instance holds a private state which is not shared with other instances (what *is* shared is the model). The instance state is not immutable and can change with each subsequent inference operation.
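
For example, two instances created from the same model share the weights, but each keeps its own state *(pseudo-code sketch, assuming a hypothetical stateful text-completion op like the one used in the example below)*:

```python
# the model is shared, but each instance keeps its own private state
chat_a = model.create_instance()
chat_b = model.create_instance()

chat_a.complete("My name is Alice.")          # may update chat_a's private state...
answer = chat_b.complete("What is my name?")  # ...but chat_b knows nothing about Alice
```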

### Instance Operation

... or `op` for short, is a function (method) which can be called on an instance to perform inference and return a result. Ops may change the internal instance state.

Running an op can be visualized as a sequence diagram:

```mermaid
sequenceDiagram
participant App
create participant Instance
App ->> Instance : create
App ->> Instance : run op
create participant Op
Instance ->> Op : execute
Op -->> Instance : change state
destroy Op
Op ->> App : produce result
```

### Example

Now, this is all pretty abstract, so let's give an example. In pseudo-code:

```python
-model = LargeLanguageModel("llama-2-7b") # create a model
+model = loader.load("llama-2-7b") # create a model
instance = model.create_instance() # create an instance
result = instance.complete("A recipe for rice cakes:") # run op and get result
print(result) # consume the result
```

## API Layers

-The example above is pretty neat, but our goal is to have a *unified* API for multiple models. There's nothing unified in calling `.complete("text")` for an instance. Such an operation simply makes no sence for many types of models.
+The example above is pretty neat, but our goal is to have a *unified* API for multiple models. There's nothing unified in calling `.complete("text")` for an instance. Such an operation simply makes no sense for many types of models.

To facilitate this goal, the API is split into two layers:

@@ -45,15 +72,15 @@ This is what's different for each model type.

Some close (but not quite complete) descriptions of it could be: duck-typed, "stringly"-typed, or JSON-typed.

-Every model type defines a schema for the inference API. The schema describes things like what types of instances can be created for the model, what ops each instance provides, then what input each op gets and what it returns as a result. A more detauled description of schemas (or the schema schema) is available [here](iapi.md).
+Every model type defines a schema for the inference API. The schema describes things like what types of instances can be created for the model, what ops each instance provides, what input each op takes, and what it returns as a result. A more detailed description of schemas (or the schema schema) is available [here](iapi.md).
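
To make that a bit more concrete, here is roughly what the input description of a single op could look like, in the spirit of the whisper.cpp example from [iapi.md](iapi.md) *(a fragment only, written as a Python dict; the exact structure is defined in that document)*:

```python
# a sketch of one op's input schema (JSON-schema-like); fragment only
transcribe_input = {
    "properties": {
        "audioBinaryMono": {
            "description": "Audio data to transcribe",
            "type": "binary",  # not a plain JSON type - see Dict below
        }
    },
    "required": ["audioBinaryMono"],
}
```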

-The main carrier of data for this API is an object called `Dict`. This stands for dictionary. A more formal description if `Dict` is available [here](dict.md). In short it's basically a POJO (where J stands for JavaScript), so a JSON object, but with the notable addition of the data type `binary` - which is contiguous memory buffer. So... not a JSON, but a [CBOR](https://cbor.io/) object, at least in terms of data types.
+The main carrier of data for this API is an object called `Dict`, which stands for dictionary. A more formal description of `Dict` is available [here](dict.md). In short, it's basically a POJO (where J stands for JavaScript), so a JSON object, but with the notable addition of the data type `binary`, which is a contiguous memory buffer. So... not JSON, but a [CBOR](https://cbor.io/) object, at least in terms of data types.
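
For example, an input `Dict` for the transcription op sketched above could carry the audio as raw bytes *(pseudo-code; the file name is hypothetical)*:

```python
# a Dict is JSON-like data which may also hold binary (contiguous memory) values
with open("speech.wav", "rb") as f:  # hypothetical audio file
    audio = f.read()                 # bytes: a contiguous memory buffer

op_input = {
    "audioBinaryMono": audio,  # a binary value: fine in CBOR, not representable in plain JSON
}
```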

-With all this we can transform our example from above to someting like *(still pseudo-code)*:
+With all this we can transform our example from above to something like *(still pseudo-code)*:

```python
# create a model
-model = LargeLanguageModel("llama-2-7b")
+model = loader.load("llama-2-7b") # load an llm

# create a general instance with a small context size
instance = model.create_instance("general", dict(context_size = 1024))
```
@@ -70,7 +97,7 @@ print(recipe)

```python
# create a model
-model = ImageModel("stable-diffusion-3")
+model = loader.load("stable-diffusion-3") # load an image generation model

# create an instance with a specific resolution
instance = model.create_instance("general", dict(resolution = 512))
```
@@ -94,33 +121,38 @@ Here's a quip:

> The Inference API is different for each model type and the same for all programming languages. The Language API is the same for all model types and different for each programming language.

-It's what gives you the concrete representations of `Model`, `Instance`, `op`-s, and `Dict`, and most importantly a way to create models.
+It's what gives you the concrete representations of `Model`, `Instance`, `op`-s, and `Dict`, and most importantly a way to load models.

The base implementation is in C++, but wrappers for other languages are provided. Find the documentation [here](lapi.md).

And with it we can have actual working code like:

```cpp
-ac::local::ModelFactory factory;
-ac::local::addLlamaInference(factory);
+// load all plugins and the loaders that they provide
+ac::local::Lib::loadAllPlugins();

-auto model = factory.createModel(
+// create a model from the first loader which accepts "llama.cpp gguf"
+auto model = ac::local::Lib::createModel(
    {
-       .inferenceType = "llama.cpp",
+       .type = "llama.cpp gguf",
        .assets = {
-           {.path = "/my/path/to/llama3-q6k.gguf"}
+           {.path = "/path/to/model.gguf"}
        }
-   }, {}, {}
+   },
+   { /*default params*/ }
);

-auto instance = model->createInstance("general", {});
+// create an instance of the model
+auto instance = model->createInstance("general", { /*default params*/ });

-auto result = instance->runOp("run",
-    {{"prompt", "A recipe for rice cakes:"}}, {});
+// run the op "run" with a prompt
+auto result = instance->runOp("run", {{"prompt", "A recipe for rice cakes:"}});

std::cout << result << "\n";
```
Wait. What are plugins?

## More

This introduction is more or less language agnostic. You can check out the C++-centric documentation on structure and internals [here](internals.md).
