docs(intro): add loaders, diagrams, begin plugins, ref #204
iboB committed Dec 9, 2024
1 parent f052885 commit cf4341b
Showing 2 changed files with 51 additions and 19 deletions.
2 changes: 1 addition & 1 deletion doc/iapi.md
@@ -88,7 +88,7 @@ As an example, here's how the whisper.cpp schema looked like at some point durin
"properties": {
"audioBinaryMono": {
"description": "Audio data to transcribe",
"type": "blob"
"type": "binary"
}
},
"required": [ "audioBinaryMono" ]
68 changes: 50 additions & 18 deletions doc/intro.md
@@ -10,32 +10,59 @@ AC Local provides a unified API for doing inference with [multiple models](https

The API defines the following elements:

```mermaid
flowchart LR
Loader --loads--> Model --creates--> Instance
Instance --runs--> Op
Op -.changes state.-> Instance
Op --produces--> Result
```

### Model Loader

A `ModelLoader` is an object which can load a model from a description. The description is a JSON object which contains the model type, assets, and other parameters. The loader is a factory for models.
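
As a rough illustration, a model description might look like this *(pseudo-code sketch only; the field names mirror the C++ example further below, and the `loader` object is hypothetical)*:

```python
# a model description is plain data: the model type, its assets, and other parameters
description = {
    "type": "llama.cpp gguf",                     # which kind of model this is
    "assets": [{"path": "/path/to/model.gguf"}],  # the files (weights) the model needs
}

model = loader.load(description)  # the loader acts as a factory, producing a Model
```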

### Model

A `Model`, in API terms, is an object which represents an AI model (weights, parameters) loaded into memory. Once created, a model is immutable and *does* nothing on its own, but is the means to create an...

### Instance

-The `Instance` is an object associated with a `Model` which can do inference based on the parameters it's created with. The instance holds a private state of its own which is not shared with other instances (what *is* shared is the model). The instance state is not immutable and can change with each subsequent inference operation.
+The `Instance` is an object associated with a `Model` which can do inference based on the parameters it's created with. The instance holds a private state which is not shared with other instances (what *is* shared is the model). The instance state is not immutable and can change with each subsequent inference operation.
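
For example, two instances created from the same model share the weights, but each keeps its own state *(pseudo-code sketch, assuming a hypothetical stateful text-completion op like the one used in the example below)*:

```python
# the model is shared, but each instance keeps its own private state
chat_a = model.create_instance()
chat_b = model.create_instance()

chat_a.complete("My name is Alice.")          # may update chat_a's private state...
answer = chat_b.complete("What is my name?")  # ...but chat_b knows nothing about Alice
```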

### Instance Operation

... or `op` for short, is a function (method) which can be called on an instance to perform inference and return a result. Ops may change the internal instance state.

Running an op can be visualized as a sequence diagram:

```mermaid
sequenceDiagram
participant App
create participant Instance
App ->> Instance : create
App ->> Instance : run op
create participant Op
Instance ->> Op : execute
Op -->> Instance : change state
destroy Op
Op ->> App : produce result
```

### Example

Now, this is all pretty abstract, so let's give an example. In pseudo-code:

```python
-model = LargeLanguageModel("llama-2-7b") # create a model
+model = loader.load("llama-2-7b") # create a model
instance = model.create_instance() # create an instance
result = instance.complete("A recipe for rice cakes:") # run op and get result
print(result) # consume the result
```

## API Layers

-The example above is pretty neat, but our goal is to have a *unified* API for multiple models. There's nothing unified in calling `.complete("text")` for an instance. Such an operation simply makes no sence for many types of models.
+The example above is pretty neat, but our goal is to have a *unified* API for multiple models. There's nothing unified in calling `.complete("text")` for an instance. Such an operation simply makes no sense for many types of models.

To facilitate this goal, the API is split into two layers:

@@ -45,15 +72,15 @@ This is what's different for each model type.

Some close (but not quite complete) descriptions of it could be: duck-typed, "stringly"-typed, or JSON-typed.

-Every model type defines a schema for the inference API. The schema describes things like what types of instances can be created for the model, what ops each instance provides, then what input each op gets and what it returns as a result. A more detauled description of schemas (or the schema schema) is available [here](iapi.md).
+Every model type defines a schema for the inference API. The schema describes things like what types of instances can be created for the model, what ops each instance provides, what input each op takes, and what it returns as a result. A more detailed description of schemas (or the schema schema) is available [here](iapi.md).
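
To make that a bit more concrete, here is roughly what the input description of a single op could look like, in the spirit of the whisper.cpp example from [iapi.md](iapi.md) *(a fragment only, written as a Python dict; the exact structure is defined in that document)*:

```python
# a sketch of one op's input schema (JSON-schema-like); fragment only
transcribe_input = {
    "properties": {
        "audioBinaryMono": {
            "description": "Audio data to transcribe",
            "type": "binary",  # not a plain JSON type - see Dict below
        }
    },
    "required": ["audioBinaryMono"],
}
```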

-The main carrier of data for this API is an object called `Dict`. This stands for dictionary. A more formal description if `Dict` is available [here](dict.md). In short it's basically a POJO (where J stands for JavaScript), so a JSON object, but with the notable addition of the data type `binary` - which is contiguous memory buffer. So... not a JSON, but a [CBOR](https://cbor.io/) object, at least in terms of data types.
+The main carrier of data for this API is an object called `Dict`, which stands for dictionary. A more formal description of `Dict` is available [here](dict.md). In short, it's basically a POJO (where J stands for JavaScript), so a JSON object, but with the notable addition of the data type `binary`, which is a contiguous memory buffer. So... not JSON, but a [CBOR](https://cbor.io/) object, at least in terms of data types.
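
For example, an input `Dict` for the transcription op sketched above could carry the audio as raw bytes *(pseudo-code; the file name is hypothetical)*:

```python
# a Dict is JSON-like data which may also hold binary (contiguous memory) values
with open("speech.wav", "rb") as f:  # hypothetical audio file
    audio = f.read()                 # bytes: a contiguous memory buffer

op_input = {
    "audioBinaryMono": audio,  # a binary value: fine in CBOR, not representable in plain JSON
}
```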

-With all this we can transform our example from above to someting like *(still pseudo-code)*:
+With all this we can transform our example from above to something like *(still pseudo-code)*:

```python
# create a model
-model = LargeLanguageModel("llama-2-7b")
+model = loader.load("llama-2-7b") # load an llm

# create a general instance with a small context size
instance = model.create_instance("general", dict(context_size = 1024))
```
@@ -70,7 +97,7 @@ print(recipe)

```python
# create a model
-model = ImageModel("stable-diffusion-3")
+model = loader.load("stable-diffusion-3") # load an image generation model

# create an instance with a specific resolution
instance = model.create_instance("general", dict(resolution = 512))
```
@@ -94,33 +121,38 @@ Here's a quip:

> The Inference API is different for each model type and the same for all programming languages. The Language API is the same for all model types and different for each programming language.

-It's what gives you the concrete representations of `Model`, `Instance`, `op`-s, and `Dict`, and most importantly a way to create models.
+It's what gives you the concrete representations of `Model`, `Instance`, `op`-s, and `Dict`, and most importantly a way to load models.

The base implementation is in C++, but wrappers for other languages are provided. Find the documentation [here](lapi.md).

And with it we can have actual working code like:

```cpp
-ac::local::ModelFactory factory;
-ac::local::addLlamaInference(factory);
+// load all plugins and the loaders that they provide
+ac::local::Lib::loadAllPlugins();

-auto model = factory.createModel(
+// create a model from the first loader which accepts "llama.cpp gguf"
+auto model = ac::local::Lib::createModel(
    {
-       .inferenceType = "llama.cpp",
+       .type = "llama.cpp gguf",
        .assets = {
-           {.path = "/my/path/to/llama3-q6k.gguf"}
+           {.path = "/path/to/model.gguf"}
        }
-   }, {}, {}
+   },
+   { /*default params*/ }
);

-auto instance = model->createInstance("general", {});
+// create an instance of the model
+auto instance = model->createInstance("general", { /*default params*/ });

-auto result = instance->runOp("run",
-    {{"prompt", "A recipe for rice cakes:"}}, {});
+// run the op "run" with a prompt
+auto result = instance->runOp("run", {{"prompt", "A recipe for rice cakes:"}});

std::cout << result << "\n";
```
Wait. What are plugins?

## More

This introduction is more or less language agnostic. You can check out the C++-centric documentation on structure and internals [here](internals.md).
