- Moved cost calculation into the `@modelfusion/cost-calculation` package. Thanks @jakedetels for the refactoring!
- `FileCache` for caching responses to disk. Thanks @jakedetels for the feature! Example:

  ```ts
  import { generateText, openai } from "modelfusion";
  import { FileCache } from "modelfusion/node";

  const cache = new FileCache();

  const text1 = await generateText({
    model: openai
      .ChatTextGenerator({ model: "gpt-3.5-turbo", temperature: 1 })
      .withTextPrompt(),
    prompt: "Write a short story about a robot learning to love",
    logging: "basic-text",
    cache,
  });

  console.log({ text1 });

  const text2 = await generateText({
    model: openai
      .ChatTextGenerator({ model: "gpt-3.5-turbo", temperature: 1 })
      .withTextPrompt(),
    prompt: "Write a short story about a robot learning to love",
    logging: "basic-text",
    cache,
  });

  console.log({ text2 }); // same text
  ```
- Try both dynamic imports and `require` for loading libraries on demand.
- `ObjectGeneratorTool`: a tool to create synthetic or fictional structured data using `generateObject`.
- `jsonToolCallPrompt.instruction()`: create an instruction prompt for tool calls that uses JSON.
- `jsonToolCallPrompt` automatically enables JSON mode or grammars when supported by the model.
- Added prompt function support to `generateText`, `streamText`, `generateObject`, and `streamObject`. You can create prompt functions for text, instruction, and chat prompts using `createTextPrompt`, `createInstructionPrompt`, and `createChatPrompt`. Prompt functions allow you to load prompts from external sources and improve prompt logging. Example:

  ```ts
  const storyPrompt = createInstructionPrompt(
    async ({ protagonist }: { protagonist: string }) => ({
      system: "You are an award-winning author.",
      instruction: `Write a short story about ${protagonist} learning to love.`,
    })
  );

  const text = await generateText({
    model: openai
      .ChatTextGenerator({ model: "gpt-3.5-turbo" })
      .withInstructionPrompt(),
    prompt: storyPrompt({
      protagonist: "a robot",
    }),
  });
  ```
- Refactored the build to use `tsup`.
- Support for OpenAI embedding custom dimensions.
- breaking change: renamed the `embeddingDimensions` setting to `dimensions`.
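For illustration, a minimal sketch of the renamed setting (the model and dimension count are illustrative):

```ts
import { embed, openai } from "modelfusion";

const embedding = await embed({
  model: openai.TextEmbedder({
    model: "text-embedding-3-small",
    dimensions: 512, // custom embedding dimensions (was `embeddingDimensions`)
  }),
  value: "At first, Nox didn't know what to do with the pup.",
});
```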
- Support for the OpenAI `text-embedding-3-small` and `text-embedding-3-large` embedding models.
- Support for the OpenAI `gpt-4-turbo-preview`, `gpt-4-0125-preview`, and `gpt-3.5-turbo-0125` chat models.
- Add `type-fest` as a dependency to fix type inference errors.
- `ObjectStreamResponse` and `ObjectStreamFromResponse` serialization functions for using server-generated object streams in web applications.

  Server example:

  ```ts
  export async function POST(req: Request) {
    const { myArgs } = await req.json();

    const objectStream = await streamObject({
      // ...
    });

    // serialize the object stream to a response:
    return new ObjectStreamResponse(objectStream);
  }
  ```

  Client example:

  ```ts
  const response = await fetch("/api/stream-object-openai", {
    method: "POST",
    body: JSON.stringify({ myArgs }),
  });

  // deserialize (result object is simpler than the full response)
  const stream = ObjectStreamFromResponse({
    schema: itinerarySchema,
    response,
  });

  for await (const { partialObject } of stream) {
    // do something, e.g. setting a React state
  }
  ```
- breaking change: rename `generateStructure` to `generateObject` and `streamStructure` to `streamObject`. Related names have been changed accordingly.
- breaking change: the `streamObject` result stream contains additional data. You need to use `stream.partialObject` or destructuring to access it:

  ```ts
  const objectStream = await streamObject({
    // ...
  });

  for await (const { partialObject } of objectStream) {
    console.clear();
    console.log(partialObject);
  }
  ```
- breaking change: the result from successful `Schema` validations is stored in the `value` property (before: `data`).
- Duplex speech streaming works in Vercel Edge Functions.
- breaking change: updated the `generateTranscription` interface. The function now takes a `mimeType` and `audioData` (base64-encoded string, `Uint8Array`, `Buffer`, or `ArrayBuffer`). Example:

  ```ts
  import { generateTranscription, openai } from "modelfusion";
  import fs from "node:fs";

  const transcription = await generateTranscription({
    model: openai.Transcriber({ model: "whisper-1" }),
    mimeType: "audio/mp3",
    audioData: await fs.promises.readFile("data/test.mp3"),
  });
  ```
- Images in instruction and chat prompts can be `Buffer` or `ArrayBuffer` instances (in addition to base64-encoded strings and `Uint8Array` instances).
- breaking change: Usage of Node `async_hooks` has been renamed from `node:async_hooks` to `async_hooks` for easier Webpack configuration. To exclude `async_hooks` from client-side bundling, you can use the following config for Next.js (`next.config.mjs` or `next.config.js`):

  ```js
  /**
   * @type {import('next').NextConfig}
   */
  const nextConfig = {
    webpack: (config, { isServer }) => {
      if (isServer) {
        return config;
      }

      config.resolve = config.resolve ?? {};
      config.resolve.fallback = config.resolve.fallback ?? {};

      // async hooks is not available in the browser:
      config.resolve.fallback.async_hooks = false;

      return config;
    },
  };
  ```
- breaking change: ModelFusion uses `Uint8Array` instead of `Buffer` for better cross-platform compatibility (see also "Goodbye, Node.js Buffer"). This can lead to breaking changes in your code if you use `Buffer`-specific methods.
- breaking change: Image content in multi-modal instruction and chat inputs (e.g. for GPT Vision) is passed in the `image` property (instead of `base64Image`) and supports both base64 strings and `Uint8Array` inputs:

  ```ts
  const image = fs.readFileSync(path.join("data", "example-image.png"));

  const textStream = await streamText({
    model: openai.ChatTextGenerator({
      model: "gpt-4-vision-preview",
      maxGenerationTokens: 1000,
    }),
    prompt: [
      openai.ChatMessage.user([
        { type: "text", text: "Describe the image in detail:\n\n" },
        { type: "image", image, mimeType: "image/png" },
      ]),
    ],
  });
  ```
- OpenAI-compatible providers with predefined API configurations have a customized provider name that shows up in the events.
- breaking change: `streamStructure` returns an async iterable over deep partial objects. If you need to get the fully validated final result, you can use the `fullResponse: true` option and await the `structurePromise` value. Example:

  ```ts
  const { structureStream, structurePromise } = await streamStructure({
    model: ollama
      .ChatTextGenerator({
        model: "openhermes2.5-mistral",
        maxGenerationTokens: 1024,
        temperature: 0,
      })
      .asStructureGenerationModel(jsonStructurePrompt.text()),

    schema: zodSchema(
      z.object({
        characters: z.array(
          z.object({
            name: z.string(),
            class: z
              .string()
              .describe("Character class, e.g. warrior, mage, or thief."),
            description: z.string(),
          })
        ),
      })
    ),

    prompt:
      "Generate 3 character descriptions for a fantasy role playing game.",

    fullResponse: true,
  });

  for await (const partialStructure of structureStream) {
    console.clear();
    console.log(partialStructure);
  }

  const structure = await structurePromise;

  console.clear();
  console.log("FINAL STRUCTURE");
  console.log(structure);
  ```
- breaking change: Renamed the `text` value in `streamText` with `fullResponse: true` to `textPromise`.
- Ollama streaming.
- Ollama structure generation and streaming.
- breaking change: rename `useTool` to `runTool` and `useTools` to `runTools` to avoid confusion with React hooks.
- Perplexity AI chat completion support. Example:

  ```ts
  import { openaicompatible, streamText } from "modelfusion";

  const textStream = await streamText({
    model: openaicompatible
      .ChatTextGenerator({
        api: openaicompatible.PerplexityApi(),
        provider: "openaicompatible-perplexity",
        model: "pplx-70b-online", // online model with access to web search
        maxGenerationTokens: 500,
      })
      .withTextPrompt(),

    prompt: "What is RAG in AI?",
  });
  ```
- Embedding support for OpenAI-compatible providers. You can, for example, use the Together AI embedding endpoint:

  ```ts
  import { embed, openaicompatible } from "modelfusion";

  const embedding = await embed({
    model: openaicompatible.TextEmbedder({
      api: openaicompatible.TogetherAIApi(),
      provider: "openaicompatible-togetherai",
      model: "togethercomputer/m2-bert-80M-8k-retrieval",
    }),
    value: "At first, Nox didn't know what to do with the pup.",
  });
  ```
- `classify` model function (docs) for classifying values. The `SemanticClassifier` has been renamed to `EmbeddingSimilarityClassifier` and can be used in conjunction with `classify`:

  ```ts
  import { classify, EmbeddingSimilarityClassifier, openai } from "modelfusion";

  const classifier = new EmbeddingSimilarityClassifier({
    embeddingModel: openai.TextEmbedder({ model: "text-embedding-ada-002" }),
    similarityThreshold: 0.82,
    clusters: [
      {
        name: "politics" as const,
        values: [
          "they will save the country!",
          // ...
        ],
      },
      {
        name: "chitchat" as const,
        values: [
          "how's the weather today?",
          // ...
        ],
      },
    ],
  });

  // strongly typed result:
  const result = await classify({
    model: classifier,
    value: "don't you love politics?",
  });
  ```
- breaking change: Switched from positional parameters to named parameters (a parameter object) for all model and tool functions. The parameter object is the first and only parameter of each function. Additional options (previously the last parameter) are now part of the parameter object. Example:

  ```ts
  // old:
  const text = await generateText(
    openai
      .ChatTextGenerator({
        model: "gpt-3.5-turbo",
        maxGenerationTokens: 1000,
      })
      .withTextPrompt(),
    "Write a short story about a robot learning to love",
    {
      functionId: "example-function",
    }
  );

  // new:
  const text = await generateText({
    model: openai
      .ChatTextGenerator({
        model: "gpt-3.5-turbo",
        maxGenerationTokens: 1000,
      })
      .withTextPrompt(),
    prompt: "Write a short story about a robot learning to love",
    functionId: "example-function",
  });
  ```

  This change was made to make the API more flexible and to allow for future extensions.
- Ollama response schema for repeated calls with Ollama 0.1.19 completion models. Thanks @Necmttn for the bugfix!
- Ollama response schema for repeated calls with Ollama 0.1.19 chat models. Thanks @jakedetels for the bug report!
- Synthia prompt template
- breaking change: Renamed the `parentCallId` function parameter to `callId` to enable options pass-through.
- Better output filtering for the `detailed-object` log format (e.g. via `modelfusion.setLogFormat("detailed-object")`).
- `OllamaCompletionModel` supports setting the prompt template in the settings. Prompt formats are available under `ollama.prompt.*`. You can then call `.withTextPrompt()`, `.withInstructionPrompt()` or `.withChatPrompt()` to use a standardized prompt:

  ```ts
  const model = ollama
    .CompletionTextGenerator({
      model: "mistral",
      promptTemplate: ollama.prompt.Mistral,
      raw: true, // required when using custom prompt template
      maxGenerationTokens: 120,
    })
    .withTextPrompt();
  ```
- breaking change: removed `.withTextPromptTemplate` on `OllamaCompletionModel`.
- Incorrect export. Thanks @mloenow for the fix!
- Schema-specific GBNF grammar generator for `LlamaCppCompletionModel`. When using `jsonStructurePrompt`, it automatically uses a GBNF grammar for the JSON schema that you provide. Example:

  ```ts
  const structure = await generateStructure(
    llamacpp
      .CompletionTextGenerator({
        // run openhermes-2.5-mistral-7b.Q4_K_M.gguf in llama.cpp
        promptTemplate: llamacpp.prompt.ChatML,
        maxGenerationTokens: 1024,
        temperature: 0,
      })
      // automatically restrict the output to your schema using GBNF:
      .asStructureGenerationModel(jsonStructurePrompt.text()),

    zodSchema(
      z.array(
        z.object({
          name: z.string(),
          class: z
            .string()
            .describe("Character class, e.g. warrior, mage, or thief."),
          description: z.string(),
        })
      )
    ),

    "Generate 3 character descriptions for a fantasy role playing game. "
  );
  ```
- `LlamaCppCompletionModel` supports setting the prompt template in the settings. Prompt formats are available under `llamacpp.prompt.*`. You can then call `.withTextPrompt()`, `.withInstructionPrompt()` or `.withChatPrompt()` to use a standardized prompt:

  ```ts
  const model = llamacpp
    .CompletionTextGenerator({
      // run https://huggingface.co/TheBloke/OpenHermes-2.5-Mistral-7B-GGUF with llama.cpp
      promptTemplate: llamacpp.prompt.ChatML,
      contextWindowSize: 4096,
      maxGenerationTokens: 512,
    })
    .withChatPrompt();
  ```
- breaking change: renamed `response` to `rawResponse` when using the `fullResponse: true` setting.
- breaking change: renamed `llamacpp.TextGenerator` to `llamacpp.CompletionTextGenerator`.
- breaking change: removed `.withTextPromptTemplate` on `LlamaCppCompletionModel`.
- Predefined Llama.cpp GBNF grammars:
  - `llamacpp.grammar.json`: restricts the output to JSON.
  - `llamacpp.grammar.jsonArray`: restricts the output to a JSON array.
  - `llamacpp.grammar.list`: restricts the output to a newline-separated list where each line starts with `-`.
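For example, a predefined grammar can be passed into the `grammar` setting of a Llama.cpp text generator (a minimal sketch; the other settings are illustrative):

```ts
import { generateText, llamacpp } from "modelfusion";

const text = await generateText(
  llamacpp.TextGenerator({
    maxGenerationTokens: 256,
    temperature: 0,
    // restrict the output to a newline-separated list:
    grammar: llamacpp.grammar.list,
  }),
  "List 5 ingredients for a lasagna:\n\n"
);
```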
- Llama.cpp structure generation support:

  ```ts
  const structure = await generateStructure(
    llamacpp
      .TextGenerator({
        // run openhermes-2.5-mistral-7b.Q4_K_M.gguf in llama.cpp
        maxGenerationTokens: 1024,
        temperature: 0,
      })
      .withTextPromptTemplate(ChatMLPrompt.instruction()) // needed for jsonStructurePrompt.text()
      .asStructureGenerationModel(jsonStructurePrompt.text()), // automatically restrict the output to JSON

    zodSchema(
      z.object({
        characters: z.array(
          z.object({
            name: z.string(),
            class: z
              .string()
              .describe("Character class, e.g. warrior, mage, or thief."),
            description: z.string(),
          })
        ),
      })
    ),

    "Generate 3 character descriptions for a fantasy role playing game. "
  );
  ```
- Semantic classifier: an easy way to determine the class of a text using embeddings. Example:

  ```ts
  import { SemanticClassifier, openai } from "modelfusion";

  const classifier = new SemanticClassifier({
    embeddingModel: openai.TextEmbedder({
      model: "text-embedding-ada-002",
    }),
    similarityThreshold: 0.82,
    clusters: [
      {
        name: "politics" as const,
        values: [
          "isn't politics the best thing ever",
          "why don't you tell me about your political opinions",
          "don't you just love the president",
          "don't you just hate the president",
          "they're going to destroy this country!",
          "they will save the country!",
        ],
      },
      {
        name: "chitchat" as const,
        values: [
          "how's the weather today?",
          "how are things going?",
          "lovely weather today",
          "the weather is horrendous",
          "let's go to the chippy",
        ],
      },
    ],
  });

  console.log(await classifier.classify("don't you love politics?")); // politics
  console.log(await classifier.classify("how's the weather today?")); // chitchat
  console.log(
    await classifier.classify("I'm interested in learning about llama 2")
  ); // null
  ```
- Removed Anthropic support. Anthropic has a strong stance against open-source models and against non-US AI. I will not support them by providing a ModelFusion integration.
- Together AI text generation and text streaming using OpenAI-compatible chat models.
- Custom call header support for APIs. You can pass a `customCallHeaders` function into API configurations to add custom headers. The function is called with `functionType`, `functionId`, `run`, and `callId` parameters. Example for Helicone:

  ```ts
  const text = await generateText(
    openai
      .ChatTextGenerator({
        api: new HeliconeOpenAIApiConfiguration({
          customCallHeaders: ({ functionId, callId }) => ({
            "Helicone-Property-FunctionId": functionId,
            "Helicone-Property-CallId": callId,
          }),
        }),
        model: "gpt-3.5-turbo",
        temperature: 0.7,
        maxGenerationTokens: 500,
      })
      .withTextPrompt(),

    "Write a short story about a robot learning to love",
    { functionId: "example-function" }
  );
  ```
- Rudimentary caching support for `generateText`. You can use a `MemoryCache` to store the response of a `generateText` call. Example:

  ```ts
  import { MemoryCache, generateText, ollama } from "modelfusion";

  const model = ollama
    .ChatTextGenerator({ model: "llama2:chat", maxGenerationTokens: 100 })
    .withTextPrompt();

  const cache = new MemoryCache();

  const text1 = await generateText(
    model,
    "Write a short story about a robot learning to love:",
    { cache }
  );

  console.log(text1);

  // 2nd call will use cached response:
  const text2 = await generateText(
    model,
    "Write a short story about a robot learning to love:", // same text
    { cache }
  );

  console.log(text2);
  ```
- `validateTypes` and `safeValidateTypes` helpers that perform type checking of an object against a `Schema` (e.g., a `zodSchema`).
Structure generation improvements:

- Added `.asStructureGenerationModel(...)` function to `OpenAIChatModel` and `OllamaChatModel` to create structure generation models from chat models.
- Added `jsonStructurePrompt` helper function to create structure generation models.

Example:

```ts
import {
generateStructure,
jsonStructurePrompt,
ollama,
zodSchema,
} from "modelfusion";
const structure = await generateStructure(
ollama
.ChatTextGenerator({
model: "openhermes2.5-mistral",
maxGenerationTokens: 1024,
temperature: 0,
})
.asStructureGenerationModel(jsonStructurePrompt.text()),
zodSchema(
z.object({
characters: z.array(
z.object({
name: z.string(),
class: z
.string()
.describe("Character class, e.g. warrior, mage, or thief."),
description: z.string(),
})
),
})
),
"Generate 3 character descriptions for a fantasy role playing game. "
);
```
- breaking change: renamed `useToolsOrGenerateText` to `useTools`.
- breaking change: renamed `generateToolCallsOrText` to `generateToolCalls`.
- Removed the restriction on tool names. OpenAI tool calls do not have such a restriction.
Reworked API configuration support:

- All providers now have an `Api` function that you can call to create custom API configurations. The base URL setup is more flexible and allows you to override parts of the base URL selectively.
- `api` namespace with retry and throttle configurations.
- Updated Cohere models.
- Updated LMNT API calls to the LMNT `v1` API.
- breaking change: Renamed `throttleUnlimitedConcurrency` to `throttleOff`.
- breaking change: renamed `modelfusion/extension` to `modelfusion/internal`. This requires updating `modelfusion-experimental` (if used) to `v0.3.0`.
- Removed deprecated OpenAI completion models that will be deactivated on January 4, 2024.
- OpenAI-compatible completion model. It works, e.g., with Fireworks AI.
- Together AI API configuration (for OpenAI-compatible chat models):

  ```ts
  import {
    TogetherAIApiConfiguration,
    openaicompatible,
    streamText,
  } from "modelfusion";

  const textStream = await streamText(
    openaicompatible
      .ChatTextGenerator({
        api: new TogetherAIApiConfiguration(),
        model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
      })
      .withTextPrompt(),

    "Write a story about a robot learning to love"
  );
  ```
- Updated Llama.cpp model settings. GBNF grammars can be passed into the `grammar` setting:

  ```ts
  const text = await generateText(
    llamacpp
      .TextGenerator({
        maxGenerationTokens: 512,
        temperature: 0,
        // simple list grammar:
        grammar: `root ::= ("- " item)+
  item ::= [^\\n]+ "\\n"`,
      })
      .withTextPromptTemplate(MistralInstructPrompt.text()),

    "List 5 ingredients for a lasagna:\n\n"
  );
  ```
- Mistral instruct prompt template
- breaking change: Renamed `LlamaCppTextGenerationModel` to `LlamaCppCompletionModel`.
- Updated `LlamaCppCompletionModel` to the latest llama.cpp version.
- Fixed formatting of the system prompt for chats in the Llama 2 prompt template.
Experimental features that are unlikely to become stable before v1.0 have been moved to a separate `modelfusion-experimental` package:

- Cost calculation
- `guard` function
- Browser and server features (incl. flow)
- `summarizeRecursively` function
- Tool call support for chat prompts. Assistant messages can contain tool calls, and tool messages can contain tool call results. Tool calls can be used to implement e.g. agents:

  ```ts
  const chat: ChatPrompt = {
    system: "You are ...",
    messages: [ChatMessage.user({ text: instruction })],
  };

  while (true) {
    const { text, toolResults } = await useToolsOrGenerateText(
      openai
        .ChatTextGenerator({ model: "gpt-4-1106-preview" })
        .withChatPrompt(),
      tools, // array of tools
      chat
    );

    // add the assistant and tool messages to the chat:
    chat.messages.push(
      ChatMessage.assistant({ text, toolResults }),
      ChatMessage.tool({ toolResults })
    );

    if (toolResults == null) {
      return; // no more actions, break loop
    }

    // ... (handle tool results)
  }
  ```
- `streamText` returns a `text` promise when invoked with `fullResponse: true`. After the streaming has finished, the promise resolves with the full text:

  ```ts
  const { text, textStream } = await streamText(
    openai.ChatTextGenerator({ model: "gpt-3.5-turbo" }).withTextPrompt(),
    "Write a short story about a robot learning to love:",
    { fullResponse: true }
  );

  // ... (handle streaming)

  console.log(await text); // full text
  ```
- breaking change: Unified text and multimodal prompt templates. `[Text/MultiModal]InstructionPrompt` is now `InstructionPrompt`, and `[Text/MultiModal]ChatPrompt` is now `ChatPrompt`.
- More flexible chat prompts: The chat prompt validation is now chat-template specific and validated at runtime. E.g. the Llama 2 prompt template only supports turns of user and assistant messages, whereas other formats are more flexible.
- `finishReason` support for `generateText`.

  The finish reason can be `stop` (the model generated a stop sequence), `length` (the model generated the maximum number of tokens), `content-filter` (the content filter detected a violation), `tool-calls` (the model triggered a tool call), `error` (the model stopped because of an error), `other` (the model stopped for another reason), or `unknown` (the stop reason is not known or the model does not support finish reasons).

  You can extract it from the full response when using `fullResponse: true`:

  ```ts
  const { text, finishReason } = await generateText(
    openai
      .ChatTextGenerator({ model: "gpt-3.5-turbo", maxGenerationTokens: 200 })
      .withTextPrompt(),
    "Write a short story about a robot learning to love:",
    { fullResponse: true }
  );
  ```
- You can specify `numberOfGenerations` on image generation models and create multiple images by using the `fullResponse: true` option. Example:

  ```ts
  // generate 2 images:
  const { images } = await generateImage(
    openai.ImageGenerator({
      model: "dall-e-3",
      numberOfGenerations: 2,
      size: "1024x1024",
    }),
    "the wicked witch of the west in the style of early 19th century painting",
    { fullResponse: true }
  );
  ```
- breaking change: Image generation models use a generalized `numberOfGenerations` parameter (instead of model-specific parameters) to specify the number of generations.
- Automatic1111 Stable Diffusion Web UI configuration has separate configuration of host, port, and path.
- Automatic1111 Stable Diffusion Web UI uses negative prompt and seed.
- `ollama.ChatTextGenerator` model that calls the Ollama chat API.
- Ollama chat messages and prompts are exposed through `ollama.ChatMessage` and `ollama.ChatPrompt`.
- OpenAI chat messages and prompts are exposed through `openai.ChatMessage` and `openai.ChatPrompt`.
- Mistral chat messages and prompts are exposed through `mistral.ChatMessage` and `mistral.ChatPrompt`.
- breaking change: renamed `ollama.TextGenerator` to `ollama.CompletionTextGenerator`.
- breaking change: renamed `mistral.TextGenerator` to `mistral.ChatTextGenerator`.
- You can specify `numberOfGenerations` on text generation models and access multiple generations by using the `fullResponse: true` option. Example:

  ```ts
  // generate 2 texts:
  const { texts } = await generateText(
    openai.CompletionTextGenerator({
      model: "gpt-3.5-turbo-instruct",
      numberOfGenerations: 2,
      maxGenerationTokens: 1000,
    }),
    "Write a short story about a robot learning to love:\n\n",
    { fullResponse: true }
  );
  ```
- breaking change: Text generation models use a generalized `numberOfGenerations` parameter (instead of model-specific parameters) to specify the number of generations.
- breaking change: Renamed the `maxCompletionTokens` text generation model setting to `maxGenerationTokens`.
- breaking change: The `responseType` option was changed into the `fullResponse` option and uses a boolean value to make discovery easy. The response values from the full response have been renamed for clarity. For base64 image generation, you can use the `imageBase64` value from the full response:

  ```ts
  const { imageBase64 } = await generateImage(model, prompt, {
    fullResponse: true,
  });
  ```
- Better docs for the OpenAI chat settings. Thanks @bearjaws for the contribution!
- Streaming OpenAI chat text generation when setting `n: 2` or higher returns only the stream from the first choice.
- breaking change: Ollama image (vision) support. This changes the Ollama prompt format. You can add `.withTextPrompt()` to existing Ollama text generators to get a text prompt like before.

  Vision example:

  ```ts
  import { ollama, streamText } from "modelfusion";

  const textStream = await streamText(
    ollama.TextGenerator({
      model: "bakllava",
      maxCompletionTokens: 1024,
      temperature: 0,
    }),
    {
      prompt: "Describe the image in detail",
      images: [image], // base-64 encoded png or jpeg
    }
  );
  ```
- breaking change: Switch Ollama settings to camelCase to align with the rest of the library.
- `cachePrompt` parameter for llama.cpp models. Thanks @djwhitt for the contribution!
- Prompt template for neural-chat models.
- Optional response prefix for instruction prompts to guide the LLM response.
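A minimal sketch of the prompt shape; `responsePrefix` as the property name is an assumption here:

```ts
// an instruction prompt whose response prefix steers the model output:
const prompt = {
  system: "You respond with valid JSON only.",
  instruction: "List three primary colors.",
  responsePrefix: '{"colors": ["', // assumed property name; the model continues from here
};
```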
- breaking change: Renamed prompt format to prompt template to align with the commonly used language (e.g. from model cards).
- Improved Ollama error handling.
- breaking change: Setting global function observers and global logging has changed. You can call methods on a `modelfusion` import:

  ```ts
  import { modelfusion } from "modelfusion";

  modelfusion.setLogFormat("basic-text");
  ```
- Cleaned output when using the `detailed-object` log format.
- Whisper.cpp transcription (speech-to-text) model support:

  ```ts
  import { generateTranscription, whispercpp } from "modelfusion";

  const data = await fs.promises.readFile("data/test.wav");

  const transcription = await generateTranscription(whispercpp.Transcriber(), {
    type: "wav",
    data,
  });
  ```
- Better error reporting.
- Temperature and language settings to OpenAI transcription model.
- `maxValuesPerCall` setting for `OpenAITextEmbeddingModel` to enable different configurations, e.g. for Azure. Thanks @nanotronic for the contribution!
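A minimal sketch (the batch size is illustrative):

```ts
import { embedMany, openai } from "modelfusion";

const embeddings = await embedMany(
  openai.TextEmbedder({
    model: "text-embedding-ada-002",
    maxValuesPerCall: 5, // e.g. Azure limits the number of values per call
  }),
  ["first text", "second text", "third text"]
);
```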
- Multi-modal chat prompts. Supported by OpenAI vision chat models and by BakLLaVA prompt format.
- breaking change: renamed `ChatPrompt` to `TextChatPrompt` to distinguish it from multi-modal chat prompts.
- experimental: `modelfusion/extension` export with functions and classes that are necessary to implement providers in 3rd-party node modules. See lgrammel/modelfusion-example-provider for an example.
- `OpenAIChatMessage` function call support.
- Support for OpenAI-compatible chat APIs. See OpenAI Compatible for details.

  ```ts
  import {
    BaseUrlApiConfiguration,
    openaicompatible,
    generateText,
  } from "modelfusion";

  const text = await generateText(
    openaicompatible
      .ChatTextGenerator({
        api: new BaseUrlApiConfiguration({
          baseUrl: "https://api.fireworks.ai/inference/v1",
          headers: {
            Authorization: `Bearer ${process.env.FIREWORKS_API_KEY}`,
          },
        }),
        model: "accounts/fireworks/models/mistral-7b",
      })
      .withTextPrompt(),

    "Write a story about a robot learning to love"
  );
  ```
- Introduce `uncheckedSchema()` facade function as an easier way to create unchecked ModelFusion schemas. This aligns the API with `zodSchema()`.
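A minimal sketch, assuming `uncheckedSchema` takes a plain JSON schema object (the schema content is illustrative):

```ts
import { uncheckedSchema } from "modelfusion";

// no validation or type inference is performed on values:
const itinerarySchema = uncheckedSchema({
  type: "object",
  properties: {
    destination: { type: "string" },
    days: { type: "number" },
  },
});
```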
- breaking change: Renamed the `InstructionPrompt` interface to `MultiModalInstructionPrompt` to clearly distinguish it from `TextInstructionPrompt`.
- breaking change: Renamed `.withBasicPrompt` methods for image generation models to `.withTextPrompt` to align with text generation models.
- Introduce `zodSchema()` facade function as an easier way to create new ModelFusion Zod schemas. This clearly distinguishes it from `ZodSchema`, which is also part of the zod library.
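A minimal sketch:

```ts
import { zodSchema } from "modelfusion";
import { z } from "zod";

// wrap a Zod schema as a ModelFusion schema:
const characterSchema = zodSchema(
  z.object({
    name: z.string(),
    class: z.string(),
  })
);
```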
breaking change: `generateStructure` and `streamStructure` redesign. The new API does not require function calling and `StructureDefinition` objects any more. This makes it more flexible, and it can be used in 3 ways:

- with OpenAI function calling:

  ```ts
  const model = openai
    .ChatTextGenerator({ model: "gpt-3.5-turbo" })
    .asFunctionCallStructureGenerationModel({
      fnName: "...",
      fnDescription: "...",
    });
  ```

- with OpenAI JSON format:

  ```ts
  const model = openai
    .ChatTextGenerator({
      model: "gpt-4-1106-preview",
      temperature: 0,
      maxCompletionTokens: 1024,
      responseFormat: { type: "json_object" },
    })
    .asStructureGenerationModel(
      jsonStructurePrompt((instruction: string, schema) => [
        OpenAIChatMessage.system(
          "JSON schema: \n" +
            JSON.stringify(schema.getJsonSchema()) +
            "\n\n" +
            "Respond only using JSON that matches the above schema."
        ),
        OpenAIChatMessage.user(instruction),
      ])
    );
  ```

- with Ollama (and a capable model, e.g., OpenHermes 2.5):

  ```ts
  const model = ollama
    .TextGenerator({
      model: "openhermes2.5-mistral",
      maxCompletionTokens: 1024,
      temperature: 0,
      format: "json",
      raw: true,
      stopSequences: ["\n\n"], // prevent infinite generation
    })
    .withPromptFormat(ChatMLPromptFormat.instruction())
    .asStructureGenerationModel(
      jsonStructurePrompt((instruction: string, schema) => ({
        system:
          "JSON schema: \n" +
          JSON.stringify(schema.getJsonSchema()) +
          "\n\n" +
          "Respond only using JSON that matches the above schema.",
        instruction,
      }))
    );
  ```

See generateStructure for details on the new API.
- breaking change: Restructured multi-modal instruction prompts and `OpenAIChatMessage.user()`.
- Multi-tool usage from open source models.

  Use `TextGenerationToolCallsOrGenerateTextModel` and the related helper method `.asToolCallsOrTextGenerationModel()` to create custom prompts & parsers.

  Examples:

  - `examples/basic/src/model-provider/ollama/ollama-use-tools-or-generate-text-openhermes-example.ts`
  - `examples/basic/src/model-provider/llamacpp/llamacpp-use-tools-or-generate-text-openhermes-example.ts`

  Example prompt format:

  - `examples/basic/src/tool/prompts/open-hermes.ts` for OpenHermes 2.5
- breaking change: Removed `FunctionListToolCallPromptFormat`. See `examples/basic/src/model-provide/ollama/ollama-use-tool-mistral-example.ts` for how to implement a `ToolCallPromptFormat` for your tool.
- breaking change: Rename `Speech` to `SpeechGenerator` in facades.
- breaking change: Rename `Transcription` to `Transcriber` in facades.
- Anthropic Claude 2.1 support
Introducing model provider facades:

```ts
const image = await generateImage(
  openai.ImageGenerator({ model: "dall-e-3", size: "1024x1024" }),
  "the wicked witch of the west in the style of early 19th century painting"
);
```

- Model provider facades. You can e.g. use `ollama.TextGenerator(...)` instead of `new OllamaTextGenerationModel(...)`.
- breaking change: Fixed method name `isParallizable` to `isParallelizable` in `EmbeddingModel`.
- breaking change: removed `HuggingFaceImageDescriptionModel`. Image description models will be replaced by multi-modal vision models.
- Increase OpenAI chat streaming resilience.
Prompt format and tool calling improvements:

- Text prompt format. Use simple text prompts, e.g. with `OpenAIChatModel`:

  ```ts
  const textStream = await streamText(
    new OpenAIChatModel({
      model: "gpt-3.5-turbo",
    }).withTextPrompt(),
    "Write a short story about a robot learning to love."
  );
  ```

- `.withTextPromptFormat` on `LlamaCppTextGenerationModel` for simplified prompt construction:

  ```ts
  const textStream = await streamText(
    new LlamaCppTextGenerationModel({
      // ...
    }).withTextPromptFormat(Llama2PromptFormat.text()),
    "Write a short story about a robot learning to love."
  );
  ```

- `.asToolCallGenerationModel()` on `OllamaTextGenerationModel` to simplify tool calls.
- Better error reporting when using exponential backoff retries.
- breaking change: removed `input` from `InstructionPrompt` (was Alpaca-specific; `AlpacaPromptFormat` still supports it).
Remove section newlines from Llama 2 prompt format.
Ollama edge case and error handling improvements.
Breaking change: the tool calling API has been reworked to support multiple parallel tool calls. This required multiple breaking changes (see below). Check out the updated tools documentation for details.

- `Tool` has `parameters` and `returnType` schemas (instead of `inputSchema` and `outputSchema`).
- `useTool` uses `generateToolCall` under the hood. The return value and error handling have changed.
- `useToolOrGenerateText` has been renamed to `useToolsOrGenerateText`. It uses `generateToolCallsOrText` under the hood. The return value and error handling have changed. It can invoke several tools in parallel and returns an array of tool results.
- The `maxRetries` parameter in `guard` has been replaced by a `maxAttempt` parameter.
- `generateStructureOrText` has been removed.
- Experimental `generateToolCallsOrText` function for generating multiple parallel tool calls using the OpenAI chat/tools API.
- ChatML prompt format.
- breaking change: `ChatPrompt` structure and terminology have changed to align more closely with OpenAI and similar chat prompts. This is also in preparation for integrating images and function call results into chat prompts.
- breaking change: Prompt formats are namespaced. Use e.g. `Llama2PromptFormat.chat()` instead of `mapChatPromptToLlama2Format()`. See Prompt Format for documentation of the new prompt formats.
- Experimental `generateToolCall` function for generating a single tool call using the OpenAI chat/tools API.
- Refactored JSON parsing to use abstracted schemas. You can use `parseJSON` and `safeParseJSON` to securely parse JSON objects and optionally type-check them using any schema (e.g. a Zod schema).
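A sketch of how these helpers are typically used; the exact call shape is an assumption (at this point, schemas were created with `new ZodSchema(...)`):

```ts
import { safeParseJSON, ZodSchema } from "modelfusion";
import { z } from "zod";

const result = safeParseJSON({
  text: '{ "name": "Nox" }',
  schema: new ZodSchema(z.object({ name: z.string() })),
});

if (result.success) {
  console.log(result.data.name); // typed access
} else {
  console.error(result.error);
}
```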
- Ollama 0.1.9 support: `format` (for forcing JSON output) and `raw` settings.
- Improved Ollama settings documentation.
- Support for fine-tuned OpenAI `gpt-4-0613` models.
- Support for the `trimWhitespace` model setting in `streamText` calls.
- Image support for `OpenAIChatMessage.user`.
- `mapInstructionPromptToBakLLaVA1ForLlamaCppFormat` prompt format.
- breaking change: `VisionInstructionPrompt` was replaced by an optional `image` field in `InstructionPrompt`.
- Support for the OpenAI vision model.
  - Example: `examples/basic/src/model-provider/openai/openai-chat-stream-text-vision-example.ts`
- Support for the OpenAI chat completion `seed` and `responseFormat` options.
- OpenAI speech generation support. Shoutout to @bjsi for the awesome contribution!
- OpenAI `gpt-3.5-turbo-1106`, `gpt-4-1106-preview`, and `gpt-4-vision-preview` chat models.
- OpenAI `Dall-E-3` image model.
- breaking change: `OpenAIImageGenerationModel` requires a `model` parameter.
- Support image input for multi-modal Llama.cpp models (e.g. Llava, Bakllava).
- breaking change: Llama.cpp prompt format has changed to support images. Use `.withTextPrompt()` to get a text prompt format.
- ElevenLabs `eleven_turbo_v2` support.
- breaking change: Uncaught errors were caused by custom Promises. ModelFusion now uses only standard Promises. To get full responses from model functions, you need to use the `{ returnType: "full" }` option instead of calling `.asFullResponse()` on the result.
- ModelFusion server error logging and reporting.
- ModelFusion server creates directory for runs automatically when errors are thrown.
- Support for Cohere v3 embeddings.
- Ollama model provider for text embeddings.
- Llama.cpp embeddings are invoked sequentially to avoid rejection by the server.
- Ollama model provider for text generation and text streaming.
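A minimal usage sketch with the class-based, positional API of this release line (the model name is illustrative):

```ts
import { generateText, OllamaTextGenerationModel } from "modelfusion";

const text = await generateText(
  new OllamaTextGenerationModel({ model: "mistral" }),
  "Write a short story about a robot learning to love:\n\n"
);
```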
Adding experimental ModelFusion server, flows, and browser utils.
- ModelFusion server (separate export 'modelfusion/server') with a Fastify plugin for running ModelFusion flows on a server.
- ModelFusion flows.
- ModelFusion browser utils (separate export 'modelfusion/browser') for dealing with audio data and invoking ModelFusion flows on the server (`invokeFlow`).
- breaking change: `readEventSource` and `readEventSourceStream` are part of 'modelfusion/browser'.
- Prompt callback option for `streamStructure`.
- Inline JSDoc comments for the model functions.
- Abort signals and errors during streaming are caught and forwarded correctly.
- `executeFunction` utility function for tracing execution time, parameters, and result of composite functions and non-ModelFusion functions.
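A sketch of the intended usage; the exact signature is an assumption:

```ts
import { executeFunction } from "modelfusion";

// traces execution time, parameters, and result like a model function:
const result = await executeFunction(
  async (input: string) => {
    // ...compose model calls and plain logic here...
    return input.toUpperCase();
  },
  "hello world",
  { functionId: "my-composite-function" }
);
```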
- Streaming results and `AsyncQueue` objects can be used by several consumers. Each consumer will receive all values. This means that you can e.g. forward the same text stream to speech generation and the client.
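For instance, the same stream can be read by two consumers, and each receives all deltas (a sketch; the model setup is illustrative):

```ts
import { openai, streamText } from "modelfusion";

const textStream = await streamText(
  openai.ChatTextGenerator({ model: "gpt-3.5-turbo" }).withTextPrompt(),
  "Write a short story about a robot learning to love:"
);

await Promise.all([
  (async () => {
    for await (const delta of textStream) {
      // consumer 1: e.g. send the delta to the client
    }
  })(),
  (async () => {
    for await (const delta of textStream) {
      // consumer 2: e.g. forward the delta to speech generation
    }
  })(),
]);
```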
ElevenLabs improvements:

- ElevenLabs model settings `outputFormat` and `optimizeStreamingLatency`.
- Default ElevenLabs model is `eleven_monolingual_v1`.
- `parentCallId` event property.
- Tracing for `useTool`, `useToolOrGenerateText`, `upsertIntoVectorIndex`, and `guard`.
- breaking change: rename `embedding` event type to `embed`.
- breaking change: rename `image-generation` event type to `generate-image`.
- breaking change: rename `speech-generation` event type to `generate-speech`.
- breaking change: rename `speech-streaming` event type to `stream-speech`.
- breaking change: rename `structure-generation` event type to `generate-structure`.
- breaking change: rename `structure-or-text-generation` event type to `generate-structure-or-text`.
- breaking change: rename `structure-streaming` event type to `stream-structure`.
- breaking change: rename `text-generation` event type to `generate-text`.
- breaking change: rename `text-streaming` event type to `stream-text`.
- breaking change: rename `transcription` event type to `generate-transcription`.
- Speech synthesis streaming supports string inputs.
- Observability for speech synthesis streaming.
- breaking change: split `synthesizeSpeech` into `generateSpeech` and `streamSpeech` functions.
- breaking change: renamed `speech-synthesis` event to `speech-generation`.
- breaking change: renamed `transcribe` to `generateTranscription`.
- breaking change: renamed `LmntSpeechSynthesisModel` to `LmntSpeechModel`.
- breaking change: renamed `ElevenLabsSpeechSynthesisModel` to `ElevenLabsSpeechModel`.
- breaking change: renamed `OpenAITextGenerationModel` to `OpenAICompletionModel`.
- breaking change: removed the `describeImage` model function. Use `generateText` instead (with e.g. `HuggingFaceImageDescriptionModel`).
- Duplex streaming for speech synthesis.
- ElevenLabs duplex streaming support.
- `Schema` uses `data` in the return type (breaking change for tools).
- Prompt formats for image generation. You can use `.withPromptFormat()` or `.withBasicPrompt()` to apply a prompt format to an image generation model.
- breaking change: `generateImage` returns a Buffer with the binary image data instead of a base-64 encoded string. You can call `.asBase64Text()` on the response to get a base64-encoded string.
- `.withChatPrompt()` and `.withInstructionPrompt()` shorthand methods.
- Updated Zod to 3.22.4. You need to use Zod 3.22.4 or higher in your project.
- Store runs in AsyncLocalStorage for convenience (Node.js only).
- Guard function.
- Anthropic model support (Claude 2, Claude instant).
breaking change: generics simplification to enable dynamic model usage. Models can be used more easily as function parameters.

- `output` renamed to `value` in `asFullResponse()`.
- Model settings can no longer be configured as a model options parameter. Use `.withSettings()` instead.
breaking change: moved Pinecone integration into the `@modelfusion/pinecone` module.
- `readEventSource` for parsing a server-sent event stream using the JavaScript EventSource.
breaking change: generalization to use `Schema` instead of Zod.

- `MemoryVectorIndex.deserialize` requires a `Schema`, e.g. `new ZodSchema` (from ModelFusion).
- `readEventSourceStream` requires a `Schema`.
- `UncheckedJsonSchema[Schema/StructureDefinition]` renamed to `Unchecked[Schema/StructureDefinition]`.
breaking change: Generalized embeddings beyond text embedding.

- `embedText` renamed to `embed`.
- `embedTexts` renamed to `embedMany`.
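After the rename, both functions take a model and one value or many values (a sketch using the class-based, positional API of this release line):

```ts
import { embed, embedMany, OpenAITextEmbeddingModel } from "modelfusion";

const model = new OpenAITextEmbeddingModel({ model: "text-embedding-ada-002" });

// one value:
const embedding = await embed(
  model,
  "At first, Nox didn't know what to do with the pup."
);

// many values:
const embeddings = await embedMany(model, ["a first text", "a second text"]);
```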
- Removed filtering from `VectorIndexRetriever` query (still available as a setting).
- `VectorIndexRetriever` supports a filter option that is passed to the vector index.
- `MemoryVectorIndex` supports filter functions that are applied to the objects before calculating the embeddings.
- `basic-text` logger logs function ids when available.
- `retrieve` produces events for logging and observability.
- Support empty stop sequences when calling OpenAI text and chat models.
- Fixed bugs in `streamStructure` partial JSON parsing.
- `streamStructure` for streaming structured responses, e.g. from OpenAI function calls. Thanks @bjsi for the input!
- First version of event source utilities: `AsyncQueue`, `createEventSourceStream`, `readEventSourceStream`.
- Remove resolution part from type definitions.
breaking change: Generalized vector store upsert/retrieve beyond text chunks:

- `upsertTextChunks` renamed to `upsertIntoVectorIndex`. Syntax has changed.
- `retrieveTextChunks` renamed to `retrieve`.
- `SimilarTextChunksFromVectorIndexRetriever` renamed to `VectorIndexRetriever`.
- OpenAI gpt-3.5-turbo-instruct model support.
- Autocomplete for Stability AI models (thanks @Danielwinkelmann!)
- Downgrade Zod version to 3.21.4 because of colinhacks/zod#2697
- breaking change: Renamed chat format construction functions to follow the pattern `map[Chat|Instruction]PromptTo[FORMAT]Format()`, e.g. `mapInstructionPromptToAlpacaFormat()`, for easy auto-completion.
- breaking change: The prompts for `generateStructure` and `generateStructureOrText` have been simplified. You can remove the `OpenAIChatPrompt.forStructureCurried` (and similar) parts.
- You can directly pass JSON schemas into `generateStructure` and `generateStructureOrText` calls without validation using `UncheckedJsonSchemaStructureDefinition`. This is useful when you need more flexibility and don't require type inference. See `examples/basic/src/util/schema/generate-structure-unchecked-json-schema-example.ts`.
- BREAKING CHANGE: renamed `generateJson` and `generateJsonOrText` to `generateStructure` and `generateStructureOrText`.
- BREAKING CHANGE: introduced `ZodSchema` and `ZodStructureDefinition`. These are required for `generateStructure` and `generateStructureOrText` calls and in tools.
- BREAKING CHANGE: renamed the corresponding methods and objects.
Why this breaking change?
ModelFusion is currently tied to Zod, but there are many other type checking libraries out there, and Zod does not map perfectly to JSON Schema (which is used in OpenAI function calling). Enabling you to use JSON Schema directly in ModelFusion is a first step towards decoupling ModelFusion from Zod. You can also configure your own schema adapters that e.g. use Ajv or another library. Since this change already affected all JSON generation calls and tools, I included other changes that I had planned in the same area (e.g., renaming to generateStructure and making it more consistent).
- `describeImage` model function for image captioning and OCR. HuggingFace provider available.
- `BaseUrlApiConfiguration` class for setting up API configurations with custom base URLs and headers.
- Support for running OpenAI on Microsoft Azure.
- Breaking change: Introduce API configuration. This affects setting the baseUrl, throttling, and retries.
- Improved Helicone support via `HeliconeOpenAIApiConfiguration`.
- LMNT speech synthesis support.
- Separated cost calculation from Run.
- Exposed the `logitBias` setting for OpenAI chat and text generation models.
- Support for fine-tuned OpenAI models (for the `davinci-002`, `babbage-002`, and `gpt-3.5-turbo` base models).
- Function logging support.
- Usage information for events.
- Filtering of model settings for events.
- Breaking change: Restructured the function call events.
- Breaking change: Reworked the function observer system. See Function observers for details on how to use the new system.
- Breaking change: Use `.asFullResponse()` to get full responses from model functions (replaces the `fullResponse: true` option).
- Support for "babbage-002" and "davinci-002" OpenAI base models.
- Choose correct tokenizer for older OpenAI text models.
- Support for ElevenLabs speech synthesis parameters.
- `generateSpeech` function to generate speech from text.
- ElevenLabs support.
- Introduced unified `stopSequences` and `maxCompletionTokens` properties for all text generation models. Breaking change: `maxCompletionTokens` and `stopSequences` are part of the base `TextGenerationModel`. Model-specific names for these properties have been replaced by the unified ones, e.g. `maxTokens` in OpenAI models is now `maxCompletionTokens`.
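A sketch of the unified settings (the model name is illustrative):

```ts
import { OpenAITextGenerationModel } from "modelfusion";

const model = new OpenAITextGenerationModel({
  model: "davinci-002",
  maxCompletionTokens: 500, // previously `maxTokens` on OpenAI models
  stopSequences: ["\n\n"],
});
```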
- Breaking change: Renamed prompt mappings (and related code) to prompt format.
- Improved type inference for WebSearchTool and executeTool.
- `JsonTextGenerationModel` and `InstructionWithSchemaPrompt` to support `generateJson` on text generation models.
- WebSearchTool signature updated.
- Convenience functions to create OpenAI chat messages from tool calls and results.
- `WebSearchTool` definition to support the SerpAPI tool (separate package: `@modelfusion/serpapi-tools`).
- `executeTool` function that directly executes a single tool and records execution metadata.
- Reworked event system and introduced RunFunctionEvent.
- Breaking change: Model functions return a simple object by default to make the 95% use case easier. You can use the `fullResponse` option to get a richer response object that includes the original model response and metadata.
- `splitTextChunk` function.
- Breaking change: Restructured text splitter functions.
- `splitTextChunks` function.
- Chat with PDF demo.
- Breaking change: Renamed `VectorIndexSimilarTextChunkRetriever` to `SimilarTextChunksFromVectorIndexRetriever`.
- Breaking change: Renamed the `content` property in `TextChunk` to `text`.
- `VectorIndexTextChunkStore`.
- Fixed a type inference bug in `trimChatPrompt`.
- HuggingFace text embedding support.
- Helicone observability integration.
- Instruction prompts can contain an optional `input` property.
- Alpaca instruction prompt mapping.
- Vicuna chat prompt mapping.
- Docs updated to ModelFusion.
- Breaking Change: Renamed to `modelfusion` (from `ai-utils.js`).
- Breaking Change: model functions return rich objects that include the result, the model response and metadata. This enables you to access the original model response easily when you need it and also use the metadata outside of runs.
- `trimChatPrompt()` function to fit chat prompts into the context window and leave enough space for the completion.
- `maxCompletionTokens` property on TextGenerationModels.
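A usage sketch for `trimChatPrompt` (imports omitted; the named-parameter call shape is an assumption):

```ts
// drops older messages until the prompt fits the model's context window,
// leaving room for `maxCompletionTokens`:
const trimmedPrompt = await trimChatPrompt({
  prompt: chatPrompt, // a chat prompt with system message and messages
  model, // a text generation model with a tokenizer and context window size
});
```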
- Renamed `withMaxTokens` to `withMaxCompletionTokens` on TextGenerationModels.
- Removed `composeRecentMessagesOpenAIChatPrompt` function (use `trimChatPrompt` instead).
- ChatPrompt concept (with chat prompt mappings for text, OpenAI chat, and Llama 2 prompts).
- Renamed prompt mappings and changed into functions.
- Prompt mapping support for text generation and streaming.
- Added instruction prompt concept and mapping.
- Option to specify context window size for Llama.cpp text generation models.
- Renamed 'maxTokens' to 'contextWindowSize' where applicable.
- Restructured how tokenizers are exposed by text generation models.
- llama.cpp embedding support.
- `zod` and `zod-to-json-schema` are peer dependencies and no longer included in the package.
- `generateJsonOrText`, `useToolOrGenerateText`, and `useTool` return additional information in the response (e.g. the parameters and additional text).
- Renamed `callTool` to `useTool` and `callToolOrGenerateText` to `useToolOrGenerateText`.
- `generateJsonOrText`
- Tools: `Tool` class, `callTool`, `callToolOrGenerateText`
- Restructured `generateJson` arguments.
- Removed `asFunction` model function variants. Use JavaScript lambda functions instead.
- OpenAIChatAutoFunctionPrompt to call the OpenAI functions API with multiple functions in 'auto' mode.
- Changed the prompt format of the generateJson function.
- Reworked interaction with vectors stores. Removed VectorDB, renamed VectorStore to VectorIndex, and introduced upsertTextChunks and retrieveTextChunks functions.
- Bugs related to `performance.now` not being available.
- Llama.cpp tokenization support.
- Split Tokenizer API into BasicTokenizer and FullTokenizer.
- Introduce `countTokens` function (replacing `Tokenizer.countTokens`).
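A usage sketch (imports omitted; the tokenizer construction is illustrative):

```ts
// count tokens with a tokenizer instance:
const tokenizer = new TikTokenTokenizer({ model: "gpt-4" });

const tokenCount = await countTokens(
  tokenizer,
  "At first, Nox didn't know what to do with the pup."
);
```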
- Events for streamText.
- TextDeltaEventSource for Client/Server streaming support.
- End-of-stream bug in Llama.cpp text streaming.
- Streaming support for Cohere text generation models.
- Streaming support for OpenAI text completion models.
- OpenAI function streaming support (in low-level API).
- Generalized text streaming (async string iterable, useful for command line streaming).
- Streaming support for Llama.cpp text generation.
- Llama.cpp text generation support.
- Convert all main methods (e.g. `model.generateText(...)`) to a functional API (i.e., `generateText(model, ...)`).
- JSON generation model.
- Automatic1111 image generation provider.
- Cost calculation for OpenAI image generation and transcription models.
- Cost calculation for Open AI text generation, chat and embedding models.
- Renamed RunContext to Run. Introduced DefaultRun.
- Changed events and observers.
- Updated OpenAI models.
- Low-level support for the OpenAI chat functions API (via `OpenAIChatModel.callApi`).
- `TranscriptionModel` and `OpenAITranscriptionModel` (using `whisper`).
- Single optional parameter for functions/method that contains run, functionId, etc.
- Retry is not attempted when you ran out of OpenAI credits.
- Vercel edge function support (switched to nanoid for unique IDs).
- Improved OpenAI chat streaming API.
- Changed `asFunction` variants from namespaced functions into stand-alone functions.
- Documentation update.
- Major rework of embedding APIs.
- Major rework of text and image generation APIs.
- Various renames.
- Pinecone VectorDB support
- Cohere tokenization support
- OpenAI DALL-E image generation support
- `generateImage` function
- Throttling and retries on model level
- Stability AI image generation support
- Image generation Next.js example
- Updated PDF to tweet example with style transfer
- Hugging Face text generation support
- Memory vector DB
- Cohere embedding API support
- Restructured retry logic
- `embed` embeds many texts at once
- Cohere text generation support
- OpenAI chat streams can be returned as delta async iterables
- Documentation of integration APIs and models
- OpenAI embedding support
- Text embedding functions
- Chat streams can be returned as ReadableStream or AsyncIterable
- Basic examples under `examples/basic`
- Initial documentation available at modelfusion.dev
- Voice recording and transcription Next.js app example.
- OpenAI transcription support (Whisper).
- BabyAGI Example in TypeScript
- TikToken for OpenAI: We've added tiktoken to aid in tokenization and token counting, including those for message and prompt overhead tokens in chat.
- Tokenization-based Recursive Splitter: A new splitter that operates recursively using tokenization.
- Prompt Management Utility: An enhancement to fit recent chat messages into the context window.
- AI Chat Example using Next.js: An example demonstrating AI chat implementation using Next.js.
- PDF to Twitter Thread Example: This shows how a PDF can be converted into a Twitter thread.
- OpenAI Chat Completion Streaming Support: A feature providing real-time response capabilities using OpenAI's chat completion streaming.
- OpenAI Chat and Text Completion Support: This addition enables the software to handle both chat and text completions from OpenAI.
- Retry Management: A feature to enhance resilience by managing retry attempts for tasks.
- Task Progress Reporting and Abort Signals: This allows users to track the progress of tasks and gives the ability to abort tasks when needed.
- Recursive Character Splitter: A feature to split text into characters recursively for more detailed text analysis.
- Recursive Text Mapping: This enables recursive mapping of text, beneficial for tasks like summarization or extraction.
- Split-Map-Filter-Reduce for Text Processing: A process chain developed for sophisticated text handling, allowing operations to split, map, filter, and reduce text data.