OpenAI Compatible Chat

Connect your AI agents to any LLM that exposes an OpenAI-compatible chat API, including self-hosted models like Ollama and vLLM.

Overview

The OpenAI Compatible connector allows you to use any LLM provider that implements the OpenAI Chat Completions API format. This includes self-hosted solutions like Ollama and vLLM, as well as cloud services that offer OpenAI-compatible endpoints.

The connector supports:

  • Chat completions with streaming
  • Function calling (tool use)
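
Both capabilities map directly onto the standard Chat Completions request body. As an illustrative sketch (field names follow the public OpenAI Chat Completions format; `buildChatRequest` and the `get_weather` tool are hypothetical, not part of the Squid SDK), a request enabling streaming and one tool might look like this:

```typescript
// Shape of a chat message in the OpenAI Chat Completions format.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
}

// Illustrative helper: builds a Chat Completions request body.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    model, // e.g. 'llama3', as configured in the connector
    messages,
    stream: true, // ask the server for incremental streamed chunks
    tools: [
      // function calling ("tool use") in the standard format
      {
        type: 'function',
        function: {
          name: 'get_weather', // hypothetical tool for illustration
          description: 'Look up the weather for a city',
          parameters: {
            type: 'object',
            properties: { city: { type: 'string' } },
            required: ['city'],
          },
        },
      },
    ],
  };
}

const body = buildChatRequest('llama3', [{ role: 'user', content: 'Hello!' }]);
```

Any provider that accepts a body like this for `POST /v1/chat/completions` should work behind the connector.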

Setting up the connector

To add an OpenAI Compatible connector, complete the following steps:

  1. Navigate to the Squid Console and select your application.
  2. Click the Connectors tab.
  3. Click Available Connectors and find the OpenAI Compatible Chat connector. Then click Add Connector.
  4. Provide the following details:
  • Connector ID: A unique ID of your choice (e.g., my-ollama). This is the integrationId you will reference in code.
  • Base URL: The publicly accessible URL of the OpenAI-compatible API. The Squid backend must be able to reach this URL, so it cannot be a localhost address unless you are developing locally.
  • API Key (optional): An API key for authentication. Some providers, such as local Ollama instances, do not require an API key.
  • Models: A JSON array defining the models available through this connector. Each model requires the following fields:
| Field | Type | Description |
| --- | --- | --- |
| modelName | string | The model identifier used in API calls |
| displayName | string | A human-readable name for the model |
| maxOutputTokens | number | Maximum number of tokens the model can generate in a response |
| contextWindowTokens | number | Total context window size in tokens |

Example:

[
  {
    "modelName": "llama3",
    "displayName": "Llama 3",
    "maxOutputTokens": 4096,
    "contextWindowTokens": 8192
  }
]
  5. Click Add Connector.
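
The four model fields can be captured in a small type, which makes it easy to sanity-check a models array before pasting it into the console. This is a sketch only; the `ConnectorModel` interface and `validateModels` helper are illustrative names, not part of the Squid SDK:

```typescript
// Illustrative type matching the connector's per-model configuration.
interface ConnectorModel {
  modelName: string; // model identifier used in API calls
  displayName: string; // human-readable name
  maxOutputTokens: number; // max tokens per response
  contextWindowTokens: number; // total context window size
}

// Basic validation: every field has the right type, and the context
// window is at least as large as the output-token limit.
function validateModels(models: ConnectorModel[]): boolean {
  return models.every(
    (m) =>
      typeof m.modelName === 'string' &&
      typeof m.displayName === 'string' &&
      Number.isInteger(m.maxOutputTokens) &&
      Number.isInteger(m.contextWindowTokens) &&
      m.maxOutputTokens <= m.contextWindowTokens
  );
}

const models: ConnectorModel[] = [
  {
    modelName: 'llama3',
    displayName: 'Llama 3',
    maxOutputTokens: 4096,
    contextWindowTokens: 8192,
  },
];
```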

Using the connector

Once configured, use the connector with an AI agent by specifying the connector ID and model name:

Client code
await squid.ai().agent('my-agent').updateModel({
  integrationId: 'my-ollama',
  model: 'llama3',
});

You can also override the model on a per-request basis:

Client code
const response = await squid
  .ai()
  .agent('my-agent')
  .ask('Hello!', {
    model: {
      integrationId: 'my-ollama',
      model: 'llama3',
    },
  });

Common configurations

Ollama

Ollama runs open-source models locally.

  • Base URL: The publicly accessible URL of your Ollama instance (e.g., https://ollama.your-domain.com)
  • API Key: Not required
  • Models: Depends on which models you have pulled locally (e.g., llama3, mistral, codellama)
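
Before wiring Ollama into the connector, it can help to confirm the instance is reachable over its OpenAI-compatible surface. Ollama serves those routes under `/v1`, including `GET /v1/models`, which lists the models you have pulled. A sketch (the `buildListModelsRequest` helper is illustrative, not a Squid or Ollama API; substitute your own base URL):

```typescript
// Illustrative helper: builds the request details for the
// OpenAI-compatible model-listing endpoint.
function buildListModelsRequest(baseUrl: string, apiKey?: string) {
  // Strip a trailing slash so the path joins cleanly.
  const url = `${baseUrl.replace(/\/$/, '')}/v1/models`;
  const headers: Record<string, string> = {};
  // Local Ollama instances typically need no key; include one only if set.
  if (apiKey) headers['Authorization'] = `Bearer ${apiKey}`;
  return { url, headers };
}

const req = buildListModelsRequest('https://ollama.your-domain.com/');
// Usage (requires a running, reachable instance):
// const res = await fetch(req.url, { headers: req.headers });
// const { data } = await res.json(); // each entry's `id` is a model name
```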

vLLM

vLLM is a high-throughput inference engine with an OpenAI-compatible server.

  • Base URL: The publicly accessible URL of your vLLM server (e.g., https://vllm.your-domain.com)
  • API Key: Depends on your vLLM configuration
  • Models: The model(s) you started vLLM with