Provider Setup

Verified built-ins are Ollama, LM Studio, and llama.cpp. Add vLLM, LocalAI, KoboldCPP, or another compatible endpoint through Add provider. The OpenRouter preset uses its OpenAI-compatible Chat Completions API. Anthropic uses the native Messages API, while the generic Anthropic-compatible option also supports keyless local or LAN endpoints.

Web search is configured separately in the Context tab. Use SearXNG for local/self-hosted search, or Brave Search/Tavily when you want an API-backed provider.

1. Install the extension

Install Ollama Client from the Chrome Web Store.

2. Pick a provider

Provider	Default endpoint	Notes
Ollama	`http://localhost:11434`	Recommended baseline. Tool calling plus fullest model-management support.
LM Studio	`http://localhost:1234/v1`	OpenAI-compatible chat, embeddings, tool calling, and LM Studio model discovery.
llama.cpp server	`http://localhost:8000/v1`	OpenAI-compatible. Run with `llama-server`.
OpenAI-compatible	User configured	Add vLLM, LocalAI, KoboldCPP, or another compatible endpoint.
OpenAI	`https://api.openai.com/v1`	Hosted OpenAI API; API key required. Uses streamed usage and the modern completion-token field.
Anthropic	`https://api.anthropic.com/v1`	Remote Claude Messages API; API key required.
Anthropic-compatible	User configured	Native Messages wire; API key is optional for compatible self-hosted endpoints.
OpenRouter	`https://openrouter.ai/api/v1`	OpenAI-compatible hosted gateway; API key required. Model IDs keep their provider prefix.

3. Start Ollama (primary path)

Install Ollama from ollama.com, then start it:

ollama serve

Pull at least one chat model:

ollama pull qwen2.5:3b

For tool calling and image input, choose a model that actually supports those capabilities. The extension detects reported capabilities where providers expose them, and lets you override them from the model menu when a provider cannot report them.

Pull one embeddings model for RAG:

ollama pull all-minilm:latest

You need at least one chat model and one embeddings model installed for the full experience.

4. Configure the extension

Open the extension’s options page.
Go to the Providers tab.
Enable the providers you want.
Set the base URL and run a connection test.
Pick a model from the chat model menu.

5. Verify endpoints

# Ollama
curl http://localhost:11434/api/tags

# LM Studio
curl http://localhost:1234/v1/models

# llama.cpp
curl http://localhost:8000/v1/models

6. Reality checks

Chat generation is fully provider-agnostic.
Image input is model-dependent. If the selected model is not vision-capable, the composer blocks image attach instead of sending unsupported input.
Tool calling is model-dependent. Ollama and LM Studio both expose tool-calling APIs, but the selected model still needs tool-use support. Tool-capable models can inspect browser context through local extension tools; non-tool models keep the old plain chat path.
Web search is off by default and model-visible only as web_search. Backend choice is a user setting, not a model prompt detail.
Model-management actions depend on provider capabilities. Ollama has the fullest support; LM Studio adds pull/unload support.
Embedding generation uses the configured provider when supported, then falls back through the shared embedding path and Ollama for reliability.

7. Optional local web search with SearXNG

The repo includes a local SearXNG compose stack for private web-search testing.

cd searxng
docker compose up -d

Then open Settings -> Context -> Web Search:

Enable web search.
Pick SearXNG.
Set endpoint to http://localhost:8080.
Run Test search.

SearXNG supports pageno, not an API-side result-count parameter. Ollama Client can fetch 1-3 pages, de-dupe URLs, then apply the configured result-count cap before returning results to the model.

8. Search provider API references

9. CORS and browser notes

Chrome-based browsers route extension requests through Declarative Net Request (DNR). Firefox uses a different extension API model.

10. Troubleshooting

Confirm the provider process is actually running.
Confirm the endpoint URL matches the runtime URL exactly (port, scheme, /v1 suffix).
Use the Test connection button in Providers settings before debugging model behavior.
For web search, use Test search in Context settings and verify your SearXNG endpoint or API key.
Check the background console (chrome://extensions → service worker) for streaming or provider errors.