Architecture
This document describes the current implementation and highlights tradeoffs, assumptions, and known constraints.
Entry points
Section titled “Entry points”WXT auto-discovers entry points under src/entrypoints/. Each entry is a thin shell that delegates to a feature module elsewhere in src/, so the WXT-facing surface stays small and the actual logic lives where the rest of the code can import it.
| WXT entry point | Output type | Delegates to |
|---|---|---|
src/entrypoints/background.ts | service worker | src/background/index.ts |
src/entrypoints/sidepanel/index.tsx | extension page | src/sidepanel/index.tsx (React root) |
src/entrypoints/options/index.tsx | extension page | src/options/index.tsx (React root) |
src/entrypoints/print/main.ts | extension page | self-contained (print-to-PDF helper) |
src/entrypoints/content.ts | content script (all URLs) | src/contents/index.ts (lazy-imported) |
src/entrypoints/selection-button.content.tsx | content script (selection overlay) | self-contained (shadow-DOM UI) |
The WXT shells are intentionally minimal — background.ts is a 4-line import, content.ts is a 6-line lazy-import. Real work lives in the feature modules:
src/background/— handler dispatch, provider streaming orchestration,onInstalledmigrationssrc/sidepanel/— chat surface React app, opens the runtime portsrc/options/— settings React appsrc/contents/— selection capture, page extraction helpers, URL filtering
System responsibilities
Section titled “System responsibilities”Sidepanel
- Chat interaction UX
- Session display and branch navigation
- Streaming state updates
- Local chat actions (edit, fork, delete, export)
Options
- Provider configuration
- Model parameters
- Embedding / RAG configuration
- Feature toggles and diagnostics
Background worker
- Provider resolution and streaming orchestration
- Model management handlers
- Embedding generation handlers for file chunks
- Browser-level APIs (DNR / CORS rules, context menu)
Content scripts
- Selected-text capture
- Page extraction entrypoints for browser-context workflows
Data flow
Section titled “Data flow”- User sends a prompt in the sidepanel.
- UI opens a runtime port (
MESSAGE_KEYS.PROVIDER.STREAM_RESPONSE) to the background. - Background receives
CHAT_WITH_MODELand resolves the provider using the model mapping. - Provider starts streaming tokens back to the background.
- Background relays chunks to the UI through port messages.
- UI applies optimistic updates and persists completed messages in the local chat store.
- Optional embedding pipelines index chat / file content for retrieval.
flowchart TD
A["Sidepanel UI (React)"] --> B["Runtime Port (STREAM_RESPONSE)"]
B --> C["Background Worker"]
C --> D["ProviderFactory resolve by model mapping"]
D --> E["Ollama"]
D --> F["LM Studio"]
D --> G["llama.cpp"]
E --> H["Chunk Stream"]
F --> H
G --> H
H --> I["UI Stream State Update"]
I --> J["SQLite Chat Store"]
I --> K["Optional RAG Pipeline"]
K --> L["Embedding Strategy Chain"]
L --> M["Vector store"]
Model capabilities
Section titled “Model capabilities”Feature availability is resolved per selected model. The capability layer covers text chat, image input, tool calling, reasoning output, embeddings, and context length. The resolver prefers explicit user overrides, then provider/model metadata, then provider defaults.
This keeps capability-sensitive UI from guessing. For example:
- Image attach is enabled only for models that resolve
vision: true. - Internal tools are offered only to models that resolve
toolCalling: true. - Embedding models are kept out of normal chat-model selectors.
Users can override capabilities from the model menu when a provider cannot report them reliably.
Tool calling architecture
Section titled “Tool calling architecture”Tool calling is handled in the background worker, between provider streaming and UI persistence.
- The handler resolves the selected model and checks whether tool calling is enabled for that model.
- Tool definitions are loaded from the
ToolRegistry. - The provider receives the chat request with native tool definitions.
- If the model requests a tool, the stream loop executes it locally and appends a tool result to the working provider history.
- The loop continues until the model returns a normal answer or hits the iteration cap.
Current internal tools:
| Tool | Purpose |
|---|---|
current_tab | Read the active tab’s extracted text, including supported video transcripts. |
list_tabs | List readable open tabs with current ids, titles, and URLs. |
read_tab | Read a specific open tab by id or title/URL query. Stale ids are refreshed and can fall back to the active readable tab. |
selected_text | Use the most recent page selection captured by the extension. |
file_search | Search uploaded/indexed files. |
rag_search | Search local chat memory / indexed conversation context. |
web_search | Search the live web through the configured search provider. |
Tool results are trimmed before they are fed back to the model. The UI persists the final assistant answer and trace metadata, not the intermediate tool messages.
Web search adapter seam
Section titled “Web search adapter seam”Web search is intentionally provider-agnostic at the model boundary. The model sees only web_search({ query, count? }); the backend is resolved from device-local settings at runtime.
Implementation paths:
src/lib/tools/web-search/types.tsdefinesWebSearchBackend,WebSearchProviderConfig, and normalizedWebSearchResult.src/lib/tools/web-search/backends/contains provider adapters for SearXNG, Brave Search, and Tavily.src/lib/tools/web-search/registry.tsis the backend registry/factory seam.src/lib/tools/web-search/web-search-tool-source.tsexposes the tool only when enabled and valid.src/features/web-search/owns the settings UI and chat-toolbar toggle.
Provider behavior:
| Provider | Request shape | Result cap behavior |
|---|---|---|
| SearXNG | GET /search?q=...&format=json&pageno=N&safesearch=... | No API-side count; fetch configured pages, de-dupe, then slice locally. |
| Brave Search | GET https://api.search.brave.com/res/v1/web/search with X-Subscription-Token | Sends count. |
| Tavily | POST https://api.tavily.com/search with bearer auth | Sends max_results. |
References:
Search titles and snippets are untrusted data. The tool strips HTML, caps snippets and total output, keeps API keys out of logs, and asks the model to cite returned URLs for current facts.
Image input
Section titled “Image input”Images are stored as chat attachments and routed only when the selected model supports vision. Provider adapters translate the same chat message into each provider’s expected wire format:
- Ollama receives base64 image payloads through its native
imagesfield. - OpenAI-compatible providers receive
image_urlcontent parts.
Images reuse the existing file metadata path for local persistence and preview display, so no separate image-history store is needed.
Model selection and provider routing
Section titled “Model selection and provider routing”- The selected model key is persisted under the provider key path (
STORAGE_KEYS.PROVIDER.SELECTED_MODEL) with legacy reads. - The model list is built by querying all enabled providers in
useProviderModels. - Provider configs are persisted via
ProviderManager(ProviderStorageKey.CONFIG). - Default profiles: Ollama, LM Studio, llama.cpp, vLLM, KoboldCPP, and LocalAI.
- Per-model provider routing is stored via
ProviderStorageKey.MODEL_MAPPINGS. - Background routing is performed by
ProviderFactory.getProviderForModel(modelId).
Streaming architecture
Section titled “Streaming architecture”Streaming occurs over extension runtime ports:
- UI hook —
src/features/chat/hooks/use-chat-stream.ts - Background handler —
src/background/handlers/handle-chat-with-model.ts - Cancel handling —
abort-controller-registry
Runtime ports support continuous chunk delivery better than one-shot messages, and cancellation is clean via AbortController scoped to active stream keys. Tradeoff: message keys are provider-named (PROVIDER.*) with legacy OLLAMA.* compatibility.
Storage architecture
Section titled “Storage architecture”- Chat / sessions / messages / files: SQL WASM (
sql.js) persisted to IndexedDB. The facadesrc/lib/repositories/chat-history.tsis the single entry point and now routes to SQLite only. - Vectors / embeddings: still on Dexie + IndexedDB via
src/lib/embeddings/storage.ts. Not yet migrated to SQLite. - Settings / provider config:
@plasmohq/storagevia theplasmoGlobalStoragewrapper. Sync-safe settings usechrome.storage.sync; device-local keys usechrome.storage.local. - Export / restore: ZIP bundles with versioned manifests; includes the chat SQLite blob plus Dexie dumps for vector embeddings and knowledge sets.
Chat-history storage
Section titled “Chat-history storage”The facade exposes one chat-history API while the implementation stays SQLite-only. Three guarantees follow:
- Durability: SQLite writes are debounced 1s to IndexedDB, and explicit reset/export/unload paths force-flush via
flushSave()where needed. - Single source: chat sessions, messages, branches, and file metadata read and write through one normalized SQLite schema.
- Export path: full-data export includes the SQLite database blob, so chat history remains restorable without any Dexie chat dump.
See the API reference for the full surface.
RAG / embedding architecture
Section titled “RAG / embedding architecture”- Embeddings are generated via a browser-safe strategy chain.
- Content is chunked and indexed locally; chat history uses SQLite, while vector storage remains in IndexedDB via the embeddings storage layer.
- Query-time retrieval uses hybrid search with adaptive weighting.
- The pipeline includes diversity filtering and recency / feedback score hooks.
- Embeddings use a fallback chain: provider-native → shared model → background warmup → Ollama fallback.
- Background model preparation uses provider capabilities where available; Ollama remains the most complete management path.
Why a background worker
Section titled “Why a background worker”- Keeps provider network I/O and long-running operations off the UI thread.
- Centralizes extension APIs that are unavailable or unsafe in UI contexts.
- Simplifies cancellation and stream lifecycle tracking.
Tradeoffs and decisions
Section titled “Tradeoffs and decisions”Legacy naming retained for compatibility
- Pro: avoids migration breakage.
- Con: causes confusion in multi-provider code paths.
SQLite-only chat history
- Pro: one normalized chat store, smaller bundle surface, simpler boot path, and clearer export semantics.
- Con: rollback now depends on full-data export or browser-level IndexedDB recovery, not a live Dexie chat fallback.
Provider-agnostic chat with provider-specific management features
- Pro: fast rollout of multi-provider chat.
- Con: uneven feature parity — pull / delete / version are Ollama-centric.
Local retrieval pipeline over extension constraints
- Pro: privacy-preserving retrieval.
- Con: CSP / performance limits prevent full in-browser model / reranker parity.
Assumptions and constraints
Section titled “Assumptions and constraints”Assumptions
- The user can run at least one provider endpoint.
- Endpoint URLs are reachable from extension context.
- Local resources are sufficient for selected models.
Constraints
- Chrome extension CSP limits some WASM / worker ML paths.
- Firefox lacks Chrome DNR API behavior.
- Provider model-naming collisions can cause ambiguous mapping behavior.
Known risks and technical debt
Section titled “Known risks and technical debt”- Legacy
ollama-*keys retained for compatibility while provider naming becomes default. - Partial provider parity in model-management actions.
- Dual persistence architecture during the migration period.
- Retrieval quality depends on chunking / threshold tuning and model quality.
Desktop design notes
Section titled “Desktop design notes”These are non-implementation notes for a hypothetical desktop port.
- The provider abstraction (factory / manager / types) is intentionally runtime-agnostic and can be reused in a desktop app.
- Provider identity metadata (icons, display names) should remain shared via
src/lib/providers/registry.ts. - Browser-only APIs (DNR, extension messaging) are already isolated in background handlers and would map to Electron main-process equivalents.
- Storage keys are provider-agnostic with legacy shims; a desktop app can reuse the same keys to migrate settings.
Near-term priorities
Section titled “Near-term priorities”- Finish retiring legacy
ollama-*naming where compatibility does not require it. - Migrate vector storage and knowledge sets off Dexie when the SQLite path is ready.
- Expand provider parity for management actions.
- Improve retrieval observability and failure diagnostics.