BYOLLM (Bring Your Own LLM)
Let users plug their own LLM provider into your extension
With BYOLLM, users supply their own LLM credentials, and your extension's LLM calls run against that provider. This is useful when:
- The user has a private model deployment (Anthropic enterprise, local Ollama, OpenAI org).
- Compliance requires data not flow through Imperal's default LLMs.
- The user wants to control their own LLM costs.
How it works at the platform level
User configures their LLM in panel.imperal.io/settings/llm:
├─ provider: anthropic | openai | local-ollama
├─ model: claude-sonnet-4-6 | gpt-4.1 | qwen3.5:27b
├─ api_key: <encrypted, stored in auth-gateway secret store>
└─ base_url: https://api.anthropic.com (or custom)
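To make the four settings concrete, here is a minimal sketch of the configuration as a Python dataclass. The class name `ByollmConfig`, the `api_key_ref` field, the `secret://` reference format, and the validation rules are hypothetical illustrations, not part of the Imperal SDK. The point it shows is that extension-side code would only ever hold an opaque reference to the key, never the key itself.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the settings in panel.imperal.io/settings/llm.
# Field names mirror the panel; the class is illustrative, not SDK API.
ALLOWED_PROVIDERS = {"anthropic", "openai", "local-ollama"}


@dataclass(frozen=True)
class ByollmConfig:
    provider: str
    model: str
    base_url: str
    # Only an opaque reference is modelled; the real key stays encrypted in
    # the auth-gateway secret store and is never handed to extension code.
    api_key_ref: Optional[str] = None

    def __post_init__(self) -> None:
        if self.provider not in ALLOWED_PROVIDERS:
            raise ValueError(f"unsupported provider: {self.provider!r}")
        if not self.base_url.startswith(("http://", "https://")):
            raise ValueError("base_url must be an http(s) URL")


cfg = ByollmConfig(
    provider="anthropic",
    model="claude-sonnet-4-6",
    base_url="https://api.anthropic.com",
    api_key_ref="secret://byollm/user-123",  # hypothetical reference format
)
print(cfg.provider, cfg.model)
```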
Per chat turn:
Auth Gateway resolves user's BYOLLM config (if set)
↓
Web-kernel cascade:
1. Use BYOLLM provider/model/api_key/base_url
2. Inherit admin per-purpose AI params (temperature, top_p, max_tokens) from platform LLM config
3. Cost = $0 to billing (BYOLLM excluded under Policy A)
The platform handles credential storage, rotation, and decryption. Your extension just receives a ctx.llm accessor that already knows which provider to call.
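The per-turn cascade can be modelled as a small pure function. This is a sketch under assumptions: `Target`, `ResolvedCall`, and `resolve_call` are hypothetical names, the platform default and the `top_p` value are made up, and the real web-kernel performs this resolution server-side before your extension code runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """Where a call goes: the provider/model/base_url part of the config."""
    provider: str
    model: str
    base_url: str


@dataclass(frozen=True)
class ResolvedCall:
    target: Target
    params: dict  # inherited admin AI params (temperature, top_p, max_tokens)
    byollm: bool
    billable: bool


def resolve_call(user_config: Optional[Target], platform_default: Target,
                 admin_params: dict) -> ResolvedCall:
    """Hypothetical model of the per-turn web-kernel cascade."""
    if user_config is not None:
        # 1. Use the user's BYOLLM target. 2. Still inherit admin params.
        # 3. Policy A: BYOLLM is excluded from billing.
        return ResolvedCall(target=user_config, params=dict(admin_params),
                            byollm=True, billable=False)
    # No BYOLLM config: fall back to the platform LLM, which is billed.
    return ResolvedCall(target=platform_default, params=dict(admin_params),
                        byollm=False, billable=True)


platform = Target("anthropic", "claude-sonnet-4-6", "https://api.anthropic.com")  # made-up default
user_llm = Target("local-ollama", "qwen3.5:27b", "http://localhost:11434")
params = {"temperature": 0.4, "top_p": 0.9, "max_tokens": 800}

print(resolve_call(user_llm, platform, params).billable)  # False
print(resolve_call(None, platform, params).target.model)  # claude-sonnet-4-6
```

Copying `admin_params` into each result keeps one call's params from leaking into another if a caller mutates them.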
When to use BYOLLM in your extension
You'd reach for BYOLLM if your extension makes its own LLM calls beyond what @chat.function offers, for example:
Long-form content generation
The user asks the extension itself, rather than the [web-kernel](/en/reference/glossary/) router, to draft something.
Semantic search / RAG
Embedding queries, vector lookups, summarization of fetched docs.
Internal agentic loops
When your extension orchestrates a sub-LLM for a complex task (with appropriate guardrails).
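For the agentic-loop case, the most basic guardrail is a hard cap on how many times the sub-LLM is called. A minimal sketch, assuming a hypothetical one-method `complete(prompt)` interface (not `ctx.llm`'s real API; in an extension each step would go through `ctx.llm.create_message`) and a `DONE` stop token chosen for illustration:

```python
import asyncio
from typing import Protocol


class LLM(Protocol):
    async def complete(self, prompt: str) -> str: ...


async def bounded_loop(llm: LLM, task: str, max_steps: int = 3) -> list[str]:
    """Ask the sub-LLM for one step at a time; stop at DONE or max_steps."""
    steps: list[str] = []
    for _ in range(max_steps):  # guardrail: hard cap on LLM calls per task
        reply = (await llm.complete(f"Task: {task}\nSteps so far: {steps}\nNext step?")).strip()
        if reply == "DONE":
            break
        steps.append(reply)
    return steps


class ScriptedLLM:
    """Test double that replays canned replies instead of calling a provider."""

    def __init__(self, replies: list[str]) -> None:
        self._replies = iter(replies)

    async def complete(self, prompt: str) -> str:
        return next(self._replies, "DONE")


print(asyncio.run(bounded_loop(ScriptedLLM(["search", "read", "draft", "polish"]), "write a brief")))
# ['search', 'read', 'draft']
```

The cap bounds cost and latency even if the model never emits its stop token.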
Most extensions don't need BYOLLM — @chat.function is enough.
The pattern
from imperal_sdk import ChatExtension, ActionResult
from pydantic import BaseModel, Field

# `chat` is assumed to be the ChatExtension instance your extension
# creates in its entry module (construction not shown on this page).


class SummarizeParams(BaseModel):
    text: str = Field(description="The content to summarize.")
    max_words: int = Field(80, description="Target summary length.")


@chat.function(
    description="Summarize a piece of text using the user's configured LLM.",
    action_type="read",
)
async def summarize(ctx, params: SummarizeParams):
    # ctx.llm is auto-resolved per user (BYOLLM if configured, platform LLM otherwise)
    response = await ctx.llm.create_message(
        system="You are a concise summarizer. Output plain text only.",
        messages=[
            {"role": "user", "content": f"Summarize in {params.max_words} words:\n\n{params.text}"},
        ],
        max_tokens=200,
    )
    return {"text": response.text}
What ctx.llm gives you
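The ctx.llm members this page relies on are `create_message(...)`, which is awaited and returns an object with a `.text` attribute, plus `is_byollm` and `provider`. Based only on that surface, a hand-rolled test double lets you exercise the summarize handler without the platform. `FakeLLM`, `FakeCtx`, and `Params` are hypothetical stand-ins (not SDK test utilities), and the handler body is reproduced without the `@chat.function` decorator so the block runs without `imperal_sdk` or pydantic installed.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    text: str


@dataclass
class FakeLLM:
    """Test double mimicking the ctx.llm members used on this page."""
    reply: str = "A short summary."
    is_byollm: bool = False
    provider: str = "test"
    calls: list = field(default_factory=list)

    async def create_message(self, **kwargs) -> FakeResponse:
        self.calls.append(kwargs)  # keep the arguments for later assertions
        return FakeResponse(self.reply)


@dataclass
class FakeCtx:
    llm: FakeLLM


@dataclass
class Params:
    """Plain stand-in for the pydantic SummarizeParams model."""
    text: str
    max_words: int = 80


async def summarize(ctx, params):
    # Same body as the handler above, without the @chat.function decorator.
    response = await ctx.llm.create_message(
        system="You are a concise summarizer. Output plain text only.",
        messages=[{"role": "user",
                   "content": f"Summarize in {params.max_words} words:\n\n{params.text}"}],
        max_tokens=200,
    )
    return {"text": response.text}


ctx = FakeCtx(llm=FakeLLM())
result = asyncio.run(summarize(ctx, Params(text="Some long article text.", max_words=40)))
print(result)  # {'text': 'A short summary.'}
```

Recording the keyword arguments lets a test check the prompt and `max_tokens` your handler sends, independent of which provider would answer in production.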
Federal guarantees
Credentials never reach your code
ctx.llm makes the call through the [auth gateway](/en/reference/glossary/). Your extension never sees the user's API key in plaintext.
BYOLLM is excluded from billing (Policy A)
Imperal billing records $0 for your extension's BYOLLM calls; platform LLM calls are billed to the user.
Tenant isolation
One user's BYOLLM config can't be used to make calls on behalf of another.
Audit chokepoint
Every LLM call (BYOLLM or platform) logs to the federal [action ledger](/en/reference/glossary/) with provider/model/token-count metadata.
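The audit and billing guarantees fit together: every call is logged, but only platform calls count toward billing. A sketch with a hypothetical `LedgerEntry` shape (the field names are assumptions; the page only says provider, model, and token counts are recorded) and a hypothetical `billable_tokens` helper that applies Policy A:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    """Hypothetical audit record for one LLM call (field names assumed)."""
    user_id: str  # tenant the call was made for
    provider: str
    model: str
    input_tokens: int
    output_tokens: int
    byollm: bool


def billable_tokens(entry: LedgerEntry) -> int:
    """Tokens that count toward Imperal billing for this entry."""
    if entry.byollm:
        return 0  # Policy A: BYOLLM calls are logged but not billed
    return entry.input_tokens + entry.output_tokens


entries = [
    LedgerEntry("user-1", "local-ollama", "qwen3.5:27b", 1200, 300, byollm=True),
    LedgerEntry("user-1", "anthropic", "claude-sonnet-4-6", 1000, 500, byollm=False),
]
total = sum(billable_tokens(e) for e in entries)
print(total)  # 1500
```

Counting tokens rather than dollars keeps the sketch free of made-up prices.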
Per-purpose configuration
The platform's LCU (LLM Config Unification) layer means your BYOLLM call inherits admin per-purpose settings:
Admin sets: chain_narrator.temperature = 0.4, chain_narrator.max_tokens = 800
↓
Your ctx.llm.create_message(purpose="chain_narrator", ...) automatically uses temp=0.4, max_tokens=800
↓
The LLM provider (BYOLLM or platform) honours those params
You can override them per call if needed, but the cascade is your default.
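The precedence is: admin per-purpose defaults first, then anything you pass explicitly on the call. A minimal sketch of that merge, assuming per-call overrides are passed as keyword arguments (as `max_tokens=200` is in the summarize example). `ADMIN_PURPOSE_PARAMS` and `effective_params` are hypothetical names; this illustrates the precedence only, not LCU's actual implementation.

```python
# Hypothetical admin table keyed by purpose; values copied from the example above.
ADMIN_PURPOSE_PARAMS = {
    "chain_narrator": {"temperature": 0.4, "max_tokens": 800},
}


def effective_params(purpose: str, **overrides) -> dict:
    """Start from admin per-purpose defaults; explicit per-call values win."""
    params = dict(ADMIN_PURPOSE_PARAMS.get(purpose, {}))  # copy, never mutate the table
    # Ignore None so an unset optional argument doesn't erase an admin default.
    params.update({k: v for k, v in overrides.items() if v is not None})
    return params


print(effective_params("chain_narrator"))                  # {'temperature': 0.4, 'max_tokens': 800}
print(effective_params("chain_narrator", max_tokens=300))  # {'temperature': 0.4, 'max_tokens': 300}
```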
What if the user hasn't configured BYOLLM?
async def summarize(ctx, params):
    # ctx.llm is ALWAYS available; it falls back to the platform LLM if BYOLLM is not configured
    if ctx.llm.is_byollm:
        ctx.log.info("Using user's own LLM", provider=ctx.llm.provider)
    response = await ctx.llm.create_message(...)  # same arguments as in the example above
    return {"text": response.text}
ctx.llm always works. is_byollm lets you branch (e.g., to show different cost UX), but you're not required to.
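If you do branch, a typical use is choosing cost copy for the UI. A small sketch; `cost_note` and `StubLLM` are hypothetical, and the wording only restates Policy A:

```python
from dataclasses import dataclass


@dataclass
class StubLLM:
    """Stand-in exposing the two ctx.llm attributes used on this page."""
    is_byollm: bool
    provider: str


def cost_note(llm) -> str:
    """Return user-facing cost copy depending on whether BYOLLM is active."""
    if llm.is_byollm:
        # Policy A: BYOLLM calls are $0 to Imperal billing.
        return f"Runs on your own LLM ({llm.provider}); not billed by Imperal."
    return "Runs on the platform LLM; usage appears on your Imperal bill."


print(cost_note(StubLLM(is_byollm=True, provider="local-ollama")))
```

In a handler you would pass `ctx.llm` itself, for example `cost_note(ctx.llm)`.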