BYOLLM (Bring Your Own LLM)

With BYOLLM, users supply their own LLM credentials and your extension makes its LLM calls with them. This is useful when:

The user has a private model deployment (Anthropic enterprise, local Ollama, OpenAI org).
Compliance requires that data not flow through Imperal's default LLMs.
The user wants to control their own LLM costs.

How it works at the platform level

User configures their LLM in panel.imperal.io/settings/llm:
  ├─ provider:  anthropic | openai | local-ollama
  ├─ model:     claude-sonnet-4-6 | gpt-4.1 | qwen3.5:27b
  ├─ api_key:   <encrypted, stored in auth-gateway secret store>
  └─ base_url:  https://api.anthropic.com (or custom)

Per chat turn:
  Auth Gateway resolves user's BYOLLM config (if set)
       ↓
  Web-kernel cascade:
    1. Use BYOLLM provider/model/api_key/base_url
    2. Inherit admin per-purpose AI params (temperature, top_p, max_tokens) from platform LLM config
    3. Cost = $0 to billing (BYOLLM excluded — Policy A)
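The cascade above can be sketched in plain Python. This is an illustrative model only, not the web-kernel's actual code: `ByollmConfig`, `ResolvedLlm`, `resolve_llm`, `PLATFORM_DEFAULT`, and `ADMIN_PARAMS` are all hypothetical names, and the platform default values are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ByollmConfig:
    provider: str
    model: str
    api_key: str
    base_url: str


@dataclass(frozen=True)
class ResolvedLlm:
    provider: str
    model: str
    api_key: str
    base_url: str
    params: dict
    billable: bool


# Placeholder platform defaults (assumption; real values live in the platform LLM config).
PLATFORM_DEFAULT = ByollmConfig("platform-provider", "platform-model", "platform-key", "https://llm.example.invalid")

# Admin per-purpose AI params, mirroring the chain_narrator example in this page.
ADMIN_PARAMS = {"chain_narrator": {"temperature": 0.4, "max_tokens": 800}}


def resolve_llm(user_cfg: Optional[ByollmConfig], purpose: str) -> ResolvedLlm:
    """Pick BYOLLM if configured, inherit admin params, and mark BYOLLM as non-billable."""
    chosen = user_cfg or PLATFORM_DEFAULT              # step 1: BYOLLM wins when set
    params = dict(ADMIN_PARAMS.get(purpose, {}))       # step 2: inherit per-purpose params
    billable = user_cfg is None                        # step 3: BYOLLM is excluded from billing
    return ResolvedLlm(chosen.provider, chosen.model, chosen.api_key, chosen.base_url, params, billable)
```

Calling `resolve_llm(None, "chain_narrator")` in this sketch falls back to the platform default and marks the call as billable, which matches the fallback behaviour described later on this page.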

The platform handles credential storage, rotation, and decryption. Your extension just receives a ctx.llm accessor that already knows which provider to call.

When to use BYOLLM in your extension

Reach for BYOLLM when your extension makes its own LLM calls beyond what @chat.function offers, for example:

📝 Long-form content generation
Where the user asks the extension itself, not the [web-kernel](/en/reference/glossary/) router, to draft something.

🔍 Semantic search / RAG
Embedding queries, vector lookups, summarization of fetched docs.

🤖 Internal agentic loops
When your extension orchestrates a sub-LLM for a complex task (with appropriate guardrails).
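One possible guardrail for an internal agentic loop is a hard cap on the number of LLM calls. The sketch below uses a stand-in `FakeLlm` so it runs without the platform; `FakeLlm`, `run_bounded_loop`, the `DONE` stop marker, and the `max_steps` parameter are all assumptions made for illustration. Only the `create_message(system=..., messages=..., max_tokens=...)` call shape and the `.text` attribute are taken from this page.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:
    text: str


class FakeLlm:
    """Stand-in for ctx.llm: replays canned replies and repeats the last one."""

    def __init__(self, replies):
        self._replies = list(replies)
        self.calls = 0

    async def create_message(self, system, messages, max_tokens):
        reply = self._replies[min(self.calls, len(self._replies) - 1)]
        self.calls += 1
        return FakeResponse(text=reply)


async def run_bounded_loop(llm, task: str, max_steps: int = 5) -> str:
    """Call the LLM until it signals DONE or the step cap is reached."""
    messages = [{"role": "user", "content": task}]
    last = ""
    for _ in range(max_steps):  # hard upper bound on LLM calls
        response = await llm.create_message(
            system="Work step by step. End your final answer with DONE.",
            messages=messages,
            max_tokens=300,
        )
        last = response.text
        if last.strip().endswith("DONE"):
            break
        # Feed the partial result back and ask the model to continue.
        messages.append({"role": "assistant", "content": last})
        messages.append({"role": "user", "content": "Continue. End with DONE when finished."})
    return last
```

With a model that never emits the stop marker, the loop still ends after `max_steps` calls, so a misbehaving sub-LLM cannot run up the user's own LLM bill indefinitely.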

Most extensions don't need BYOLLM; @chat.function is enough.

The pattern

from imperal_sdk import ChatExtension, ActionResult
from pydantic import BaseModel, Field

class SummarizeParams(BaseModel):
    text: str = Field(description="The content to summarize.")
    max_words: int = Field(80, description="Target summary length.")

# `chat` is assumed to be the ChatExtension instance created elsewhere in your extension.
@chat.function(
    description="Summarize a piece of text using the user's configured LLM.",
    action_type="read",
)
async def summarize(ctx, params: SummarizeParams):
    # ctx.llm is auto-resolved per user (BYOLLM if configured, platform LLM otherwise)
    response = await ctx.llm.create_message(
        system="You are a concise summarizer. Output plain text only.",
        messages=[
            {"role": "user", "content": f"Summarize in {params.max_words} words:\n\n{params.text}"},
        ],
        max_tokens=200,
    )
    return {"text": response.text}
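To exercise the handler logic locally without the platform, one option is a stub ctx. The sketch below re-declares the handler body as a plain coroutine (no decorator) and uses `SimpleNamespace` in place of `SummarizeParams`, so it needs neither imperal_sdk nor pydantic. `StubLlm`, `summarize_body`, and `run_summarize` are hypothetical helpers, not part of the SDK.

```python
import asyncio
from types import SimpleNamespace


class StubLlm:
    """Records the keyword arguments it was called with and returns a fixed reply."""

    def __init__(self, reply):
        self.reply = reply
        self.last_call = None

    async def create_message(self, **kwargs):
        self.last_call = kwargs
        return SimpleNamespace(text=self.reply)


async def summarize_body(ctx, params):
    # Same call as the handler above, minus the @chat.function decorator.
    response = await ctx.llm.create_message(
        system="You are a concise summarizer. Output plain text only.",
        messages=[
            {"role": "user", "content": f"Summarize in {params.max_words} words:\n\n{params.text}"},
        ],
        max_tokens=200,
    )
    return {"text": response.text}


def run_summarize(text, max_words=80, reply="stub summary"):
    """Run the handler body against a stub ctx and return (result, recorded call)."""
    llm = StubLlm(reply)
    ctx = SimpleNamespace(llm=llm)
    params = SimpleNamespace(text=text, max_words=max_words)
    result = asyncio.run(summarize_body(ctx, params))
    return result, llm.last_call
```

Inspecting the recorded call lets a test confirm the prompt and `max_tokens` without contacting any provider.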

What `ctx.llm` gives you

Federal guarantees

🔐 Credentials never reach your code
ctx.llm makes the call via the [auth gateway](/en/reference/glossary/). Your extension never sees the user's API key in plaintext.

📊 BYOLLM is excluded from billing (Policy A)
Your extension's BYOLLM calls don't bill the user. Platform LLM calls do.

⚖️ Tenant isolation
One user's BYOLLM config can't be used to make calls on behalf of another.

📝 Audit chokepoint
Every LLM call (BYOLLM or platform) is logged to the federal [action ledger](/en/reference/glossary/) with provider/model/token-count metadata.

Per-purpose configuration

The platform's LCU (LLM Config Unification) layer means your BYOLLM call inherits admin per-purpose settings:

Admin sets: chain_narrator.temperature = 0.4, chain_narrator.max_tokens = 800
       ↓
Your ctx.llm.create_message(purpose="chain_narrator", ...) automatically uses temp=0.4, max_tokens=800
       ↓
The LLM provider (BYOLLM or platform) honours those params

You can override per-call if needed, but the cascade is your default.
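The precedence described here (admin per-purpose settings as the default, per-call values on top) can be modelled as a simple dictionary merge. This is a sketch of the idea, not the LCU layer's implementation; `ADMIN_PURPOSE_PARAMS` and `effective_params` are hypothetical names.

```python
# Admin per-purpose settings, using the chain_narrator values from the example above.
ADMIN_PURPOSE_PARAMS = {
    "chain_narrator": {"temperature": 0.4, "max_tokens": 800},
}


def effective_params(purpose, **overrides):
    """Start from the admin settings for `purpose`, then apply per-call overrides."""
    params = dict(ADMIN_PURPOSE_PARAMS.get(purpose, {}))
    # Only explicit (non-None) per-call values replace the inherited defaults.
    params.update({key: value for key, value in overrides.items() if value is not None})
    return params
```

In this model, `effective_params("chain_narrator", max_tokens=200)` keeps the admin temperature and changes only the token limit.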

What if the user hasn't configured BYOLLM?

async def summarize(ctx, params):
    # ctx.llm is ALWAYS available — falls back to platform LLM if BYOLLM not configured
    if ctx.llm.is_byollm:
        ctx.log.info("Using user's own LLM", provider=ctx.llm.provider)
    response = await ctx.llm.create_message(...)
    return {"text": response.text}

ctx.llm always works. is_byollm lets you branch (e.g. show different cost UX) but you're not required to.
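As an example of branching on is_byollm for cost UX, the helper below picks a notice to show the user. `cost_notice` and the message wording are made up for illustration; only the `is_byollm` and `provider` attributes come from this page, and the stub objects in the usage lines stand in for ctx.llm.

```python
from types import SimpleNamespace


def cost_notice(llm) -> str:
    """Return a short cost message depending on whose LLM will handle the call."""
    if llm.is_byollm:
        # BYOLLM calls are excluded from platform billing (Policy A).
        return f"Runs on your own {llm.provider} account; no platform charge."
    return "Runs on the platform LLM; usage is billed."


# Usage with stand-in objects instead of a real ctx.llm:
own = SimpleNamespace(is_byollm=True, provider="openai")
platform = SimpleNamespace(is_byollm=False, provider="anthropic")
print(cost_notice(own))
print(cost_notice(platform))
```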

Anti-patterns

Don't import anthropic / openai SDKs directly

Don't read user.attributes for credentials

Don't loop unbounded

What's next

Audit & security

Building extensions

[Pydantic feedback loop](/en/reference/glossary/)