Recipe — BYOLLM-aware extension

An extension that makes its own LLM calls — using the user's BYOLLM provider when set

This recipe shows an extension that calls an LLM itself, beyond the web-kernel's intent classifier, for content generation inside the handler. It uses the federal-clean BYOLLM surface and falls back to the platform LLM when the user has no BYOLLM provider set.

Use case

A "rewrite my message" tool: the user pastes text and picks a tone, and the extension rewrites the text using the user's BYOLLM (or the platform LLM).

schemas.py

from pydantic import BaseModel, Field
from typing import Literal

class RewriteParams(BaseModel):
    text: str = Field(description="The text to rewrite.")
    tone: Literal["formal", "casual", "concise", "friendly", "apologetic"] = Field(
        description="Target tone.",
    )
    keep_meaning: bool = Field(
        True,
        description="Preserve the original meaning. Set false if the user wants the AI to also restructure ideas.",
    )

handlers_chat.py

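from imperal_sdk import ChatExtension, ActionResult
from .schemas import RewriteParams

# NOTE: `chat` (used by the decorator below) is assumed to be this extension's
# ChatExtension instance; its construction is not shown in this recipe.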

TONE_PROMPTS = {
    "formal": "Rewrite the following text in a formal, professional tone.",
    "casual": "Rewrite the following text in a casual, conversational tone.",
    "concise": "Rewrite the following text to be as concise as possible while keeping every important detail.",
    "friendly": "Rewrite the following text to feel warm and friendly.",
    "apologetic": "Rewrite the following text to be apologetic and humble.",
}

@chat.function(
    description="Rewrite a piece of text in a specified tone using the user's configured LLM.",
    action_type="read",
    effects=["llm.text-generation"],
)
async def rewrite(ctx, params: RewriteParams):
    system = TONE_PROMPTS[params.tone]
    if params.keep_meaning:
        system += " Preserve all factual content. Don't add or remove information."
    response = await ctx.llm.create_message(
        system=system,
        messages=[{"role": "user", "content": params.text}],
        max_tokens=1500,
        purpose="content_rewrite",   # cascades admin per-purpose settings
    )
    return {
        "text": response.text,
        "provider": ctx.llm.provider,
        "model": ctx.llm.model,
        "is_byollm": ctx.llm.is_byollm,
    }

How `ctx.llm` resolves

ctx.llm is built per-call by the web-kernel:
  ├─ If the user has BYOLLM set: use their provider/model/api_key/base_url
  ├─ If admin per-purpose settings exist for "content_rewrite": apply them
  └─ Otherwise: fall back to the platform-default LLM (Sonnet 4.6)

Your code is provider-agnostic. The same ctx.llm.create_message(...) call works for Anthropic, OpenAI, or local Ollama on the user's side.
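
The resolution order above can be sketched as a first-match lookup. This is only an illustration of one reading of that order, not the web-kernel's code: `LLMConfig`, `resolve_llm`, and the `PLATFORM_DEFAULT` values are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical stand-in for whatever the web-kernel resolves per call."""
    provider: str
    model: str
    is_byollm: bool = False


# Placeholder values (assumption); the real default is configured by the platform.
PLATFORM_DEFAULT = LLMConfig(provider="platform", model="platform-default")


def resolve_llm(
    byollm: Optional[LLMConfig],
    purpose_settings: dict[str, LLMConfig],
    purpose: str,
) -> LLMConfig:
    """Pick a config using the order listed above, first match wins."""
    if byollm is not None:
        return byollm                      # user's own provider
    if purpose in purpose_settings:
        return purpose_settings[purpose]   # admin per-purpose settings
    return PLATFORM_DEFAULT                # platform fallback
```

The handler never sees this branching; it only reads the outcome through `ctx.llm.provider`, `ctx.llm.model`, and `ctx.llm.is_byollm`.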

What you NEVER do

❌

Don't import the anthropic / openai SDK

V7 forbids direct provider SDK imports. Use ctx.llm instead; it is the federal-clean surface.

❌

Don't read user.attributes for credentials

API keys never land in attributes. Decryption happens at the [auth gateway](/en/reference/glossary/).

❌

Don't loop unbounded

Bound your own LLM loops. The platform doesn't enforce a limit on in-handler loops, but cost reviews catch unbounded calls.

Try it in chat

"rewrite this casually: 'Per our previous conversation, attached please find the document for your perusal.'"

The classifier picks rewrite(text=..., tone="casual"). Your handler calls ctx.llm.create_message and gets back something like "Hey — here's the doc you wanted, take a look when you get a chance."

Cost considerations

# Show the user which LLM handled the call and whether it was BYOLLM
return {
    "text": rewritten_text,
    "provider": ctx.llm.provider,
    "model": ctx.llm.model,
    "is_byollm": ctx.llm.is_byollm,
}

If is_byollm=True, the call doesn't bill the user (Policy A — BYOLLM excluded from billing). The Imperal Panel renders a small BYOLLM badge so the user sees it.
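
Code that displays the result can branch on `is_byollm` in the same way. A minimal sketch, assuming the result dict carries the keys returned above; `billing_label` is a hypothetical helper and not part of the SDK.

```python
def billing_label(result: dict) -> str:
    """Build a short human-readable note about which LLM served the call."""
    source = f"{result['provider']}/{result['model']}"
    if result.get("is_byollm"):
        # BYOLLM calls are excluded from billing (Policy A above).
        return f"{source} (BYOLLM, not billed)"
    return f"{source} (platform LLM)"
```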

Variations

class RewriteParams(BaseModel):
    text: str = Field(description="Text to rewrite.")
    tone: Literal["formal", "casual"] = Field(description="Tone.")
    # ge/le keep the loop in rewrite_multiple bounded (see "Don't loop unbounded").
    n_variations: int = Field(3, ge=1, le=5, description="Number of rewrites to generate (1-5).")

async def rewrite_multiple(ctx, params):
    results = []
    for i in range(params.n_variations):
        r = await ctx.llm.create_message(
            system=f"Rewrite in {params.tone} tone, variation {i+1}.",
            messages=[{"role": "user", "content": params.text}],
        )
        results.append(r.text)
    return {"variations": results}

async def extract_action_items(ctx, params):
    response = await ctx.llm.create_message(
        system="Extract action items as JSON: {items: [{owner, deadline, task}]}",
        messages=[{"role": "user", "content": params.text}],
        response_format="json",
    )
    return {"action_items": response.json}
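
The JSON variation trusts the model to return the requested shape. If a handler wants to check it first, a defensive parse might look like the sketch below. It works on a raw JSON string and uses only the standard library; `parse_action_items` and `REQUIRED_KEYS` are hypothetical names, and whether `response.json` is already parsed is not stated in this recipe.

```python
import json

# Keys requested by the system prompt in extract_action_items.
REQUIRED_KEYS = {"owner", "deadline", "task"}


def parse_action_items(raw: str) -> list[dict]:
    """Return only well-formed items; return [] if the payload is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    items = data.get("items", []) if isinstance(data, dict) else []
    if not isinstance(items, list):
        return []
    # Keep items that are objects and contain every required key.
    return [
        item for item in items
        if isinstance(item, dict) and REQUIRED_KEYS <= item.keys()
    ]
```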

For UI-driven streaming (typewriter effect), use ctx.llm.stream_message and yield chunks back via the chat surface. This is rare for chat.functions; panels typically handle the streaming UX.
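
The signature of `ctx.llm.stream_message` is not shown in this recipe. The sketch below assumes it can be consumed with `async for` and yields text chunks, and uses a fake stream so it runs standalone; `fake_stream` and `relay` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Callable


async def fake_stream(text: str) -> AsyncIterator[str]:
    """Stand-in for an LLM stream: yields the text one word at a time."""
    for word in text.split():
        await asyncio.sleep(0)  # give control back to the event loop
        yield word + " "


async def relay(stream: AsyncIterator[str], emit: Callable[[str], None]) -> str:
    """Forward each chunk as it arrives and return the assembled text."""
    parts = []
    async for chunk in stream:
        emit(chunk)          # e.g. push the chunk to the UI
        parts.append(chunk)
    return "".join(parts)
```

Returning the assembled text alongside the live chunks lets the caller store or log the final result without buffering it separately.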

Where to next

BYOLLM guide

[Pydantic feedback loop](/en/reference/glossary/)

Building extensions guide
