Performance

How extension code affects user-perceived latency — what's hot, what's cached, what's free, what to never do

Extension handlers do not run in isolation. Each invocation is part of a larger web-kernel dispatch cycle: intent classification, workflow routing, activity execution, and SSE delivery. Understanding where your code sits in that cycle — and what costs what — is the foundation for writing extensions that feel fast.

Topic	Section
Latency budget	Budget
Hot paths	Hot paths
Refresh model	Refresh model
Free operations	Free
Expensive operations	Expensive
Caching strategies	Caching
Skeleton patterns	Skeletons
Panel patterns	Panels
Chain patterns	Chains
Storage tier choice	Storage
Observability	Observability
Common pitfalls	Pitfalls
Cross-references	See also

Latency budget

A typical chat turn targets 80–300 ms for the handler portion. The end-to-end latency the user perceives is higher, because intent classification, workflow scheduling, and LLM inference all precede your handler, but those costs are outside your control. Your extension handler is one slice of a larger budget.

Where time goes in a complete turn:

Stage	Typical cost	Owner
Intent classification (LLM)	400–1 500 ms	Web-kernel
Workflow scheduling	10–30 ms	Web-kernel
Extension activity dispatch	5–20 ms	Web-kernel
Your handler execution	20–300 ms	You
SSE / HTTP delivery	5–20 ms	Web-kernel

The handler portion is where your choices have the most impact. A handler that makes five sequential HTTP calls easily turns a 200 ms budget into a 1 500 ms experience.

For chain turns, where two or more tools execute in sequence, handler latencies add up rather than overlap. Three steps with a 300 ms handler each cost 900 ms of handler time.
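The budget arithmetic above can be sketched as a small estimator. The stage costs are rough midpoints of the ranges in the table; the names `StageCosts`, `handler_total_ms`, and `estimate_turn_ms` are illustrative, and the split between per-turn and per-step overhead is an assumption, not documented platform behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCosts:
    """Per-stage costs in milliseconds (rough midpoints of the table ranges)."""
    classification_ms: int = 950  # 400-1500 ms
    scheduling_ms: int = 20       # 10-30 ms
    dispatch_ms: int = 12         # 5-20 ms
    delivery_ms: int = 12         # 5-20 ms


def handler_total_ms(step_handler_ms: list[int]) -> int:
    """Handler time for a sequential chain is the sum of its steps."""
    return sum(step_handler_ms)


def estimate_turn_ms(step_handler_ms: list[int], costs: StageCosts = StageCosts()) -> int:
    """Rough end-to-end estimate for one turn.

    Assumption: classification, scheduling, and delivery happen once per turn,
    while dispatch overhead repeats for every chain step.
    """
    per_step_overhead = costs.dispatch_ms * len(step_handler_ms)
    return (
        costs.classification_ms
        + costs.scheduling_ms
        + per_step_overhead
        + handler_total_ms(step_handler_ms)
        + costs.delivery_ms
    )


if __name__ == "__main__":
    print(handler_total_ms([300, 300, 300]))
    print(estimate_turn_ms([300, 300, 300]))
```

Plugging in your own measured handler times shows quickly whether a planned chain fits the budget you have in mind.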

Hot paths vs cold paths

Not all handlers execute with the same frequency or latency expectations. Categorize yours before optimizing.

Panels — hot, frequent, should be cheap

Panel handlers execute whenever the frontend re-fetches panel content. That happens:

On auto_action load (user opens the panel tab)
When ActionResult.refresh_panels names the panel
On on_event: SSE events matching the panel's refresh declaration

Target: < 100 ms p50. The user is sitting in front of the UI, waiting. Every panel re-render that takes 500 ms is perceptible friction.

A sidebar panel that renders folders, stats, and a note list — like the notes extension — separates its data sources:

Folders: cached 60 s — ctx.cache.get_or_fetch
Folder stats: cached 30 s — ctx.cache.get_or_fetch
Notes list: not cached — primary content, always fresh

The folders and stats are structurally stable data; caching them is correct. The notes list must be fresh; that single live HTTP call is the dominant cost in the panel handler. Two cached calls + one live call is substantially cheaper than three live calls.

Skeletons — background, periodic, should be thorough

Skeleton handlers run in the background on their configured TTL tick, not during a user-visible request. They are not on the hot path for user-perceived latency. You can afford more work — multiple HTTP calls, aggregation — because the result is pre-computed and served from Redis when the classifier needs it.

But: skeleton handlers that perform expensive work on a short TTL amplify that cost across the user population. A skeleton with ttl=10 doing a 500 ms aggregation runs six times per minute per active user.

skeleton.py

from imperal_sdk import Extension

ext = Extension(
    "tasks-ext",
    display_name="Tasks",
    description="Example showing skeleton TTL choice for frequently-updated data.",
    actions_explicit=True,
)


# ttl=30: short because the LLM must see fresh task counts soon after writes.
# This works because the skeleton only surfaces counters + recent IDs — no
# expensive joins or aggregation.
@ext.skeleton(
    "tasks",
    alert=True,
    ttl=30,
    description="Today/overdue/upcoming counts and recent task IDs.",
)
async def skeleton_refresh_tasks(ctx) -> dict:
    # Fan out with asyncio.gather to parallelize the HTTP calls.
    # Both calls execute concurrently, so total wall time ≈ the slower call.
    import asyncio
    try:
        today_raw, overdue_raw = await asyncio.gather(
            ctx.http.get("/v1/tasks", params={"filter": "due_today"}),
            ctx.http.get("/v1/tasks", params={"filter": "overdue"}),
        )
        return {
            "response": {
                "today_count": today_raw.json().get("total", 0),
                "overdue_count": overdue_raw.json().get("total", 0),
            }
        }
    except Exception:
        return {"response": {"today_count": 0, "overdue_count": 0}}

skeleton_slow.py

from imperal_sdk import Extension

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing skeleton TTL choice for slow-changing data.",
    actions_explicit=True,
)


# ttl=300: schema rarely changes. A 5-minute window is acceptable.
# Keeping TTL high avoids frequent expensive schema introspection.
@ext.skeleton(
    "db_schema",
    alert=True,
    ttl=300,
    description="Active database schema — tables and columns.",
)
async def skeleton_refresh_db_schema(ctx) -> dict:
    try:
        resp = await ctx.http.post("/v1/schema", json={"user": ctx.user.imperal_id})
        tables = resp.json().get("tables", []) if resp.ok else []
        return {"response": {"table_count": len(tables), "tables": tables}}
    except Exception:
        return {"response": {"table_count": 0, "tables": []}}

Chat functions — user-blocking, must be fast

@chat.function handlers execute while the user waits for a chat response. After the intent classifier finishes (itself 400–1 500 ms), the handler runs synchronously in the activity. The user cannot do anything else during that time.

Target: < 500 ms p95 for a single step. For write operations that trigger confirmation flows, this is less visible — the confirmation card appears first, and the actual execution happens after acceptance. For read operations, the user is waiting for the answer.

In a chain, latency compounds. If each handler in a three-step chain takes 400 ms, the chain spends at least 1 200 ms in handler execution alone, before classification and delivery overhead.

The refresh model is broader than you think

ActionResult.refresh_panels controls which panels re-fetch after a successful handler. The semantics differ by delivery path:

Path	Trigger	`refresh_panels` behavior
Path A — HTTP direct call	`ui.Call(...)` from panel	Targeted: only the named panels re-fetch
Path B — SSE / chat	Chat function in a message	Ignored: all discovered panels re-fetch

On Path B (the chat path), setting refresh_panels=["sidebar"] does not limit the refresh to just the sidebar. The SSE publisher refreshes every panel the frontend has open. This is by design — the web-kernel cannot know which panels may have been affected by an action that arrived via chat.
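The two rows of the table can be expressed as a toy decision function. This is only a model of the behavior described here, not web-kernel code; the function name and the `"http"` / `"chat"` labels are made up for the sketch.

```python
def panels_to_refresh(path: str, refresh_panels: list[str], open_panels: list[str]) -> list[str]:
    """Which open panels re-fetch after a handler succeeds.

    path is "http" (Path A, ui.Call from a panel) or "chat" (Path B, SSE).
    """
    if path == "http":
        # Path A: targeted, only panels named in refresh_panels re-fetch.
        return [p for p in open_panels if p in refresh_panels]
    if path == "chat":
        # Path B: refresh_panels is ignored, every open panel re-fetches.
        return list(open_panels)
    raise ValueError(f"unknown path: {path!r}")


if __name__ == "__main__":
    open_now = ["sidebar", "viewer", "overview"]
    print(panels_to_refresh("http", ["sidebar"], open_now))
    print(panels_to_refresh("chat", ["sidebar"], open_now))
```

The second call returns all three panels even though only the sidebar was named, which is why each of them has to stay cheap.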

Implication for panel design: panels must be cheap even on a no-op refresh. If your center panel takes 500 ms to render its initial state, every chat-path write in any extension triggers a 500 ms reload of that panel while it is open. Design panels to be fast regardless of what prompted the refresh.

panels_viewer.py

from imperal_sdk import ui, Extension

ext = Extension(
    "notes-ext",
    display_name="Notes",
    description="Example showing a panel that is cheap even on no-op refresh.",
    actions_explicit=True,
)


@ext.panel("viewer", slot="center", center_overlay=True, title="Note")
async def notes_viewer(ctx, note_id: str = "") -> object:
    # Guard: if no note is selected, return immediately with no I/O.
    # This keeps the refresh cost near-zero when the user has no active note.
    if not note_id:
        return ui.Empty("Select a note to view it")

    # Only perform I/O when there is a selected item to load.
    resp = await ctx.http.get(f"/v1/notes/{note_id}", headers={"X-User": ctx.user.imperal_id})
    if not resp.ok:
        return ui.Error("Could not load note")

    note = resp.json()
    return ui.Stack([
        ui.Header(note.get("title", "Untitled")),
        ui.Markdown(note.get("content", "")),
    ])

What's free

These operations carry no meaningful performance cost. You can use them freely without profiling.

`ctx.user` and `ctx.tenant` access

Both are frozen Pydantic models injected at context construction time. Reading ctx.user.imperal_id, ctx.user.role, ctx.tenant.tenant_id, etc., is a plain attribute access with no I/O.

Reading skeleton output from cache (web-kernel side)

When the classifier reads skeleton data to build context for the LLM, it reads from Redis — a web-kernel-side operation you do not control and do not pay for in your handler.

Returning `ui.Empty(...)`

An empty state return from a panel handler serializes to a small JSON dict. It is the cheapest thing a panel can return and is the correct no-op pattern when there is nothing to show.

panels_empty.py

from imperal_sdk import ui, Extension

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing cheap empty-state panel return.",
    actions_explicit=True,
)


@ext.panel("detail", slot="center", center_overlay=True, title="Report Detail")
async def report_detail(ctx, report_id: str = "") -> object:
    if not report_id:
        # No I/O, no cost — returns immediately.
        return ui.Empty("Select a report", icon="BarChart2")

    # ... fetch and render the report
    return ui.Stack([ui.Text("Report data here")])

Building UINode trees

Constructing ui.Stack(...), ui.List(...), ui.ListItem(...), etc., is pure Python — no I/O, no serialization until the return value reaches the web-kernel. Build as many nodes as you need; the cost is CPU-proportional to tree size and negligible for typical panel output.

What's expensive

These operations involve I/O and should be minimized, parallelized, or cached.

`ctx.http.*` — network round trip

External HTTP calls are the most common source of panel latency. Each call adds a network round trip to an external service, typically 50–300 ms depending on the service and your infrastructure topology. A panel that makes three sequential HTTP calls therefore adds 150–900 ms of wait time.

Mitigations: parallelize with asyncio.gather, cache results with ctx.cache, or combine into a single batched call if the upstream API supports it.

`ctx.ai.*` — LLM inference

Calling ctx.ai.complete(...) from inside a handler blocks that handler until LLM inference completes, typically 500–3 000 ms. For most @chat.function handlers this is the single most expensive operation available. The Pydantic feedback loop (SDK v4.1.0+) can trigger up to two additional inference calls on validation failure, adding up to 6 000 ms on top of the first call (roughly 9 000 ms in total in the worst case).

Recommendation: avoid ctx.ai in panel handlers entirely. In @chat.function handlers, use it only where the LLM's reasoning is genuinely irreplaceable. For skeleton handlers, ctx.ai is more acceptable because the skeleton runs in the background.

`ctx.db` — raw database query

Raw database access via ctx.db is faster than HTTP but still involves a network trip to the database host plus query execution time. Simple indexed lookups are typically 5–20 ms; full-table scans or complex joins can be 100–500 ms or more.

Mitigations: ensure your queries use indexed fields, apply limit bounds, and cache results for stable data.

`ctx.store.query` — document store query

ctx.store.query(collection, where=..., limit=...) translates to a backend database query. Performance depends on the collection size and whether the where dict fields have backing indexes. Without selective where clauses, the backend scans all documents in the collection.

Always set a limit. The default limit is 100 documents, but a limit only caps the number of results returned: without an index, the backend may still scan a large collection to find those 100 documents, and the query stays slow.

Large UINode trees

The web-kernel serializes the UINode tree returned by your handler and sends it to the frontend. A panel returning thousands of list items produces a large JSON payload that is slow to serialize, slow to transmit, and slow to render. Pagination is the correct solution for large lists.

Caching strategies

`ctx.cache.get_or_fetch` — the primary pattern

get_or_fetch is the canonical caching pattern for panel handlers: check the cache, call the fetcher on miss, write the result back, return it. It handles the check-then-fetch atomicity correctly and is the pattern used in production across the notes, tasks, and mail extensions.

panels_cached.py

from __future__ import annotations
from pydantic import BaseModel
from imperal_sdk import Extension, ui

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example panel handler using get_or_fetch for stable metadata.",
    actions_explicit=True,
)


@ext.cache_model("account_list")
class AccountList(BaseModel):
    accounts: list[dict] = []


@ext.panel("sidebar", slot="left", title="Accounts")
async def accounts_sidebar(ctx) -> object:
    uid = ctx.user.imperal_id

    # Stable data: account list changes rarely — cache for 120 s.
    async def _load_accounts() -> AccountList:
        resp = await ctx.http.get("/v1/accounts", headers={"X-User": uid})
        return AccountList(accounts=resp.json().get("accounts", []) if resp.ok else [])

    entry = await ctx.cache.get_or_fetch(
        f"accounts:{uid}", AccountList, ttl_seconds=120, fetcher=_load_accounts,
    )

    items = [
        ui.ListItem(id=a["id"], title=a.get("name", "Unknown"))
        for a in entry.accounts
    ]
    return ui.List(items=items) if items else ui.Empty("No accounts connected")
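To make the check, fetch, and write-back sequence concrete, here is an in-memory stand-in with the same shape. It is a sketch for reasoning about the semantics, not the SDK's Redis-backed implementation; `TinyTTLCache` and the per-key lock are assumptions of the sketch.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class TinyTTLCache:
    """In-memory illustration of get_or_fetch semantics (not the SDK cache)."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[float, Any]] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    async def get_or_fetch(
        self, key: str, ttl_seconds: float, fetcher: Callable[[], Awaitable[Any]]
    ) -> Any:
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:  # concurrent misses on one key share a single fetch
            entry = self._entries.get(key)
            if entry is not None and entry[0] > time.monotonic():
                return entry[1]  # hit: no I/O
            value = await fetcher()  # miss: pay for the live call once
            self._entries[key] = (time.monotonic() + ttl_seconds, value)
            return value


async def demo() -> tuple[list[str], int]:
    calls = 0

    async def load_folders() -> list[str]:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # pretend network round trip
        return ["inbox", "archive"]

    cache = TinyTTLCache()
    results = await asyncio.gather(
        *(cache.get_or_fetch("folders:u1", 60, load_folders) for _ in range(3))
    )
    return results[0], calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Three concurrent lookups trigger one fetch. That is the property to look for in any caching helper used on a hot panel path.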

Cache constraints to remember:

TTL must be 5–300 seconds (SDK enforces this — CACHE-TTL-1 AST rule)
Key must be alphanumeric + _-:, max 128 characters
Value must be a Pydantic BaseModel subclass
Value size is capped at 64 KB per entry
The model class must be registered via @ext.cache_model before first use
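The numeric constraints in the list above can be checked locally before an entry ever reaches the cache. The helper below is a hypothetical pre-flight check, not part of the SDK; it assumes "64 KB" means 64 × 1024 bytes of serialized payload.

```python
import re

# Alphanumeric plus "_", "-", ":"; 1 to 128 characters.
_KEY_RE = re.compile(r"[A-Za-z0-9_\-:]{1,128}")
MIN_TTL_S, MAX_TTL_S = 5, 300
MAX_VALUE_BYTES = 64 * 1024  # assumption: 64 KB == 65 536 bytes


def check_cache_entry(key: str, ttl_seconds: int, payload: bytes) -> list[str]:
    """Return a list of constraint violations; an empty list means the entry looks valid."""
    problems: list[str] = []
    if not _KEY_RE.fullmatch(key):
        problems.append("key must be 1-128 characters from [A-Za-z0-9_-:]")
    if not MIN_TTL_S <= ttl_seconds <= MAX_TTL_S:
        problems.append(f"ttl_seconds must be within {MIN_TTL_S}-{MAX_TTL_S}")
    if len(payload) > MAX_VALUE_BYTES:
        problems.append(f"value exceeds {MAX_VALUE_BYTES} bytes")
    return problems


if __name__ == "__main__":
    print(check_cache_entry("accounts:u_1", 120, b"{}"))
    print(check_cache_entry("accounts:u 1", 600, b"x" * 70_000))
```

A check like this fits naturally in a unit test for the module that builds your cache keys.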

`@ext.cache_model` — register before use

Every model passed to ctx.cache.get/set/get_or_fetch must be registered. The registration must happen at import time (module scope), before any handler code runs. In multi-file extensions, the registration conventionally lives in a dedicated cache_models.py that is imported before handler modules.

app.py

from pydantic import BaseModel
from imperal_sdk import Extension

ext = Extension(
    "mail-ext",
    display_name="Mail",
    description="Example showing @ext.cache_model registration at module scope.",
    actions_explicit=True,
)


# Registered at import time — before any handler imports this module.
@ext.cache_model("inbox_page")
class InboxPage(BaseModel):
    messages: list[dict] = []
    total: int = 0
    next_cursor: str = ""


@ext.cache_model("unread_summary")
class UnreadSummary(BaseModel):
    unread_count: int = 0
    last_checked: str = ""

When to cache vs always-fetch

Data type	Cache?	TTL guidance
Account / connection list	Yes	60–120 s
Folder / label list	Yes	60 s
Folder stats / counts	Yes	30–60 s
Schema / metadata	Yes	120–300 s
Primary content list (inbox, note list)	Usually no	Freshness required
User-specific settings	Yes	60–120 s
Results of expensive aggregation	Yes	As long as acceptable staleness allows, up to the 300 s cache maximum

Do not cache the primary content users are looking at directly. If a user adds a note and the sidebar still shows the old list 60 seconds later, that is a bug in UX terms even if it is technically correct. Cache metadata that structures the content, not the content itself.

Skeleton caching — pre-warming the cache from the background

The sql-db extension uses its skeleton handler to mirror data into the application cache. The skeleton runs in the background on a 300 s tick; any panel or chat function that needs the same data reads from the cache rather than making a live call. This pattern decouples panel latency from the upstream service response time.

skeleton_with_cache_mirror.py

from __future__ import annotations
from pydantic import BaseModel
from imperal_sdk import Extension

ext = Extension(
    "db-ext",
    display_name="Database",
    description="Example skeleton that pre-warms the app cache for panels.",
    actions_explicit=True,
)

SCHEMA_CACHE_KEY = "db_schema_snap"
SCHEMA_CACHE_TTL = 300  # match skeleton ttl=300 (the cache maximum); a shorter TTL leaves a cold window before each tick, and readers should still handle a miss


@ext.cache_model("db_schema_snap")
class DbSchemaSnapshot(BaseModel):
    tables: list[dict] = []
    table_count: int = 0


@ext.skeleton("db_schema", alert=True, ttl=300,
              description="Active database schema — tables and columns.")
async def skeleton_refresh_db_schema(ctx) -> dict:
    try:
        resp = await ctx.http.post("/v1/schema", json={"user": ctx.user.imperal_id})
        tables = resp.json().get("tables", []) if resp.ok else []
        compact = [{"name": t["name"], "columns": t.get("columns", [])} for t in tables]

        # Pre-warm the cache so panel handlers avoid the live call.
        snap = DbSchemaSnapshot(tables=compact, table_count=len(compact))
        await ctx.cache.set(SCHEMA_CACHE_KEY, snap, ttl_seconds=SCHEMA_CACHE_TTL)

        return {"response": {"table_count": len(compact), "tables": compact}}
    except Exception:
        return {"response": {"table_count": 0, "tables": []}}

Skeleton patterns

TTL choice

The TTL passed to @ext.skeleton is a hint to the web-kernel about how frequently to run the background refresh tick. Choose it first by how quickly the classifier needs to see accurate data, then confirm that the refresh is cheap enough to run at that frequency:

Use case	TTL guidance	Rationale
Task counters, unread counts	30–60 s	LLM should see accurate counts after writes
Folder / project structure	120–300 s	Changes less frequently
Database schema	300 s	Schema changes are rare and deliberate
Email inbox summary	60–120 s	New mail arrives continuously

A short TTL makes the skeleton data fresher but increases the background I/O load across your user base. The tasks extension uses ttl=30 because task counts change frequently and the classifier must route accurately after each write. The sql-db extension uses ttl=300 because schema introspection is expensive and schema changes are infrequent.
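The trade-off between freshness and background load is simple arithmetic. The sketch below assumes one upstream call per refresh and one refresh per TTL tick per active user; the function name and the example user count are illustrative.

```python
def skeleton_load(ttl_seconds: int, refresh_ms: float, active_users: int) -> dict[str, float]:
    """Estimate background load generated by one skeleton section.

    Assumes one refresh per TTL tick per active user and one upstream
    call per refresh.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    runs_per_minute = 60 / ttl_seconds
    return {
        "runs_per_user_minute": runs_per_minute,
        "busy_ms_per_user_minute": runs_per_minute * refresh_ms,
        "upstream_calls_per_minute": runs_per_minute * active_users,
    }


if __name__ == "__main__":
    # A 500 ms refresh for 1 000 active users, at two different TTLs.
    print(skeleton_load(10, 500, 1000))
    print(skeleton_load(30, 500, 1000))
```

Moving from ttl=10 to ttl=30 cuts the upstream call rate to a third for the same refresh cost.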

Alert mode (`alert=True`)

When alert=True, the web-kernel compares the new skeleton output to the previous snapshot and, if they differ, emits a change notification. This is event-driven freshness: instead of polling on a fixed TTL, the system reacts to actual changes.

Use alert=True when:

Your skeleton surfaces counts or status fields that change meaningfully
The change has UX significance (new unread mail, new overdue tasks)

Pair alert=True with a companion tool named skeleton_alert_{section_name} that compares old and new snapshots and returns a human-readable alert string.
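The comparison logic inside such a companion can be very small. The sketch below shows only a pure snapshot diff; the signature the platform expects from skeleton_alert_{section_name} is not described on this page, so the function name, parameters, and output format here are assumptions.

```python
from __future__ import annotations


def describe_count_changes(old: dict[str, int], new: dict[str, int]) -> str | None:
    """Compare two counter snapshots; return an alert string, or None if nothing changed."""
    parts: list[str] = []
    for field in sorted(new):
        before, after = old.get(field, 0), new[field]
        if after != before:
            # "overdue_count" -> "overdue"
            label = field.removesuffix("_count").replace("_", " ")
            parts.append(f"{label}: {before} -> {after}")
    return "; ".join(parts) if parts else None


if __name__ == "__main__":
    previous = {"today_count": 2, "overdue_count": 0}
    current = {"today_count": 4, "overdue_count": 3}
    print(describe_count_changes(previous, current))
```

Keeping the diff pure (no I/O, no clock) makes it trivial to unit test and keeps the alert path cheap.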

Auto-rotate at 500 iterations

The web-kernel automatically rotates the skeleton refresh worker at 500 iterations. This is a platform-level resource management feature — you do not need to implement it or handle it in your skeleton handler. Write skeleton handlers as idempotent, stateless functions.

Panel patterns

Pagination for large lists

Never return thousands of items in a single panel response. Use ui.List(page_size=N, on_end_reached=ui.Call(...)) to implement infinite scroll. Load the first page eagerly; load subsequent pages only when the user scrolls to the bottom.

panels_paginated.py

from __future__ import annotations
from imperal_sdk import Extension, ui

ext = Extension(
    "contacts-ext",
    display_name="Contacts",
    description="Example panel with infinite-scroll pagination for a large list.",
    actions_explicit=True,
)


@ext.panel("contacts", slot="left", title="Contacts")
async def contacts_sidebar(ctx, cursor: str = "") -> object:
    uid = ctx.user.imperal_id
    params: dict[str, object] = {"user_id": uid, "limit": 50}
    if cursor:
        params["cursor"] = cursor

    resp = await ctx.http.get("/v1/contacts", params=params)
    if not resp.ok:
        return ui.Error("Could not load contacts")

    data = resp.json()
    contacts = data.get("contacts", [])
    next_cursor = data.get("next_cursor", "")
    total = data.get("total", len(contacts))

    items = [
        ui.ListItem(id=c["id"], title=c.get("name", "Unknown"))
        for c in contacts
    ]

    return ui.List(
        items=items,
        total_items=total,
        page_size=50,
        on_end_reached=ui.Call("__panel__contacts", cursor=next_cursor) if next_cursor else None,
    )

Lazy-render with `ui.Loading`

For panels with expensive initial loads, ui.Loading(...) gives the user an immediate placeholder so the panel appears responsive even when the backend is slow.

A handler cannot return ui.Loading and then keep fetching, though: the web-kernel awaits the handler's result before sending anything to the frontend. In practice, ui.Loading is most useful as a placeholder inside a ui.Stack for a section that loads independently via auto_action or a ui.Call.

Batched fetches — never serial HTTP calls

If your panel needs data from multiple endpoints, fetch them in parallel:

panels_dashboard.py

from __future__ import annotations
import asyncio
from imperal_sdk import Extension, ui

ext = Extension(
    "analytics-ext",
    display_name="Analytics",
    description="Example panel batching multiple HTTP calls in parallel.",
    actions_explicit=True,
)


@ext.panel("overview", slot="right", title="Overview")
async def analytics_overview(ctx) -> object:
    uid = ctx.user.imperal_id
    headers = {"X-User": uid}

    # Parallel fetch — all three execute concurrently.
    visits_resp, revenue_resp, users_resp = await asyncio.gather(
        ctx.http.get("/v1/stats/visits", headers=headers),
        ctx.http.get("/v1/stats/revenue", headers=headers),
        ctx.http.get("/v1/stats/users", headers=headers),
    )

    visits = visits_resp.json().get("total", 0) if visits_resp.ok else 0
    revenue = revenue_resp.json().get("amount", 0.0) if revenue_resp.ok else 0.0
    active_users = users_resp.json().get("active", 0) if users_resp.ok else 0

    return ui.Stack([
        ui.Stats(children=[
            ui.Stat(label="Visits", value=visits, icon="👁️"),
            ui.Stat(label="Revenue", value=f"${revenue:,.2f}", icon="💵"),
            ui.Stat(label="Active users", value=active_users, icon="👥"),
        ])
    ])

Serial I/O is the most common performance problem in panel handlers. Three sequential 100 ms calls become 300 ms; three parallel calls become 100 ms.
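The difference is easy to measure without any network at all. In this sketch `fake_call` stands in for an HTTP round trip by sleeping; the helper names are illustrative.

```python
import asyncio
import time


async def fake_call(delay_s: float, value: int) -> int:
    """Stand-in for an HTTP round trip; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay_s)
    return value


async def fetch_serial(delay_s: float) -> tuple[list[int], float]:
    start = time.perf_counter()
    # Each await finishes before the next call starts.
    results = [await fake_call(delay_s, i) for i in range(3)]
    return results, time.perf_counter() - start


async def fetch_parallel(delay_s: float) -> tuple[list[int], float]:
    start = time.perf_counter()
    # All three calls are in flight at once; gather preserves argument order.
    results = list(await asyncio.gather(*(fake_call(delay_s, i) for i in range(3))))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    serial, serial_s = asyncio.run(fetch_serial(0.1))
    parallel, parallel_s = asyncio.run(fetch_parallel(0.1))
    print(f"serial:   {serial} in {serial_s * 1000:.0f} ms")
    print(f"parallel: {parallel} in {parallel_s * 1000:.0f} ms")
```

By default asyncio.gather raises the first exception it sees. If one failing endpoint should not blank the whole panel, passing return_exceptions=True returns exceptions as results so each can be handled individually.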

Chain patterns

`depends_on` for parallel-safe steps

When a chain step does not depend on the output of a previous step, declaring depends_on=[] (or an explicit subset) lets the web-kernel's topological sorter run independent steps in parallel. Steps that are declared as read operations and have no dependency on prior write results can be scheduled concurrently.

The depends_on field belongs in the classifier's action_plans schema — it is not a decorator kwarg. The web-kernel applies Kahn's topological sort to the declared plan before dispatching steps.

Ordering guarantee: read steps are always scheduled before dependent write steps. If your classifier emits depends_on correctly, a chain like [mail.list_unread, notes.create_note(depends_on=[mail.list_unread])] will always run the read before the write, regardless of list order.
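A minimal sketch of Kahn's algorithm shows how a declared plan turns into groups of steps that may run together. The plan format used here (step name mapped to a list of dependency names) is a simplification chosen for the example, not the real action_plans schema.

```python
from collections import defaultdict


def schedule_waves(plan: dict[str, list[str]]) -> list[list[str]]:
    """Group steps into waves using Kahn's algorithm.

    plan maps each step name to the step names it depends on. Steps in the
    same wave have no unmet dependencies and could be dispatched concurrently.
    """
    indegree = {step: 0 for step in plan}
    dependents: dict[str, list[str]] = defaultdict(list)
    for step, deps in plan.items():
        for dep in deps:
            if dep not in plan:
                raise ValueError(f"{step!r} depends on unknown step {dep!r}")
            indegree[step] += 1
            dependents[dep].append(step)

    waves: list[list[str]] = []
    ready = sorted(step for step, degree in indegree.items() if degree == 0)
    while ready:
        waves.append(ready)
        next_ready: list[str] = []
        for step in ready:
            for child in dependents[step]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    if sum(len(wave) for wave in waves) != len(plan):
        raise ValueError("dependency cycle detected")
    return waves


if __name__ == "__main__":
    plan = {
        "notes.create_note": ["mail.list_unread"],
        "mail.list_unread": [],
        "tasks.list_overdue": [],
    }
    print(schedule_waves(plan))
```

The write step lands in the second wave even though it is listed first, which matches the ordering guarantee described above.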

`id_projection` to avoid LLM re-resolution

When a chain step receives an entity ID from a prior step, id_projection tells the web-kernel which parameter field carries the target ID. The web-kernel injects the ID directly — the LLM does not need to re-resolve it. This eliminates one round of LLM inference per step that requires ID threading.

handlers_chain.py

from pydantic import BaseModel
from imperal_sdk import Extension, ChatExtension, ActionResult

ext = Extension(
    "folders-ext",
    display_name="Folders",
    description="Example showing id_projection for chain ID threading.",
    actions_explicit=True,
)
chat = ChatExtension(
    ext,
    tool_name="tool_folders_chat",
    description="AI chat interface for folder management.",
)


class DeleteFolderContentsParams(BaseModel):
    folder_id: str


# id_projection="folder_id" tells the web-kernel that the "folder_id" field
# carries the entity ID for this step. In a chain, the web-kernel injects the
# resolved folder_id from the prior step without asking the LLM to re-state it.
@chat.function(
    "delete_notes_from_folder",
    description="Delete all notes from a specified folder by folder ID.",
    action_type="destructive",
    chain_callable=True,
    effects=["delete:note"],
    id_projection="folder_id",
)
async def fn_delete_notes_from_folder(ctx, params: DeleteFolderContentsParams) -> ActionResult:
    if not params.folder_id:
        return ActionResult.error("folder_id is required")
    resp = await ctx.http.delete(
        f"/v1/folders/{params.folder_id}/contents",
        headers={"X-User": ctx.user.imperal_id},
    )
    if not resp.ok:
        return ActionResult.error("Could not delete folder contents")
    count = resp.json().get("deleted", 0)
    return ActionResult.success(
        data={"folder_id": params.folder_id, "deleted_count": count},
        summary=f"Deleted {count} notes from folder",
        refresh_panels=["sidebar"],
    )

Use id_projection for compound function names where the verb-prefix heuristic cannot derive the correct field name. For simple names like delete_note, the heuristic finds note_id automatically. For names like delete_notes_from_folder, you need id_projection="folder_id".

Storage tier choice

Three storage tiers are available; choosing the right one for each data type is a performance decision:

Tier	API	Latency	TTL	Use for
`ctx.cache`	`get/set/get_or_fetch`	< 5 ms (Redis)	5–300 s	Short-lived, derived, computed data
`ctx.store`	`create/get/query/update/delete`	10–50 ms (DB)	Permanent	User-owned, persistent documents
`ctx.db`	`acquire/session`	10–100 ms (DB)	Permanent	Complex SQL queries, joins, migrations

For panel handlers, ctx.cache is the correct tier for pre-warmed or aggregated data. ctx.store is the correct tier for user-owned content (notes, tasks, contacts). ctx.db is for extensions that own raw SQL schemas.

Do not use ctx.store as a cache. Store documents persist indefinitely and are not evicted by TTL. If you are storing intermediate or derived data that should expire, use ctx.cache.

For a deeper discussion of when to use each tier, see Cache vs store.

Observability

Structured logging

Use logging.getLogger(__name__) (synchronous) or await ctx.log(...) (async) for structured event logs. Both routes are visible in the extension dashboard and in the platform's log aggregator.

Tag your log lines with user-relevant context so you can filter by user or operation:

handlers_logging.py

from pydantic import BaseModel
from imperal_sdk import Extension, ChatExtension, ActionResult
import logging

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing structured logging with context tags.",
    actions_explicit=True,
)
chat = ChatExtension(
    ext,
    tool_name="tool_reports_chat",
    description="AI chat interface for reports.",
)

log = logging.getLogger(__name__)


class GenerateReportParams(BaseModel):
    report_type: str


@chat.function(
    "generate_report",
    description="Generate a report of the specified type for the current user.",
    action_type="read",
)
async def fn_generate_report(ctx, params: GenerateReportParams) -> ActionResult:
    uid = ctx.user.imperal_id
    log.info(
        "generate_report start user=%s type=%s",
        uid,
        params.report_type,
    )

    resp = await ctx.http.post(
        "/v1/reports/generate",
        json={"user_id": uid, "type": params.report_type},
    )

    if not resp.ok:
        log.warning(
            "generate_report backend error user=%s status=%d",
            uid,
            resp.status_code,
        )
        return ActionResult.error("Report generation failed. Please try again.", retryable=True)

    log.info("generate_report success user=%s", uid)
    return ActionResult.success(
        data=resp.json(),
        summary=f"Report generated: {params.report_type}",
    )

ctx.log is async def and must be awaited. Standard logging calls are synchronous. Both are acceptable; standard logging is more common in production extensions.

Audit ledger

Every @chat.function invocation is recorded in the platform audit ledger with its action type, status, and timing. Latency hot-spots appear in the audit ledger automatically. You do not need to emit latency metrics manually.

Platform dashboards

High-level latency distributions and error rates are visible in the platform monitoring dashboard. Use these as the first signal that a handler has a latency regression. Drill into structured logs for per-invocation detail.

Common pitfalls

Pitfall 1: N+1 queries in a panel handler

The most common panel performance problem: rendering a list of items where each item requires its own HTTP call.

panels_n_plus_1.py

from imperal_sdk import Extension, ui

ext = Extension(
    "tasks-ext",
    display_name="Tasks",
    description="Example showing N+1 query anti-pattern to avoid.",
    actions_explicit=True,
)


@ext.panel("tasks", slot="left", title="Tasks")
async def tasks_sidebar_bad(ctx) -> object:
    uid = ctx.user.imperal_id
    tasks_resp = await ctx.http.get("/v1/tasks", params={"user_id": uid})
    tasks = tasks_resp.json().get("tasks", []) if tasks_resp.ok else []

    items = []
    for task in tasks:
        # ❌ One HTTP call per task — 50 tasks = 50 HTTP calls
        detail_resp = await ctx.http.get(f"/v1/tasks/{task['id']}/detail")
        detail = detail_resp.json() if detail_resp.ok else {}
        items.append(ui.ListItem(
            id=task["id"],
            title=detail.get("title", task.get("title", "")),
        ))

    return ui.List(items=items)

Fix: fetch all detail in a single batch call, or include detail in the list endpoint response.
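The effect of the fix shows up directly in the round-trip count. The sketch below uses a fake client instead of ctx.http so it runs anywhere; `FakeTasksApi` and its `details_batch` method stand in for a hypothetical batch endpoint that your upstream service may or may not offer.

```python
import asyncio


class FakeTasksApi:
    """Counts round trips; stands in for the upstream tasks service."""

    def __init__(self, titles: dict[str, str]) -> None:
        self._titles = titles
        self.round_trips = 0

    async def list_ids(self) -> list[str]:
        self.round_trips += 1
        return list(self._titles)

    async def detail(self, task_id: str) -> dict[str, str]:
        self.round_trips += 1
        return {"id": task_id, "title": self._titles[task_id]}

    async def details_batch(self, task_ids: list[str]) -> list[dict[str, str]]:
        # Hypothetical batch endpoint: one round trip for many IDs.
        self.round_trips += 1
        return [{"id": t, "title": self._titles[t]} for t in task_ids]


async def titles_n_plus_1(api: FakeTasksApi) -> list[str]:
    ids = await api.list_ids()
    return [(await api.detail(task_id))["title"] for task_id in ids]  # 1 + N calls


async def titles_batched(api: FakeTasksApi) -> list[str]:
    ids = await api.list_ids()
    return [d["title"] for d in await api.details_batch(ids)]  # 2 calls


if __name__ == "__main__":
    data = {f"t{i}": f"Task {i}" for i in range(50)}
    slow, fast = FakeTasksApi(data), FakeTasksApi(data)
    asyncio.run(titles_n_plus_1(slow))
    asyncio.run(titles_batched(fast))
    print(slow.round_trips, fast.round_trips)
```

If no batch endpoint exists, running the detail calls through asyncio.gather still removes the serial wait, although it does not reduce the number of requests the upstream has to serve.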

Pitfall 2: short-TTL skeleton with expensive aggregation

skeleton_expensive.py

from imperal_sdk import Extension

ext = Extension(
    "crm-ext",
    display_name="CRM",
    description="Example showing a skeleton TTL mismatch to avoid.",
    actions_explicit=True,
)


# ❌ ttl=10 with a heavy aggregation query runs 6 times/minute per user.
@ext.skeleton("crm_summary", ttl=10,
              description="CRM summary with full contact aggregation.")
async def skeleton_refresh_crm_bad(ctx) -> dict:
    # This call takes 800 ms — running it every 10 s is 6 calls/min per user.
    resp = await ctx.http.post("/v1/crm/aggregate-all", json={"user": ctx.user.imperal_id})
    return {"response": resp.json() if resp.ok else {}}

Fix: either make the aggregation cheaper (return counts, not full records), or use a longer TTL. If freshness is critical, use alert=True with a change-detection companion rather than a short TTL.

Pitfall 3: chain step that blocks on another extension synchronously

If your chain step awaits ctx.extensions.call(app_id, ...) inside a handler, the current activity blocks until that call returns. Long-running inter-extension calls add directly to chain latency: a 500 ms IPC call in step 2 of a 3-step chain adds 500 ms to the total.

Use ctx.extensions.call for data lookups, not for triggering side effects in other extensions. Side effects should be modeled as chain steps with their own @chat.function declarations, not hidden IPC calls.

Pitfall 4: ignoring cache hit rate

Writing a cache entry and never checking whether it is actually being hit is a common trap. If the cache key changes on every request (for example, including a timestamp or a nonce), the cache always misses and you are spending I/O on writes with no benefit.

Verify your cache keys are stable across requests for the same logical entity. A key like f"folders:{uid}" is stable — same user always hits the same key. A key like f"folders:{uid}:{time.time()}" is never stable.
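A quick way to see the difference is to replay a sequence of keys and count hits. This sketch ignores TTL expiry and models only key stability; `hit_rate` is an illustrative helper, not an SDK function.

```python
def hit_rate(keys: list[str]) -> float:
    """Fraction of lookups that find an entry written by an earlier lookup."""
    seen: set[str] = set()
    hits = 0
    for key in keys:
        if key in seen:
            hits += 1
        else:
            seen.add(key)  # miss: fetch and write back under this key
    return hits / len(keys) if keys else 0.0


if __name__ == "__main__":
    stable = ["folders:u1"] * 10
    unstable = [f"folders:u1:{n}" for n in range(10)]  # e.g. a timestamp or nonce
    print(hit_rate(stable), hit_rate(unstable))
```

Ten lookups with a stable key miss once and hit nine times; ten lookups with a changing suffix never hit.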

Pitfall 5: `ctx.ai` in a panel handler

Calling ctx.ai.complete(...) from inside a panel handler adds 500–3 000 ms of LLM inference to what should be a fast UI render. Panel handlers should fetch data and build UI nodes — they should not call LLMs.

If you need LLM-generated content in a panel, pre-generate it in a @ext.schedule or @ext.skeleton handler and cache the result. The panel reads from cache.

Pitfall 6: large UINode trees without pagination

Returning a ui.List with 500 ui.ListItem nodes is slow to serialize, slow to transfer, and slow to render in the browser. The platform does not enforce a node count limit — it is your responsibility to paginate.

As a rule of thumb: keep panel responses under 100 items. Use on_end_reached for infinite scroll or page_size for explicit pagination.
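On the data side, cursor pagination reduces to slicing plus a token for the next page. The sketch below treats the cursor as a stringified offset, which is an assumption made for simplicity; real backends often use opaque cursors.

```python
from __future__ import annotations


def page_of(items: list[str], cursor: str = "", page_size: int = 50) -> tuple[list[str], str]:
    """Return one page plus the cursor for the next page ("" when exhausted)."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    start = int(cursor) if cursor else 0
    end = start + page_size
    next_cursor = str(end) if end < len(items) else ""
    return items[start:end], next_cursor


if __name__ == "__main__":
    contacts = [f"c{i}" for i in range(120)]
    cursor = ""
    while True:
        page, cursor = page_of(contacts, cursor)
        print(len(page), repr(cursor))
        if not cursor:
            break
```

An empty next cursor is the signal to stop requesting pages, which lines up with passing on_end_reached=None in the paginated panel example.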

See also

Cache vs store

cache_model reference

Skeleton reference

Chains guide
