Performance

Performance for Webbee extensions: latency budgets, caching, batching, skeleton and panel patterns, and avoiding slow blocking calls so handlers feel fast.

Extension handlers do not run in isolation. Each invocation is one part of a larger request the platform processes on the user's behalf, and your handler runs alongside platform-managed work you do not control. Understanding where your code sits in that flow — and what costs what — is the foundation for writing extensions that feel fast.

Topic | Section
Latency budget | Budget
Hot paths | Hot paths
Refresh model | Refresh model
Free operations | Free
Expensive operations | Expensive
Caching strategies | Caching
Skeleton patterns | Skeletons
Panel patterns | Panels
Chain patterns | Chains
Storage tier choice | Storage
Observability | Observability
Common pitfalls | Pitfalls
Cross-references | See also

Latency budget

A typical chat turn targets 80–300 ms for the handler portion. The end-to-end latency the user perceives is higher, because the platform must understand the request and run LLM inference before your handler executes, but those costs are outside your control. Your extension handler is one slice of a larger budget.

Where time goes in a complete turn:

Stage | Typical cost | Owner
Understanding the request (LLM) | 400–1 500 ms | Platform
Routing and scheduling | 10–30 ms | Platform
Handing off to your extension | 5–20 ms | Platform
Your handler execution | 20–300 ms | You
Delivering the response | 5–20 ms | Platform

The handler portion is where your choices have the most impact. A handler that makes five sequential HTTP calls easily turns a 200 ms budget into a 1 500 ms experience.

For chain turns, where two or more tools execute in sequence, handler latency accumulates across steps. Three steps with a 300 ms handler each cost 900 ms of handler time.
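The arithmetic behind these figures can be checked with a few lines of Python. This is an illustrative sketch only: the platform costs are the upper bounds from the stage table in this section, and the helper names (`handler_latency_ms`, `chain_handler_latency_ms`, `turn_latency_ms`) are hypothetical, not part of the SDK.

```python
# Upper-bound platform costs in ms, copied from the stage table (illustrative).
PLATFORM_STAGES_MS = {
    "understanding": 1500,
    "routing": 30,
    "handoff": 20,
    "delivery": 20,
}


def handler_latency_ms(call_costs_ms, sequential=True):
    """Wall time of a handler's I/O: the sum if sequential, the max if concurrent."""
    if not call_costs_ms:
        return 0
    return sum(call_costs_ms) if sequential else max(call_costs_ms)


def chain_handler_latency_ms(step_costs_ms):
    """Handler time for a chain whose steps run one after another."""
    return sum(step_costs_ms)


def turn_latency_ms(handler_ms):
    """Worst-case single turn: all platform stages plus the handler."""
    return sum(PLATFORM_STAGES_MS.values()) + handler_ms


five_calls = [300] * 5
print(handler_latency_ms(five_calls, sequential=True))   # 1500
print(handler_latency_ms(five_calls, sequential=False))  # 300
print(chain_handler_latency_ms([300, 300, 300]))         # 900
print(turn_latency_ms(300))                              # 1870
```

The handler is the only term in these sums that you can change, which is why the rest of this guide concentrates on it.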


Hot paths vs cold paths

Not all handlers execute with the same frequency or latency expectations. Categorize yours before optimizing.

Panels — hot, frequent, should be cheap

Panel handlers execute whenever the frontend re-fetches panel content. That happens:

  • On auto_action load (user opens the panel tab)
  • When ActionResult.refresh_panels names the panel
  • On on_event: SSE events matching the panel's refresh declaration

Target: < 100 ms p50. The user is sitting in front of the UI, waiting. Every panel re-render that takes 500 ms is perceptible friction.

A sidebar panel that renders folders, stats, and a note list — like the notes extension — separates its data sources:

  • Folders: cached 60 s — ctx.cache.get_or_fetch
  • Folder stats: cached 30 s — ctx.cache.get_or_fetch
  • Notes list: not cached — primary content, always fresh

The folders and stats are structurally stable data; caching them is correct. The notes list must be fresh; that single live HTTP call is the dominant cost in the panel handler. Two cached calls + one live call is substantially cheaper than three live calls.

Skeletons — background, periodic, should be thorough

Skeleton handlers run in the background on their configured TTL tick, not during a user-visible request. They are not on the hot path for user-perceived latency. You can afford more work — multiple HTTP calls, aggregation — because the result is pre-computed and served from the platform's fast in-memory tier when the agent needs it.

But: skeleton handlers that perform expensive work on a short TTL amplify that cost across the user population. A skeleton with ttl=10 doing a 500 ms aggregation runs six times per minute per active user.

skeleton.py
from imperal_sdk import Extension

ext = Extension(
    "tasks-ext",
    display_name="Tasks",
    description="Example showing skeleton TTL choice for frequently-updated data.",
    actions_explicit=True,
)


# ttl=30: short because the LLM must see fresh task counts soon after writes.
# This works because the skeleton only surfaces counters + recent IDs — no
# expensive joins or aggregation.
@ext.skeleton(
    "tasks",
    alert=True,
    ttl=30,
    description="Today/overdue/upcoming counts and recent task IDs.",
)
async def skeleton_refresh_tasks(ctx) -> dict:
    # Fan-out with asyncio.gather to parallelize the HTTP calls.
    # Both calls execute concurrently, so total wall time ≈ the slower call.
    import asyncio
    try:
        today_raw, overdue_raw = await asyncio.gather(
            ctx.http.get("/v1/tasks", params={"filter": "due_today"}),
            ctx.http.get("/v1/tasks", params={"filter": "overdue"}),
        )
        return {
            "response": {
                "today_count": today_raw.json().get("total", 0),
                "overdue_count": overdue_raw.json().get("total", 0),
            }
        }
    except Exception:
        return {"response": {"today_count": 0, "overdue_count": 0}}

skeleton_slow.py
from imperal_sdk import Extension

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing skeleton TTL choice for slow-changing data.",
    actions_explicit=True,
)


# ttl=300: schema rarely changes. A 5-minute window is acceptable.
# Keeping TTL high avoids frequent expensive schema introspection.
@ext.skeleton(
    "db_schema",
    alert=True,
    ttl=300,
    description="Active database schema — tables and columns.",
)
async def skeleton_refresh_db_schema(ctx) -> dict:
    try:
        resp = await ctx.http.post("/v1/schema", json={"user": ctx.user.imperal_id})
        tables = resp.json().get("tables", []) if resp.ok else []
        return {"response": {"table_count": len(tables), "tables": tables}}
    except Exception:
        return {"response": {"table_count": 0, "tables": []}}

Chat functions — user-blocking, must be fast

@chat.function handlers execute while the user waits for a chat response. After the platform finishes understanding the request (itself 400–1 500 ms), your handler runs on the critical path of the response. The user cannot do anything else during that time.

Target: < 500 ms p95 for a single step. For write operations that trigger confirmation flows, this is less visible — the confirmation card appears first, and the actual execution happens after acceptance. For read operations, the user is waiting for the answer.

In a chain, latency compounds. If each handler in a three-step chain takes 400 ms, the chain takes at least 1 200 ms for handler execution alone, before classification and delivery overhead.


The refresh model is broader than you think

ActionResult.refresh_panels controls which panels re-fetch after a successful handler. The semantics differ by delivery path:

Path | Trigger | refresh_panels behavior
Path A (HTTP direct call) | ui.Call(...) from panel | Targeted: only the named panels re-fetch
Path B (SSE / chat) | Chat function in a message | Ignored: all discovered panels re-fetch

On Path B (the chat path), setting refresh_panels=["sidebar"] does not limit the refresh to just the sidebar. The SSE publisher refreshes every panel the frontend has open. This is by design — the web-kernel cannot know which panels may have been affected by an action that arrived via chat.

Implication for panel design: panels must be cheap even on a no-op refresh. If your center panel takes 500 ms to render its initial state, every write operation in any extension will trigger a 500 ms panel reload. Design panels to be fast regardless of what prompted the refresh.

panels_viewer.py
from imperal_sdk import ui, Extension

ext = Extension(
    "notes-ext",
    display_name="Notes",
    description="Example showing a panel that is cheap even on no-op refresh.",
    actions_explicit=True,
)


@ext.panel("viewer", slot="center", center_overlay=True, title="Note")
async def notes_viewer(ctx, note_id: str = "") -> object:
    # Guard: if no note is selected, return immediately with no I/O.
    # This keeps the refresh cost near-zero when the user has no active note.
    if not note_id:
        return ui.Empty("Select a note to view it")

    # Only perform I/O when there is a selected item to load.
    resp = await ctx.http.get(f"/v1/notes/{note_id}", headers={"X-User": ctx.user.imperal_id})
    if not resp.ok:
        return ui.Error("Could not load note")

    note = resp.json()
    return ui.Stack([
        ui.Header(note.get("title", "Untitled")),
        ui.Markdown(note.get("content", "")),
    ])
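The two delivery paths described in this section can be summarized as a small decision function. This is a model of the documented behavior for reasoning about refresh cost, not platform code: `panels_to_refresh` and the `"direct"` / `"chat"` labels are names invented for this sketch.

```python
def panels_to_refresh(path, requested, open_panels):
    """Model which panels re-fetch after a successful handler.

    path: "direct" for Path A (ui.Call from a panel), "chat" for Path B (SSE).
    requested: the names passed as refresh_panels.
    open_panels: the panels the frontend currently has open.
    """
    if path == "direct":
        # Path A: targeted, only the named panels re-fetch.
        return list(requested)
    if path == "chat":
        # Path B: refresh_panels is ignored, every open panel re-fetches.
        return list(open_panels)
    raise ValueError(f"unknown path: {path!r}")


open_now = ["sidebar", "viewer", "overview"]
print(panels_to_refresh("direct", ["sidebar"], open_now))  # ['sidebar']
print(panels_to_refresh("chat", ["sidebar"], open_now))    # ['sidebar', 'viewer', 'overview']
```

On the chat path the result does not depend on `requested` at all, which is why every panel needs a cheap no-op branch like the `note_id` guard in the viewer example.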

What's free

These operations carry no meaningful performance cost. You can use them freely without profiling.

ctx.user and ctx.tenant access

Both are frozen Pydantic models injected at context construction time. Reading ctx.user.imperal_id, ctx.user.role, ctx.tenant.tenant_id, etc., is a plain attribute access with no I/O.

The platform reading your skeleton output

When the platform reads your pre-computed skeleton data to build context for the agent, that happens platform-side — an operation you do not control and do not pay for in your handler.

Returning ui.Empty(...)

An empty state return from a panel handler serializes to a small JSON dict. It is the cheapest thing a panel can return and is the correct no-op pattern when there is nothing to show.

panels_empty.py
from imperal_sdk import ui, Extension

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing cheap empty-state panel return.",
    actions_explicit=True,
)


@ext.panel("detail", slot="center", center_overlay=True, title="Report Detail")
async def report_detail(ctx, report_id: str = "") -> object:
    if not report_id:
        # No I/O, no cost — returns immediately.
        return ui.Empty("Select a report", icon="BarChart2")

    # ... fetch and render the report
    return ui.Stack([ui.Text("Report data here")])

Building UINode trees

Constructing ui.Stack(...), ui.List(...), ui.ListItem(...), etc., is pure Python — no I/O, no serialization until the return value reaches the web-kernel. Build as many nodes as you need; the cost is CPU-proportional to tree size and negligible for typical panel output.


What's expensive

These operations involve I/O and should be minimized, parallelized, or cached.

ctx.http.* — network round trip

External HTTP calls are the most common source of panel latency. Each call adds a network round trip to an external service — typically 50–300 ms depending on the service and your infrastructure topology. A panel that makes three sequential HTTP calls adds 150–900 ms of unavoidable wait time.

Mitigations: parallelize with asyncio.gather, cache results with ctx.cache, or combine into a single batched call if the upstream API supports it.

ctx.ai.* — LLM inference

Calling ctx.ai.complete(...) from inside a handler makes a synchronous LLM inference call, typically 500–3 000 ms. For most @chat.function handlers this is the single most expensive operation available. Separately, if your typed arguments fail validation, the platform may re-prompt the model up to twice to correct them — additional inference latency, up to roughly 6 000 ms in the worst case.

Recommendation: avoid ctx.ai in panel handlers entirely. In @chat.function handlers, use it only where the LLM's reasoning is genuinely irreplaceable. For skeleton handlers, ctx.ai is more acceptable because the skeleton runs in the background.

ctx.store.query — document store query

ctx.store.query(collection, where=..., limit=...) translates to a backend database query. Performance depends on the collection size and whether the where dict fields have backing indexes. Without selective where clauses, the backend scans all documents in the collection.

Always set an explicit limit. The default limit is 100 documents, but on a large collection without a backing index, even finding 100 matching documents can require a slow scan.

Large UINode trees

The web-kernel serializes the UINode tree returned by your handler and sends it to the frontend. A panel returning thousands of list items produces a large JSON payload that is slow to serialize, slow to transmit, and slow to render. Pagination is the correct solution for large lists.


Caching strategies

ctx.cache.get_or_fetch — the primary pattern

get_or_fetch is the canonical caching pattern for panel handlers: check the cache, call the fetcher on miss, write the result back, return it. It handles the check-then-fetch atomicity correctly and is the pattern used in production across the notes, tasks, and mail extensions.

panels_cached.py
from __future__ import annotations
from pydantic import BaseModel
from imperal_sdk import Extension, ui

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example panel handler using get_or_fetch for stable metadata.",
    actions_explicit=True,
)


@ext.cache_model("account_list")
class AccountList(BaseModel):
    accounts: list[dict] = []


@ext.panel("sidebar", slot="left", title="Accounts")
async def accounts_sidebar(ctx) -> object:
    uid = ctx.user.imperal_id

    # Stable data: account list changes rarely — cache for 120 s.
    async def _load_accounts() -> AccountList:
        resp = await ctx.http.get("/v1/accounts", headers={"X-User": uid})
        return AccountList(accounts=resp.json().get("accounts", []) if resp.ok else [])

    entry = await ctx.cache.get_or_fetch(
        f"accounts:{uid}", AccountList, ttl_seconds=120, fetcher=_load_accounts,
    )

    items = [
        ui.ListItem(id=a["id"], title=a.get("name", "Unknown"))
        for a in entry.accounts
    ]
    return ui.List(items=items) if items else ui.Empty("No accounts connected")
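The check, fetch, and write-back flow that get_or_fetch performs can be illustrated with a toy in-memory cache. This is a sketch for intuition only: `TinyCache` is a hypothetical stand-in, its signature omits the model argument that the SDK call takes, and it makes no attempt to coordinate concurrent misses.

```python
import asyncio
import time


class TinyCache:
    """Toy TTL cache used to illustrate the pattern (not the SDK's implementation)."""

    def __init__(self):
        self._entries = {}  # key -> (expires_at, value)

    async def get_or_fetch(self, key, ttl_seconds, fetcher):
        now = time.monotonic()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh hit: no I/O at all
        value = await fetcher()  # miss or expired entry: call the fetcher
        self._entries[key] = (now + ttl_seconds, value)  # write the result back
        return value


async def demo():
    calls = 0

    async def load_accounts():
        nonlocal calls
        calls += 1
        return ["acct-1", "acct-2"]

    cache = TinyCache()
    for _ in range(3):
        await cache.get_or_fetch("accounts:u1", 120, load_accounts)
    return calls


fetch_count = asyncio.run(demo())
print(fetch_count)  # 1
```

Three panel renders inside the TTL window cost one upstream call instead of three, which is the saving the sidebar example relies on.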

Cache constraints to remember:

  • TTL must be 5–300 seconds (SDK enforces this — CACHE-TTL-1 AST rule)
  • Key must be alphanumeric + _-:, max 128 characters
  • Value must be a Pydantic BaseModel subclass
  • Value size is capped at 64 KB per entry
  • The model class must be registered via @ext.cache_model before first use
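The constraints in this list can be checked before a cache call ever reaches the platform. The helper below is a hypothetical pre-flight check, not an SDK function. It assumes that "alphanumeric" means ASCII letters and digits, that 64 KB means 64 × 1024 bytes, and that the size limit applies to the serialized value; confirm each of these against the SDK reference.

```python
import re

# Assumed interpretation of the documented limits (see the note above).
KEY_PATTERN = re.compile(r"[A-Za-z0-9_\-:]{1,128}")
MIN_TTL_S, MAX_TTL_S = 5, 300
MAX_VALUE_BYTES = 64 * 1024


def cache_entry_problems(key, ttl_seconds, serialized_value):
    """Return a list of human-readable problems; an empty list means the entry looks valid."""
    problems = []
    if not KEY_PATTERN.fullmatch(key):
        problems.append("key must be 1-128 chars of letters, digits, '_', '-' or ':'")
    if not MIN_TTL_S <= ttl_seconds <= MAX_TTL_S:
        problems.append("ttl_seconds must be between 5 and 300")
    if len(serialized_value) > MAX_VALUE_BYTES:
        problems.append("value exceeds 64 KB")
    return problems


print(cache_entry_problems("accounts:u1", 120, b'{"accounts": []}'))  # []
print(cache_entry_problems("accounts u1", 600, b"x" * 70_000))
```

The second call reports all three problems at once, which is more useful in a unit test than failing on the first one.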

@ext.cache_model — register before use

Every model passed to ctx.cache.get/set/get_or_fetch must be registered. The registration must happen at import time (module scope), before any handler code runs. In multi-file extensions, the registration conventionally lives in a dedicated cache_models.py that is imported before handler modules.

app.py
from pydantic import BaseModel
from imperal_sdk import Extension

ext = Extension(
    "mail-ext",
    display_name="Mail",
    description="Example showing @ext.cache_model registration at module scope.",
    actions_explicit=True,
)


# Registered at import time — before any handler imports this module.
@ext.cache_model("inbox_page")
class InboxPage(BaseModel):
    messages: list[dict] = []
    total: int = 0
    next_cursor: str = ""


@ext.cache_model("unread_summary")
class UnreadSummary(BaseModel):
    unread_count: int = 0
    last_checked: str = ""

When to cache vs always-fetch

Data type | Cache? | TTL guidance
Account / connection list | Yes | 60–120 s
Folder / label list | Yes | 60 s
Folder stats / counts | Yes | 30–60 s
Schema / metadata | Yes | 120–300 s
Primary content list (inbox, note list) | Usually no | Freshness required
User-specific settings | Yes | 60–120 s
Results of expensive aggregation | Yes | As long as acceptable staleness allows

Do not cache the primary content users are looking at directly. If a user adds a note and the sidebar still shows the old list 60 seconds later, that is a bug in UX terms even if it is technically correct. Cache metadata that structures the content, not the content itself.

Skeleton caching — pre-warming the cache from the background

The sql-db extension uses its skeleton handler to mirror data into the application cache. The skeleton runs in the background on a 300 s tick; any panel or chat function that needs the same data reads from the cache rather than making a live call. This pattern decouples panel latency from the upstream service response time.

skeleton_with_cache_mirror.py
from __future__ import annotations
from pydantic import BaseModel
from imperal_sdk import Extension

ext = Extension(
    "db-ext",
    display_name="Database",
    description="Example skeleton that pre-warms the app cache for panels.",
    actions_explicit=True,
)

SCHEMA_CACHE_KEY = "db_schema_snap"
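# 300 s is the cache TTL cap and matches the skeleton ttl=300, so the entry
# lasts until about the next refresh. A shorter TTL would leave the cache cold
# before each tick; readers should still handle an occasional miss.
SCHEMA_CACHE_TTL = 300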


@ext.cache_model("db_schema_snap")
class DbSchemaSnapshot(BaseModel):
    tables: list[dict] = []
    table_count: int = 0


@ext.skeleton("db_schema", alert=True, ttl=300,
              description="Active database schema — tables and columns.")
async def skeleton_refresh_db_schema(ctx) -> dict:
    try:
        resp = await ctx.http.post("/v1/schema", json={"user": ctx.user.imperal_id})
        tables = resp.json().get("tables", []) if resp.ok else []
        compact = [{"name": t["name"], "columns": t.get("columns", [])} for t in tables]

        # Pre-warm the cache so panel handlers avoid the live call.
        snap = DbSchemaSnapshot(tables=compact, table_count=len(compact))
        await ctx.cache.set(SCHEMA_CACHE_KEY, snap, ttl_seconds=SCHEMA_CACHE_TTL)

        return {"response": {"table_count": len(compact), "tables": compact}}
    except Exception:
        return {"response": {"table_count": 0, "tables": []}}

Skeleton patterns

TTL choice

The TTL passed to @ext.skeleton is a hint to the web-kernel about how frequently to run the background refresh tick. Choose it primarily from how fresh the skeleton data must be for the classifier to route accurately, then confirm the refresh is cheap enough to run at that rate:

Use case | TTL guidance | Rationale
Task counters, unread counts | 30–60 s | LLM should see accurate counts after writes
Folder / project structure | 120–300 s | Changes less frequently
Database schema | 300 s | Schema changes are rare and deliberate
Email inbox summary | 60–120 s | New mail arrives continuously

A short TTL makes the skeleton data fresher but increases the background I/O load across your user base. The tasks extension uses ttl=30 because task counts change frequently and the classifier must route accurately after each write. The sql-db extension uses ttl=300 because schema introspection is expensive and schema changes are infrequent.
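The trade-off between freshness and background load can be estimated before choosing a TTL. The sketch below is back-of-the-envelope arithmetic, not a platform API: `skeleton_load` is a hypothetical helper, it assumes one upstream call per refresh, and the figure of 1 000 active users is invented for the example.

```python
def skeleton_load(ttl_seconds, refresh_cost_ms, active_users):
    """Estimate the background load a skeleton generates per minute."""
    runs_per_minute = 60 / ttl_seconds
    busy_ms_per_minute = runs_per_minute * refresh_cost_ms * active_users
    return {
        "runs_per_user_per_minute": runs_per_minute,
        "upstream_calls_per_minute": runs_per_minute * active_users,
        "busy_seconds_per_minute": busy_ms_per_minute / 1000,
    }


# A 500 ms refresh at ttl=10 versus ttl=30, for 1 000 active users.
print(skeleton_load(10, 500, 1000))
print(skeleton_load(30, 500, 1000))
```

Tripling the TTL from 10 s to 30 s cuts the upstream call rate from 6 000 to 2 000 per minute in this scenario, which is often a better first step than optimizing the refresh itself.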

Alert mode (alert=True)

When alert=True, the web-kernel compares the new skeleton output to the previous snapshot and, if they differ, emits a change notification. This is event-driven freshness: instead of polling on a fixed TTL, the system reacts to actual changes.

Use alert=True when:

  • Your skeleton surfaces counts or status fields that change meaningfully
  • The change has UX significance (new unread mail, new overdue tasks)

Pair alert=True with a companion tool named skeleton_alert_{section_name} that compares old and new snapshots and returns a human-readable alert string.
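The comparison logic inside such a companion can be kept as a pure function, which makes it easy to test. The sketch below shows only that comparison. How the companion tool is registered and what arguments the platform passes to it are not covered here, and `describe_task_changes` and its field names are assumptions modeled on the tasks skeleton earlier in this guide.

```python
def describe_task_changes(old, new):
    """Return a short alert string, or None when no tracked counter increased."""
    parts = []
    for field, label in (("overdue_count", "overdue"), ("today_count", "due today")):
        before = old.get(field, 0)
        after = new.get(field, 0)
        if after > before:
            parts.append(f"{after - before} new {label}")
    return ("Tasks: " + ", ".join(parts)) if parts else None


previous = {"today_count": 2, "overdue_count": 0}
current = {"today_count": 2, "overdue_count": 3}
print(describe_task_changes(previous, current))   # Tasks: 3 new overdue
print(describe_task_changes(previous, previous))  # None
```

This version alerts only on increases, on the assumption that a falling count rarely needs the user's attention. Adjust that choice to fit your extension.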

Background refresh is managed for you

The platform manages the lifecycle of the background skeleton-refresh process for you, including periodically recycling it for resource hygiene. You do not need to implement or handle any of this in your skeleton handler. Write skeleton handlers as idempotent, stateless functions.


Panel patterns

Pagination for large lists

Never return thousands of items in a single panel response. Use ui.List(page_size=N, on_end_reached=ui.Call(...)) to implement infinite scroll. Load the first page eagerly; load subsequent pages only when the user scrolls to the bottom.

panels_paginated.py
from __future__ import annotations
from imperal_sdk import Extension, ui

ext = Extension(
    "contacts-ext",
    display_name="Contacts",
    description="Example panel with infinite-scroll pagination for a large list.",
    actions_explicit=True,
)


@ext.panel("contacts", slot="left", title="Contacts")
async def contacts_sidebar(ctx, cursor: str = "") -> object:
    uid = ctx.user.imperal_id
    params: dict[str, object] = {"user_id": uid, "limit": 50}
    if cursor:
        params["cursor"] = cursor

    resp = await ctx.http.get("/v1/contacts", params=params)
    if not resp.ok:
        return ui.Error("Could not load contacts")

    data = resp.json()
    contacts = data.get("contacts", [])
    next_cursor = data.get("next_cursor", "")
    total = data.get("total", len(contacts))

    items = [
        ui.ListItem(id=c["id"], title=c.get("name", "Unknown"))
        for c in contacts
    ]

    return ui.List(
        items=items,
        total_items=total,
        page_size=50,
        on_end_reached=ui.Call("__panel__contacts", cursor=next_cursor) if next_cursor else None,
    )

Lazy-render with ui.Loading

For panels with expensive initial loads, ui.Loading(...) gives the user an immediate placeholder so the panel appears responsive even when the backend is slow.

A panel handler cannot return ui.Loading and then keep fetching in the background: the web-kernel awaits the handler's result before sending anything to the frontend. ui.Loading is therefore most useful as a placeholder inside a ui.Stack for a section that loads independently via auto_action or a ui.Call.

Batched fetches — never serial HTTP calls

If your panel needs data from multiple endpoints, fetch them in parallel:

panels_dashboard.py
from __future__ import annotations
import asyncio
from imperal_sdk import Extension, ui

ext = Extension(
    "analytics-ext",
    display_name="Analytics",
    description="Example panel batching multiple HTTP calls in parallel.",
    actions_explicit=True,
)


@ext.panel("overview", slot="right", title="Overview")
async def analytics_overview(ctx) -> object:
    uid = ctx.user.imperal_id
    headers = {"X-User": uid}

    # Parallel fetch — all three execute concurrently.
    visits_resp, revenue_resp, users_resp = await asyncio.gather(
        ctx.http.get("/v1/stats/visits", headers=headers),
        ctx.http.get("/v1/stats/revenue", headers=headers),
        ctx.http.get("/v1/stats/users", headers=headers),
    )

    visits = visits_resp.json().get("total", 0) if visits_resp.ok else 0
    revenue = revenue_resp.json().get("amount", 0.0) if revenue_resp.ok else 0.0
    active_users = users_resp.json().get("active", 0) if users_resp.ok else 0

    return ui.Stack([
        ui.Stats(children=[
            ui.Stat(label="Visits", value=visits, icon="👁️"),
            ui.Stat(label="Revenue", value=f"${revenue:,.2f}", icon="💵"),
            ui.Stat(label="Active users", value=active_users, icon="👥"),
        ])
    ])

Serial I/O is the most common performance problem in panel handlers. Three sequential 100 ms calls become 300 ms; three parallel calls become 100 ms.
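The difference is easy to measure without any network access. The sketch below replaces each HTTP round trip with `asyncio.sleep`, so `fake_call` is a stand-in and the timings are approximate.

```python
import asyncio
import time


async def fake_call(delay_s):
    """Stand-in for an HTTP round trip; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay_s)
    return delay_s


async def timed(coro_factory):
    """Run the coroutine produced by coro_factory and return elapsed seconds."""
    start = time.perf_counter()
    await coro_factory()
    return time.perf_counter() - start


async def sequential():
    for _ in range(3):
        await fake_call(0.1)  # each call waits for the previous one


async def parallel():
    await asyncio.gather(*(fake_call(0.1) for _ in range(3)))  # all in flight at once


async def main():
    return await timed(sequential), await timed(parallel)


seq_s, par_s = asyncio.run(main())
print(f"sequential ~{seq_s * 1000:.0f} ms, parallel ~{par_s * 1000:.0f} ms")
```

Note that `asyncio.gather` raises the first exception by default. Passing `return_exceptions=True` returns exceptions as results instead, which lets one failed call degrade a single stat rather than the whole panel.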


Chain patterns

depends_on for parallel-safe steps

When a chain step does not depend on the output of a previous step, the platform can run independent steps in parallel. Steps that are read operations with no dependency on prior write results may be scheduled concurrently, which removes their latency from the critical path.

Ordering guarantee: reads are always ordered before the dependent writes that consume their output. In a chain like "list unread mail, then create a note from it", the read always runs before the write that depends on it, regardless of the order they were requested in.
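The effect of dependency-aware scheduling on latency can be sketched with the standard library's `graphlib`. This is a model for reasoning about the critical path, not the platform's scheduler: the shape of the `steps` mapping, the `list_folders` step, and the per-step costs are all assumptions made for the example.

```python
from graphlib import TopologicalSorter


def execution_waves(depends_on):
    """Group steps into waves; every step in a wave has all dependencies satisfied."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def critical_path_ms(waves, cost_ms):
    """Handler time if each wave runs concurrently: the slowest step per wave, summed."""
    return sum(max(cost_ms[step] for step in wave) for wave in waves)


# Each step maps to the set of steps it depends on.
steps = {
    "list_unread_mail": set(),
    "list_folders": set(),
    "create_note": {"list_unread_mail", "list_folders"},
}
costs = {"list_unread_mail": 300, "list_folders": 100, "create_note": 200}

waves = execution_waves(steps)
print(waves)                           # [['list_folders', 'list_unread_mail'], ['create_note']]
print(critical_path_ms(waves, costs))  # 500
print(sum(costs.values()))             # 600 if every step ran in sequence
```

The two independent reads share a wave, so only the slower one counts toward the total, while the write always lands in a later wave than the reads it consumes.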

id_projection to avoid LLM re-resolution

When a chain step receives an entity ID from a prior step, id_projection names the parameter field that carries the target ID. The platform threads the resolved ID into that field for you, which keeps the step fast.

handlers_chain.py
from pydantic import BaseModel
from imperal_sdk import Extension, ChatExtension, ActionResult

ext = Extension(
    "folders-ext",
    display_name="Folders",
    description="Example showing id_projection for chain ID threading.",
    actions_explicit=True,
)
chat = ChatExtension(
    ext,
    tool_name="tool_folders_chat",
    description="AI chat interface for folder management.",
)


class DeleteFolderContentsParams(BaseModel):
    folder_id: str


# id_projection="folder_id" names the field that carries the entity ID for
# this step. In a chain, the platform threads the resolved folder_id from the
# prior step into this field for you.
@chat.function(
    "delete_notes_from_folder",
    description="Delete all notes from a specified folder by folder ID.",
    action_type="destructive",
    chain_callable=True,
    effects=["delete:note"],
    id_projection="folder_id",
)
async def fn_delete_notes_from_folder(ctx, params: DeleteFolderContentsParams) -> ActionResult:
    if not params.folder_id:
        return ActionResult.error("folder_id is required")
    resp = await ctx.http.delete(
        f"/v1/folders/{params.folder_id}/contents",
        headers={"X-User": ctx.user.imperal_id},
    )
    if not resp.ok:
        return ActionResult.error("Could not delete folder contents")
    count = resp.json().get("deleted", 0)
    return ActionResult.success(
        data={"folder_id": params.folder_id, "deleted_count": count},
        summary=f"Deleted {count} notes from folder",
        refresh_panels=["sidebar"],
    )

Use id_projection for compound function names where the target field name is not obvious from the function name. For a simple name like delete_note, the platform resolves note_id automatically. For a name like delete_notes_from_folder, set id_projection="folder_id" so the right field receives the ID.


Storage tier choice

Two storage tiers are available; choosing the right one for each data type is a performance decision:

Tier | API | Latency | TTL | Use for
ctx.cache | get/set/get_or_fetch | < 5 ms (fast in-memory tier) | 5–300 s | Short-lived, derived, computed data
ctx.store | create/get/query/update/delete | 10–50 ms (durable store) | Permanent | User-owned, persistent documents

For panel handlers, ctx.cache is the correct tier for pre-warmed or aggregated data. ctx.store is the correct tier for user-owned content (notes, tasks, contacts) — and for any persistence an extension needs.

No raw-database surface

Use ctx.store for all persistence. There is no raw-SQL or direct-database surface exposed to extensions — all durable data goes through the document store API.

Do not use ctx.store as a cache. Store documents persist indefinitely and are not evicted by TTL. If you are storing intermediate or derived data that should expire, use ctx.cache.

For a deeper discussion of when to use each tier, see Cache vs store.


Observability

Structured logging

Use logging.getLogger(__name__) (synchronous) or await ctx.log(...) (async) for structured event logs. Both routes are visible in the extension dashboard.

Tag your log lines with user-relevant context so you can filter by user or operation:

handlers_logging.py
from pydantic import BaseModel
from imperal_sdk import Extension, ChatExtension, ActionResult
import logging

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing structured logging with context tags.",
    actions_explicit=True,
)
chat = ChatExtension(
    ext,
    tool_name="tool_reports_chat",
    description="AI chat interface for reports.",
)

log = logging.getLogger(__name__)


class GenerateReportParams(BaseModel):
    report_type: str


@chat.function(
    "generate_report",
    description="Generate a report of the specified type for the current user.",
    action_type="read",
)
async def fn_generate_report(ctx, params: GenerateReportParams) -> ActionResult:
    uid = ctx.user.imperal_id
    log.info(
        "generate_report start user=%s type=%s",
        uid,
        params.report_type,
    )

    resp = await ctx.http.post(
        "/v1/reports/generate",
        json={"user_id": uid, "type": params.report_type},
    )

    if not resp.ok:
        log.warning(
            "generate_report backend error user=%s status=%d",
            uid,
            resp.status_code,
        )
        return ActionResult.error("Report generation failed. Please try again.", retryable=True)

    log.info("generate_report success user=%s", uid)
    return ActionResult.success(
        data=resp.json(),
        summary=f"Report generated: {params.report_type}",
    )

ctx.log is async def and must be awaited. Standard logging calls are synchronous. Both are acceptable; standard logging is more common in production extensions.

Audit trail

Every @chat.function invocation is recorded in the platform audit trail with its action type, status, and timing. Latency hot-spots show up there automatically. You do not need to emit latency metrics manually.

Platform dashboards

High-level latency distributions and error rates are visible in the platform monitoring dashboard. Use these as the first signal that a handler has a latency regression. Drill into structured logs for per-invocation detail.


Common pitfalls

Pitfall 1: N+1 queries in a panel handler

The most common panel performance problem: rendering a list of items where each item requires its own HTTP call.

panels_n_plus_1.py
from imperal_sdk import Extension, ui

ext = Extension(
    "tasks-ext",
    display_name="Tasks",
    description="Example showing N+1 query anti-pattern to avoid.",
    actions_explicit=True,
)


@ext.panel("tasks", slot="left", title="Tasks")
async def tasks_sidebar_bad(ctx) -> object:
    uid = ctx.user.imperal_id
    tasks_resp = await ctx.http.get("/v1/tasks", params={"user_id": uid})
    tasks = tasks_resp.json().get("tasks", []) if tasks_resp.ok else []

    items = []
    for task in tasks:
        # ❌ One HTTP call per task — 50 tasks = 50 HTTP calls
        detail_resp = await ctx.http.get(f"/v1/tasks/{task['id']}/detail")
        detail = detail_resp.json() if detail_resp.ok else {}
        items.append(ui.ListItem(
            id=task["id"],
            title=detail.get("title", task.get("title", "")),
        ))

    return ui.List(items=items)

Fix: fetch all detail in a single batch call, or include detail in the list endpoint response.
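The saving from batching shows up directly in the number of round trips. The sketch below uses a fake backend that counts calls instead of a real HTTP client. `CountingBackend` and its `get_details_batch` method are hypothetical, and the batch approach only applies if your upstream API offers an equivalent endpoint.

```python
import asyncio


class CountingBackend:
    """Fake upstream that counts round trips; stands in for real HTTP calls."""

    def __init__(self, task_count):
        self.round_trips = 0
        self._tasks = {
            f"t{i}": {"id": f"t{i}", "title": f"Task {i}"} for i in range(task_count)
        }

    async def list_tasks(self):
        self.round_trips += 1
        return [{"id": task_id} for task_id in self._tasks]

    async def get_detail(self, task_id):
        self.round_trips += 1
        return self._tasks[task_id]

    async def get_details_batch(self, task_ids):
        # Hypothetical batch endpoint: one round trip for many IDs.
        self.round_trips += 1
        return {task_id: self._tasks[task_id] for task_id in task_ids}


async def titles_n_plus_1(backend):
    tasks = await backend.list_tasks()
    return [(await backend.get_detail(t["id"]))["title"] for t in tasks]


async def titles_batched(backend):
    tasks = await backend.list_tasks()
    details = await backend.get_details_batch([t["id"] for t in tasks])
    return [details[t["id"]]["title"] for t in tasks]


async def main():
    slow, fast = CountingBackend(50), CountingBackend(50)
    slow_titles = await titles_n_plus_1(slow)
    fast_titles = await titles_batched(fast)
    return slow.round_trips, fast.round_trips, slow_titles == fast_titles


n_plus_1_trips, batched_trips, same_output = asyncio.run(main())
print(n_plus_1_trips, batched_trips, same_output)  # 51 2 True
```

Both versions render the same titles, but the batched one makes 2 round trips instead of 51.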

Pitfall 2: short-TTL skeleton with expensive aggregation

skeleton_expensive.py
from imperal_sdk import Extension

ext = Extension(
    "crm-ext",
    display_name="CRM",
    description="Example showing a skeleton TTL mismatch to avoid.",
    actions_explicit=True,
)


# ❌ ttl=10 with a heavy aggregation query runs 6 times/minute per user.
@ext.skeleton("crm_summary", ttl=10,
              description="CRM summary with full contact aggregation.")
async def skeleton_refresh_crm_bad(ctx) -> dict:
    # This call takes 800 ms — running it every 10 s is 6 calls/min per user.
    resp = await ctx.http.post("/v1/crm/aggregate-all", json={"user": ctx.user.imperal_id})
    return {"response": resp.json() if resp.ok else {}}

Fix: either make the aggregation cheaper (return counts, not full records), or use a longer TTL. If freshness is critical, use alert=True with a change-detection companion rather than a short TTL.

Pitfall 3: chain step that blocks on another extension synchronously

If your chain step calls ctx.extensions.call(app_id, ...) inside a handler and waits for the result, that call blocks the current activity. Long-running inter-extension calls add directly to chain latency: a 500 ms IPC call in step 2 of a 3-step chain adds 500 ms to the total.

Use ctx.extensions.call for data lookups, not for triggering side effects in other extensions. Side effects should be modeled as chain steps with their own @chat.function declarations, not hidden IPC calls.

Pitfall 4: ignoring cache hit rate

Writing a cache entry and never checking whether it is actually being hit is a common trap. If the cache key changes on every request (for example, including a timestamp or a nonce), the cache always misses and you are spending I/O on writes with no benefit.

Verify your cache keys are stable across requests for the same logical entity. A key like f"folders:{uid}" is stable — same user always hits the same key. A key like f"folders:{uid}:{time.time()}" is never stable.
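A quick simulation makes the difference in hit rate concrete. The sketch below uses a plain dict as the cache and ignores TTL expiry. `hit_rate` is a hypothetical helper, and a counter stands in for the timestamp so the result is deterministic.

```python
import itertools


def hit_rate(key_for_request, request_count=100):
    """Fraction of requests served from a dict-backed cache for a given key function."""
    cache, hits = {}, 0
    for request_number in range(request_count):
        key = key_for_request(request_number)
        if key in cache:
            hits += 1
        else:
            cache[key] = "folders payload"  # miss: pay for the fetch and the write
    return hits / request_count


uid = "u1"
nonce = itertools.count()  # stands in for time.time(): a new value on every request

stable = hit_rate(lambda _n: f"folders:{uid}")
unstable = hit_rate(lambda _n: f"folders:{uid}:{next(nonce)}")
print(stable, unstable)  # 0.99 0.0
```

With the stable key only the first request misses. With the unstable key every request misses and also pays for a write that nothing will ever read.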

Pitfall 5: ctx.ai in a panel handler

Calling ctx.ai.complete(...) from inside a panel handler adds 500–3 000 ms of LLM inference to what should be a fast UI render. Panel handlers should fetch data and build UI nodes — they should not call LLMs.

If you need LLM-generated content in a panel, pre-generate it in a @ext.schedule or @ext.skeleton handler and cache the result. The panel reads from cache.

Pitfall 6: large UINode trees without pagination

Returning a ui.List with 500 ui.ListItem nodes is slow to serialize, slow to transfer, and slow to render in the browser. The platform does not enforce a node count limit — it is your responsibility to paginate.

As a rule of thumb: keep panel responses under 100 items. Use on_end_reached for infinite scroll or page_size for explicit pagination.
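On the backend side, pagination reduces to slicing with a cursor. The helper below is a minimal sketch that uses the list offset as the cursor; `paginate` is a hypothetical name, and real services often prefer opaque cursors that stay valid when items are inserted or deleted.

```python
def paginate(items, cursor="", page_size=50):
    """Return (page, next_cursor); an empty next_cursor means there are no more pages."""
    start = int(cursor) if cursor else 0
    end = start + page_size
    page = items[start:end]
    next_cursor = str(end) if end < len(items) else ""
    return page, next_cursor


contacts = [f"contact-{i}" for i in range(120)]

page, cursor = paginate(contacts)
print(len(page), cursor)        # 50 50
page, cursor = paginate(contacts, cursor)
print(len(page), cursor)        # 50 100
page, cursor = paginate(contacts, cursor)
print(len(page), repr(cursor))  # 20 ''
```

The empty cursor on the last page matches the convention in the contacts panel example, where on_end_reached is set only while a next cursor exists.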


See also
