Performance
How extension code affects user-perceived latency — what's hot, what's cached, what's free, what to never do
Extension handlers do not run in isolation. Each invocation is part of a larger web-kernel dispatch cycle: intent classification, workflow routing, activity execution, and SSE delivery. Understanding where your code sits in that cycle — and what costs what — is the foundation for writing extensions that feel fast.
| Topic | Section |
|---|---|
| Latency budget | Budget |
| Hot paths | Hot paths |
| Refresh model | Refresh model |
| Free operations | Free |
| Expensive operations | Expensive |
| Caching strategies | Caching |
| Skeleton patterns | Skeletons |
| Panel patterns | Panels |
| Chain patterns | Chains |
| Storage tier choice | Storage |
| Observability | Observability |
| Common pitfalls | Pitfalls |
| Cross-references | See also |
Latency budget
In a typical chat turn, the handler portion should land within 80–300 ms. The end-to-end latency the user perceives is higher, because intent classification, workflow scheduling, and LLM inference all precede your handler, but those costs are outside your control. Your extension handler is one slice of a larger budget.
Where time goes in a complete turn:
| Stage | Typical cost | Owner |
|---|---|---|
| Intent classification (LLM) | 400–1 500 ms | Web-kernel |
| Workflow scheduling | 10–30 ms | Web-kernel |
| Extension activity dispatch | 5–20 ms | Web-kernel |
| Your handler execution | 20–300 ms | You |
| SSE / HTTP delivery | 5–20 ms | Web-kernel |
The handler portion is where your choices have the most impact. A handler that makes five sequential HTTP calls easily turns a 200 ms budget into a 1 500 ms experience.
For chain turns, where two or more tools execute in sequence, handler latency accumulates step by step: a 300 ms handler costs 900 ms across three steps.
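The budget arithmetic is plain addition. A minimal sketch, using the typical ranges from the table above as inputs (they are the document's typical costs, not measurements); the function names are illustrative only:

```python
# Typical per-stage costs (low, high) in ms, copied from the table above.
STAGES: dict[str, tuple[int, int]] = {
    "intent_classification": (400, 1500),
    "workflow_scheduling": (10, 30),
    "activity_dispatch": (5, 20),
    "handler": (20, 300),
    "delivery": (5, 20),
}


def turn_range(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low bounds and the high bounds of every stage in one turn."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def chain_handler_ms(per_step_ms: int, steps: int) -> int:
    """Handler time for a sequential chain: each step adds its own cost."""
    return per_step_ms * steps


print(turn_range(STAGES))  # (440, 1870)
print(chain_handler_ms(300, 3))  # 900
```

Only the "handler" row is yours to change, but in a chain it is the row that gets multiplied.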
Hot paths vs cold paths
Not all handlers execute with the same frequency or latency expectations. Categorize yours before optimizing.
Panels — hot, frequent, should be cheap
Panel handlers execute whenever the frontend re-fetches panel content. That happens:
- On auto_action load (user opens the panel tab)
- When ActionResult.refresh_panels names the panel
- On on_event: SSE events matching the panel's refresh declaration
Target: < 100 ms p50. The user is sitting in front of the UI, waiting. Every panel re-render that takes 500 ms is perceptible friction.
A sidebar panel that renders folders, stats, and a note list — like the notes extension — separates its data sources:
- Folders: cached 60 s via ctx.cache.get_or_fetch
- Folder stats: cached 30 s via ctx.cache.get_or_fetch
- Notes list: not cached; primary content, always fresh
The folders and stats are structurally stable data; caching them is correct. The notes list must be fresh; that single live HTTP call is the dominant cost in the panel handler. Two cached calls + one live call is substantially cheaper than three live calls.
Skeletons — background, periodic, should be thorough
Skeleton handlers run in the background on their configured TTL tick, not during a user-visible request. They are not on the hot path for user-perceived latency. You can afford more work — multiple HTTP calls, aggregation — because the result is pre-computed and served from Redis when the classifier needs it.
But: skeleton handlers that perform expensive work on a short TTL amplify that cost across the user population. A skeleton with ttl=10 doing a 500 ms aggregation runs six times per minute per active user.
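To see how TTL drives background load, here is a small calculation. It assumes one refresh per active user per TTL tick, as described above; the 1 000-user population is a hypothetical input, not a measurement:

```python
def runs_per_minute(ttl_seconds: int) -> float:
    """How many times a skeleton refresh runs per minute for one user."""
    return 60 / ttl_seconds


def busy_seconds_per_minute(ttl_seconds: int, cost_ms: int, users: int) -> float:
    """Total background handler time spent per minute across `users` active users."""
    return runs_per_minute(ttl_seconds) * cost_ms * users / 1000


print(runs_per_minute(10))  # 6.0
print(busy_seconds_per_minute(10, 500, 1000))  # 3000.0
print(busy_seconds_per_minute(60, 500, 1000))  # 500.0
```

Raising the TTL from 10 s to 60 s cuts the background work for the same 500 ms handler by a factor of six.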
```python
import asyncio

from imperal_sdk import Extension

ext = Extension(
    "tasks-ext",
    display_name="Tasks",
    description="Example showing skeleton TTL choice for frequently-updated data.",
    actions_explicit=True,
)


# ttl=30: short because the LLM must see fresh task counts soon after writes.
# This works because the skeleton only surfaces counters + recent IDs — no
# expensive joins or aggregation.
@ext.skeleton(
    "tasks",
    alert=True,
    ttl=30,
    description="Today/overdue/upcoming counts and recent task IDs.",
)
async def skeleton_refresh_tasks(ctx) -> dict:
    try:
        # Fan out with asyncio.gather to parallelize the HTTP calls.
        # Both calls run concurrently, so total wall time is roughly that of the slower call.
        today_raw, overdue_raw = await asyncio.gather(
            ctx.http.get("/v1/tasks", params={"filter": "due_today"}),
            ctx.http.get("/v1/tasks", params={"filter": "overdue"}),
        )
        return {
            "response": {
                "today_count": today_raw.json().get("total", 0),
                "overdue_count": overdue_raw.json().get("total", 0),
            }
        }
    except Exception:
        # Degrade to zero counts instead of failing the background tick.
        return {"response": {"today_count": 0, "overdue_count": 0}}
```

```python
from imperal_sdk import Extension

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing skeleton TTL choice for slow-changing data.",
    actions_explicit=True,
)


# ttl=300: schema rarely changes. A 5-minute window is acceptable.
# Keeping TTL high avoids frequent expensive schema introspection.
@ext.skeleton(
    "db_schema",
    alert=True,
    ttl=300,
    description="Active database schema — tables and columns.",
)
async def skeleton_refresh_db_schema(ctx) -> dict:
    try:
        resp = await ctx.http.post("/v1/schema", json={"user": ctx.user.imperal_id})
        tables = resp.json().get("tables", []) if resp.ok else []
        return {"response": {"table_count": len(tables), "tables": tables}}
    except Exception:
        return {"response": {"table_count": 0, "tables": []}}
```
Chat functions — user-blocking, must be fast
@chat.function handlers execute while the user waits for a chat response. After the intent classifier finishes (itself 400–1 500 ms), the handler runs synchronously in the activity. The user cannot do anything else during that time.
Target: < 500 ms p95 for a single step. For write operations that trigger confirmation flows, this is less visible — the confirmation card appears first, and the actual execution happens after acceptance. For read operations, the user is waiting for the answer.
In a chain, latency compounds. If each step's handler takes 400 ms, a three-step chain spends at least 1 200 ms in handler execution alone, before classification and delivery overhead.
The refresh model is broader than you think
ActionResult.refresh_panels controls which panels re-fetch after a successful handler. The semantics differ by delivery path:
| Path | Trigger | refresh_panels behavior |
|---|---|---|
| Path A — HTTP direct call | ui.Call(...) from panel | Targeted: only the named panels re-fetch |
| Path B — SSE / chat | Chat function in a message | Ignored: all discovered panels re-fetch |
On Path B (the chat path), setting refresh_panels=["sidebar"] does not limit the refresh to just the sidebar. The SSE publisher refreshes every panel the frontend has open. This is by design — the web-kernel cannot know which panels may have been affected by an action that arrived via chat.
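The table's semantics can be modeled as a small pure function. This is a sketch of the observable behavior described above, not the web-kernel's code; DeliveryPath and panels_to_refresh are hypothetical names:

```python
from enum import Enum


class DeliveryPath(Enum):
    HTTP_DIRECT = "A"  # ui.Call(...) from a panel
    SSE_CHAT = "B"  # chat function invoked from a message


def panels_to_refresh(
    path: DeliveryPath, refresh_panels: list[str], open_panels: list[str]
) -> list[str]:
    """Model which open panels re-fetch after a successful handler."""
    if path is DeliveryPath.HTTP_DIRECT:
        # Path A: targeted; only the named panels re-fetch.
        return [p for p in open_panels if p in refresh_panels]
    # Path B: refresh_panels is ignored and every open panel re-fetches.
    return list(open_panels)


open_now = ["sidebar", "viewer", "overview"]
print(panels_to_refresh(DeliveryPath.HTTP_DIRECT, ["sidebar"], open_now))  # ['sidebar']
print(panels_to_refresh(DeliveryPath.SSE_CHAT, ["sidebar"], open_now))
# ['sidebar', 'viewer', 'overview']
```

Under Path B, the refresh cost of a chat-driven write is the sum of every open panel's handler cost, which is why each panel needs to be cheap on its own.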
Implication for panel design: panels must be cheap even on a no-op refresh. If your center panel takes 500 ms to render its initial state, every write operation in any extension will trigger a 500 ms panel reload. Design panels to be fast regardless of what prompted the refresh.
```python
from imperal_sdk import ui, Extension

ext = Extension(
    "notes-ext",
    display_name="Notes",
    description="Example showing a panel that is cheap even on no-op refresh.",
    actions_explicit=True,
)


@ext.panel("viewer", slot="center", center_overlay=True, title="Note")
async def notes_viewer(ctx, note_id: str = "") -> object:
    # Guard: if no note is selected, return immediately with no I/O.
    # This keeps the refresh cost near-zero when the user has no active note.
    if not note_id:
        return ui.Empty("Select a note to view it")
    # Only perform I/O when there is a selected item to load.
    resp = await ctx.http.get(f"/v1/notes/{note_id}", headers={"X-User": ctx.user.imperal_id})
    if not resp.ok:
        return ui.Error("Could not load note")
    note = resp.json()
    return ui.Stack([
        ui.Header(note.get("title", "Untitled")),
        ui.Markdown(note.get("content", "")),
    ])
```
What's free
These operations carry no meaningful performance cost. You can use them freely without profiling.
ctx.user and ctx.tenant access
Both are frozen Pydantic models injected at context construction time. Reading ctx.user.imperal_id, ctx.user.role, ctx.tenant.tenant_id, etc., is a plain attribute access with no I/O.
Reading skeleton output from cache (web-kernel side)
When the classifier reads skeleton data to build context for the LLM, it reads from Redis — a web-kernel-side operation you do not control and do not pay for in your handler.
Returning ui.Empty(...)
An empty state return from a panel handler serializes to a small JSON dict. It is the cheapest thing a panel can return and is the correct no-op pattern when there is nothing to show.
```python
from imperal_sdk import ui, Extension

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing cheap empty-state panel return.",
    actions_explicit=True,
)


@ext.panel("detail", slot="center", center_overlay=True, title="Report Detail")
async def report_detail(ctx, report_id: str = "") -> object:
    if not report_id:
        # No I/O, no cost — returns immediately.
        return ui.Empty("Select a report", icon="BarChart2")
    # ... fetch and render the report
    return ui.Stack([ui.Text("Report data here")])
```
Building UINode trees
Constructing ui.Stack(...), ui.List(...), ui.ListItem(...), etc., is pure Python — no I/O, no serialization until the return value reaches the web-kernel. Build as many nodes as you need; the cost is CPU-proportional to tree size and negligible for typical panel output.
What's expensive
These operations involve I/O and should be minimized, parallelized, or cached.
ctx.http.* — network round trip
External HTTP calls are the most common source of panel latency. Each call adds a network round trip to an external service — typically 50–300 ms depending on the service and your infrastructure topology. A panel that makes three sequential HTTP calls adds 150–900 ms of unavoidable wait time.
Mitigations: parallelize with asyncio.gather, cache results with ctx.cache, or combine into a single batched call if the upstream API supports it.
ctx.ai.* — LLM inference
Calling ctx.ai.complete(...) from inside a handler makes a synchronous LLM inference call, typically 500–3 000 ms. For most @chat.function handlers this is the single most expensive operation available. The Pydantic feedback loop (SDK v4.1.0+) can trigger up to two additional inference calls on validation failure; those retries alone can add up to 6 000 ms on top of the original call in the worst case.
Recommendation: avoid ctx.ai in panel handlers entirely. In @chat.function handlers, use it only where the LLM's reasoning is genuinely irreplaceable. For skeleton handlers, ctx.ai is more acceptable because the skeleton runs in the background.
ctx.db — raw database query
Raw database access via ctx.db is faster than HTTP but still involves a network trip to the database host plus query execution time. Simple indexed lookups are typically 5–20 ms; full-table scans or complex joins can be 100–500 ms or more.
Mitigations: ensure your queries use indexed fields, apply limit bounds, and cache results for stable data.
ctx.store.query — document store query
ctx.store.query(collection, where=..., limit=...) translates to a backend database query. Performance depends on the collection size and whether the where dict fields have backing indexes. Without selective where clauses, the backend scans all documents in the collection.
Always set an explicit limit. The default is 100 documents, but a limit alone does not make a query cheap: without an index on the where fields, the backend may still scan a large collection to find those 100 rows.
Large UINode trees
The web-kernel serializes the UINode tree returned by your handler and sends it to the frontend. A panel returning thousands of list items produces a large JSON payload that is slow to serialize, slow to transmit, and slow to render. Pagination is the correct solution for large lists.
Caching strategies
ctx.cache.get_or_fetch — the primary pattern
get_or_fetch is the canonical caching pattern for panel handlers: check the cache, call the fetcher on miss, write the result back, return it. It handles the check-then-fetch atomicity correctly and is the pattern used in production across the notes, tasks, and mail extensions.
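Conceptually, the check, fetch-on-miss, write-back flow looks like the in-memory stand-in below. It is a hypothetical sketch for understanding only: the real ctx.cache is Redis-backed, takes a registered model class, and its internals may differ. InMemoryCache and its per-key lock are assumptions:

```python
from __future__ import annotations

import asyncio
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class InMemoryCache:
    """Hypothetical stand-in for ctx.cache: TTL entries plus get_or_fetch."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[float, object]] = {}
        self._locks: dict[str, asyncio.Lock] = {}

    def _get_fresh(self, key: str) -> object | None:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._entries[key]  # expired entries count as misses
            return None
        return value

    async def get_or_fetch(
        self, key: str, ttl_seconds: int, fetcher: Callable[[], Awaitable[T]]
    ) -> T:
        # One lock per key: concurrent misses on the same key share one fetch.
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            cached = self._get_fresh(key)  # 1. check
            if cached is not None:
                return cached
            value = await fetcher()  # 2. fetch on miss
            self._entries[key] = (time.monotonic() + ttl_seconds, value)  # 3. write back
            return value  # 4. return


async def demo() -> int:
    """Return how many times the fetcher ran for four reads of one key."""
    cache = InMemoryCache()
    calls = 0

    async def load_accounts() -> list[str]:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # simulated upstream round trip
        return ["acct-1", "acct-2"]

    # Three concurrent readers, then one later reader, all on the same key.
    await asyncio.gather(
        *(cache.get_or_fetch("accounts:u1", 120, load_accounts) for _ in range(3))
    )
    await cache.get_or_fetch("accounts:u1", 120, load_accounts)
    return calls


print(asyncio.run(demo()))  # 1
```

The per-key lock is what makes "check then fetch" safe under concurrency: without it, three simultaneous misses would trigger three upstream calls.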
```python
from __future__ import annotations

from pydantic import BaseModel

from imperal_sdk import Extension, ui

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example panel handler using get_or_fetch for stable metadata.",
    actions_explicit=True,
)


@ext.cache_model("account_list")
class AccountList(BaseModel):
    accounts: list[dict] = []


@ext.panel("sidebar", slot="left", title="Accounts")
async def accounts_sidebar(ctx) -> object:
    uid = ctx.user.imperal_id

    # Stable data: account list changes rarely — cache for 120 s.
    async def _load_accounts() -> AccountList:
        resp = await ctx.http.get("/v1/accounts", headers={"X-User": uid})
        return AccountList(accounts=resp.json().get("accounts", []) if resp.ok else [])

    entry = await ctx.cache.get_or_fetch(
        f"accounts:{uid}", AccountList, ttl_seconds=120, fetcher=_load_accounts,
    )
    items = [
        ui.ListItem(id=a["id"], title=a.get("name", "Unknown"))
        for a in entry.accounts
    ]
    return ui.List(items=items) if items else ui.Empty("No accounts connected")
```
Cache constraints to remember:
- TTL must be 5–300 seconds (the SDK enforces this via the CACHE-TTL-1AST rule)
- Keys may contain only alphanumeric characters and _, -, :, with a maximum of 128 characters
- Value must be a Pydantic BaseModel subclass
- Value size is capped at 64 KB per entry
- The model class must be registered via @ext.cache_model before first use
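The key, TTL, and size limits are easy to pre-check in your own code. A hypothetical sketch: it assumes "alphanumeric" means ASCII letters and digits, and it measures a dict's JSON size as a stand-in for the serialized Pydantic model; the SDK's own validation may be stricter or report errors differently:

```python
import json
import re

# Limits from the list above; check_cache_entry itself is illustrative only.
MIN_TTL_S, MAX_TTL_S = 5, 300
MAX_KEY_LEN = 128
MAX_VALUE_BYTES = 64 * 1024
KEY_PATTERN = re.compile(r"[A-Za-z0-9_:\-]+")  # ASCII alphanumerics plus _ - :


def check_cache_entry(key: str, ttl_seconds: int, value: dict) -> list[str]:
    """Return constraint violations for a prospective entry (empty list = OK)."""
    problems: list[str] = []
    if not MIN_TTL_S <= ttl_seconds <= MAX_TTL_S:
        problems.append(f"ttl {ttl_seconds}s outside {MIN_TTL_S}-{MAX_TTL_S}s")
    if len(key) > MAX_KEY_LEN:
        problems.append(f"key longer than {MAX_KEY_LEN} characters")
    if not KEY_PATTERN.fullmatch(key):
        problems.append("key contains characters other than A-Z a-z 0-9 _ - :")
    # A dict's JSON size stands in for the serialized model here.
    size = len(json.dumps(value).encode("utf-8"))
    if size > MAX_VALUE_BYTES:
        problems.append(f"value is {size} bytes, over {MAX_VALUE_BYTES}")
    return problems


print(check_cache_entry("accounts:u1", 120, {"accounts": []}))  # []
print(check_cache_entry("folders u1", 600, {}))  # two problems: TTL and key characters
```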
@ext.cache_model — register before use
Every model passed to ctx.cache.get/set/get_or_fetch must be registered. The registration must happen at import time (module scope), before any handler code runs. In multi-file extensions, the registration conventionally lives in a dedicated cache_models.py that is imported before handler modules.
```python
from pydantic import BaseModel

from imperal_sdk import Extension

ext = Extension(
    "mail-ext",
    display_name="Mail",
    description="Example showing @ext.cache_model registration at module scope.",
    actions_explicit=True,
)


# Registered at import time — before any handler imports this module.
@ext.cache_model("inbox_page")
class InboxPage(BaseModel):
    messages: list[dict] = []
    total: int = 0
    next_cursor: str = ""


@ext.cache_model("unread_summary")
class UnreadSummary(BaseModel):
    unread_count: int = 0
    last_checked: str = ""
```
When to cache vs always-fetch
| Data type | Cache? | TTL guidance |
|---|---|---|
| Account / connection list | Yes | 60–120 s |
| Folder / label list | Yes | 60 s |
| Folder stats / counts | Yes | 30–60 s |
| Schema / metadata | Yes | 120–300 s |
| Primary content list (inbox, note list) | Usually no | Freshness required |
| User-specific settings | Yes | 60–120 s |
| Results of expensive aggregation | Yes | As long as acceptable staleness allows |
Do not cache the primary content users are looking at directly. If a user adds a note and the sidebar still shows the old list 60 seconds later, that is a bug in UX terms even if it is technically correct. Cache metadata that structures the content, not the content itself.
Skeleton caching — pre-warming the cache from the background
The sql-db extension uses its skeleton handler to mirror data into the application cache. The skeleton runs in the background on a 300 s tick; any panel or chat function that needs the same data reads from the cache rather than making a live call. This pattern decouples panel latency from the upstream service response time.
```python
from __future__ import annotations

from pydantic import BaseModel

from imperal_sdk import Extension

ext = Extension(
    "db-ext",
    display_name="Database",
    description="Example skeleton that pre-warms the app cache for panels.",
    actions_explicit=True,
)

SCHEMA_CACHE_KEY = "db_schema_snap"
# 300 s is the maximum cache TTL and matches the skeleton tick. The entry can
# still lapse briefly before the next refresh, so readers should use
# get_or_fetch with a live fetcher as the fallback.
SCHEMA_CACHE_TTL = 300


@ext.cache_model("db_schema_snap")
class DbSchemaSnapshot(BaseModel):
    tables: list[dict] = []
    table_count: int = 0


@ext.skeleton(
    "db_schema",
    alert=True,
    ttl=300,
    description="Active database schema — tables and columns.",
)
async def skeleton_refresh_db_schema(ctx) -> dict:
    try:
        resp = await ctx.http.post("/v1/schema", json={"user": ctx.user.imperal_id})
        tables = resp.json().get("tables", []) if resp.ok else []
        compact = [{"name": t["name"], "columns": t.get("columns", [])} for t in tables]
        # Pre-warm the cache so panel handlers avoid the live call.
        snap = DbSchemaSnapshot(tables=compact, table_count=len(compact))
        await ctx.cache.set(SCHEMA_CACHE_KEY, snap, ttl_seconds=SCHEMA_CACHE_TTL)
        return {"response": {"table_count": len(compact), "tables": compact}}
    except Exception:
        return {"response": {"table_count": 0, "tables": []}}
```
Skeleton patterns
TTL choice
The TTL passed to @ext.skeleton is a hint to the web-kernel about how frequently to run the background refresh tick. Choose it primarily on how quickly users need to see accurate data in the classifier, then confirm that the refresh is cheap enough to run at that frequency across your user base:
| Use case | TTL guidance | Rationale |
|---|---|---|
| Task counters, unread counts | 30–60 s | LLM should see accurate counts after writes |
| Folder / project structure | 120–300 s | Changes less frequently |
| Database schema | 300 s | Schema changes are rare and deliberate |
| Email inbox summary | 60–120 s | New mail arrives continuously |
A short TTL makes the skeleton data fresher but increases the background I/O load across your user base. The tasks extension uses ttl=30 because task counts change frequently and the classifier must route accurately after each write. The sql-db extension uses ttl=300 because schema introspection is expensive and schema changes are infrequent.
Alert mode (alert=True)
When alert=True, the web-kernel compares the new skeleton output to the previous snapshot and, if they differ, emits a change notification. This is event-driven freshness: instead of polling on a fixed TTL, the system reacts to actual changes.
Use alert=True when:
- Your skeleton surfaces counts or status fields that change meaningfully
- The change has UX significance (new unread mail, new overdue tasks)
Pair alert=True with a companion tool named skeleton_alert_{section_name} that compares old and new snapshots and returns a human-readable alert string.
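A companion might look like the plain function below. Its name follows the skeleton_alert_{section_name} convention, but the signature (old and new snapshot dicts in, a string out, empty when nothing worth announcing changed), the field names, and the wording are assumptions; how the tool is registered is not shown:

```python
def skeleton_alert_tasks(old: dict, new: dict) -> str:
    """Describe meaningful changes between two 'tasks' skeleton snapshots."""
    messages = []
    added_overdue = new.get("overdue_count", 0) - old.get("overdue_count", 0)
    if added_overdue > 0:
        noun = "task is" if added_overdue == 1 else "tasks are"
        messages.append(f"{added_overdue} more {noun} now overdue")
    today = new.get("today_count", 0)
    if today != old.get("today_count", 0):
        messages.append(f"{today} task{'' if today == 1 else 's'} due today")
    return "; ".join(messages)


old = {"today_count": 3, "overdue_count": 1}
new = {"today_count": 4, "overdue_count": 2}
print(skeleton_alert_tasks(old, new))  # 1 more task is now overdue; 4 tasks due today
print(repr(skeleton_alert_tasks(new, new)))  # ''
```

Returning an empty string for "no meaningful change" keeps noise down: a snapshot that differs only in fields the user does not care about produces no alert.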
Auto-rotate at 500 iterations
The web-kernel automatically rotates the skeleton refresh worker at 500 iterations. This is a platform-level resource management feature — you do not need to implement it or handle it in your skeleton handler. Write skeleton handlers as idempotent, stateless functions.
Panel patterns
Pagination for large lists
Never return thousands of items in a single panel response. Use ui.List(page_size=N, on_end_reached=ui.Call(...)) to implement infinite scroll. Load the first page eagerly; load subsequent pages only when the user scrolls to the bottom.
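The cursor handshake behind on_end_reached can be simulated without the SDK. Below, a hypothetical fake backend stands in for /v1/contacts: it returns at most limit items plus a next_cursor (here simply an offset string, which is an assumption), and the client keeps requesting until the cursor comes back empty. This mirrors how a panel stops attaching on_end_reached once next_cursor is empty:

```python
def fake_contacts_page(all_ids: list[int], cursor: str, limit: int) -> dict:
    """Hypothetical backend: the cursor is an offset encoded as a string."""
    start = int(cursor) if cursor else 0
    page = all_ids[start:start + limit]
    end = start + len(page)
    next_cursor = str(end) if end < len(all_ids) else ""
    return {"contacts": page, "next_cursor": next_cursor, "total": len(all_ids)}


def load_all_pages(all_ids: list[int], limit: int = 50) -> list[int]:
    """Follow next_cursor page by page, as successive scroll-to-end events would."""
    seen: list[int] = []
    cursor = ""
    while True:
        data = fake_contacts_page(all_ids, cursor, limit)
        seen.extend(data["contacts"])
        cursor = data["next_cursor"]
        if not cursor:  # empty cursor: nothing left, no further on_end_reached
            return seen


ids = list(range(120))
print(fake_contacts_page(ids, "", 50)["next_cursor"])  # 50
print(len(load_all_pages(ids)))  # 120
```

Each request serializes at most one page, so the panel's first render costs the same whether the user has 120 contacts or 120 000.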
```python
from __future__ import annotations

from imperal_sdk import Extension, ui

ext = Extension(
    "contacts-ext",
    display_name="Contacts",
    description="Example panel with infinite-scroll pagination for a large list.",
    actions_explicit=True,
)


@ext.panel("contacts", slot="left", title="Contacts")
async def contacts_sidebar(ctx, cursor: str = "") -> object:
    uid = ctx.user.imperal_id
    params: dict[str, object] = {"user_id": uid, "limit": 50}
    if cursor:
        params["cursor"] = cursor
    resp = await ctx.http.get("/v1/contacts", params=params)
    if not resp.ok:
        return ui.Error("Could not load contacts")
    data = resp.json()
    contacts = data.get("contacts", [])
    next_cursor = data.get("next_cursor", "")
    total = data.get("total", len(contacts))
    items = [
        ui.ListItem(id=c["id"], title=c.get("name", "Unknown"))
        for c in contacts
    ]
    return ui.List(
        items=items,
        total_items=total,
        page_size=50,
        on_end_reached=ui.Call("__panel__contacts", cursor=next_cursor) if next_cursor else None,
    )
```
Lazy-render with ui.Loading
For panels with expensive initial loads, return ui.Loading(...) immediately while the data fetches in the background. This makes the panel appear responsive even when the backend is slow.
In practice, because panel handlers are async, the web-kernel awaits the result before sending to the frontend. ui.Loading is most useful as a placeholder inside a ui.Stack for a section that loads independently via auto_action or a ui.Call.
Batched fetches — never serial HTTP calls
If your panel needs data from multiple endpoints, fetch them in parallel:
```python
from __future__ import annotations

import asyncio

from imperal_sdk import Extension, ui

ext = Extension(
    "analytics-ext",
    display_name="Analytics",
    description="Example panel batching multiple HTTP calls in parallel.",
    actions_explicit=True,
)


@ext.panel("overview", slot="right", title="Overview")
async def analytics_overview(ctx) -> object:
    uid = ctx.user.imperal_id
    headers = {"X-User": uid}
    # Parallel fetch — all three execute concurrently.
    visits_resp, revenue_resp, users_resp = await asyncio.gather(
        ctx.http.get("/v1/stats/visits", headers=headers),
        ctx.http.get("/v1/stats/revenue", headers=headers),
        ctx.http.get("/v1/stats/users", headers=headers),
    )
    visits = visits_resp.json().get("total", 0) if visits_resp.ok else 0
    revenue = revenue_resp.json().get("amount", 0.0) if revenue_resp.ok else 0.0
    active_users = users_resp.json().get("active", 0) if users_resp.ok else 0
    return ui.Stack([
        ui.Stats(children=[
            ui.Stat(label="Visits", value=visits, icon="👁️"),
            ui.Stat(label="Revenue", value=f"${revenue:,.2f}", icon="💵"),
            ui.Stat(label="Active users", value=active_users, icon="👥"),
        ])
    ])
```
Serial I/O is the most common performance problem in panel handlers. Three sequential 100 ms calls become 300 ms; three parallel calls become 100 ms.
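The difference is easy to reproduce with asyncio.sleep standing in for network latency. fake_call is a hypothetical stand-in for one ctx.http round trip; real round trips vary, so the printed timings are approximate:

```python
import asyncio
import time


async def fake_call(delay_s: float) -> float:
    """Stand-in for one HTTP round trip."""
    await asyncio.sleep(delay_s)
    return delay_s


async def sequential(delays: list[float]) -> float:
    start = time.perf_counter()
    for d in delays:
        await fake_call(d)  # each call waits for the previous one to finish
    return time.perf_counter() - start


async def parallel(delays: list[float]) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_call(d) for d in delays))  # all in flight at once
    return time.perf_counter() - start


seq = asyncio.run(sequential([0.1, 0.1, 0.1]))
par = asyncio.run(parallel([0.1, 0.1, 0.1]))
print(f"sequential ~{seq:.2f}s, parallel ~{par:.2f}s")
```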
Chain patterns
depends_on for parallel-safe steps
When a chain step does not depend on the output of a previous step, declaring depends_on=[] (or an explicit subset) lets the web-kernel's topological sorter run independent steps in parallel. Steps that are declared as read operations and have no dependency on prior write results can be scheduled concurrently.
The depends_on field belongs in the classifier's action_plans schema — it is not a decorator kwarg. The web-kernel applies Kahn's topological sort to the declared plan before dispatching steps.
Ordering guarantee: read steps are always scheduled before dependent write steps. If your classifier emits depends_on correctly, a chain like [mail.list_unread, notes.create_note(depends_on=[mail.list_unread])] will always run the read before the write, regardless of list order.
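Kahn's algorithm itself is short. The sketch below is a generic Python version, not the web-kernel's implementation; it assumes a plan expressed as a mapping from step name to its depends_on list (the real action_plans schema may differ). It groups steps into waves: every step in a wave has its dependencies satisfied, so the steps within a wave could run concurrently:

```python
from collections import deque


def schedule(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Group steps into waves with Kahn's algorithm; raise on a cycle."""
    indegree = {step: len(deps) for step, deps in depends_on.items()}
    dependents: dict[str, list[str]] = {step: [] for step in depends_on}
    for step, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(step)

    ready = deque(step for step, n in indegree.items() if n == 0)
    waves: list[list[str]] = []
    scheduled = 0
    while ready:
        wave = list(ready)
        ready.clear()
        waves.append(wave)
        scheduled += len(wave)
        for step in wave:
            for nxt in dependents[step]:
                indegree[nxt] -= 1
                if indegree[nxt] == 0:
                    ready.append(nxt)
    if scheduled != len(depends_on):
        raise ValueError("cycle in depends_on")
    return waves


plan = {
    "notes.create_note": ["mail.list_unread"],  # listed first, but depends on the read
    "mail.list_unread": [],
    "tasks.list_today": [],
}
print(schedule(plan))
# [['mail.list_unread', 'tasks.list_today'], ['notes.create_note']]
```

The write lands in the second wave even though it appears first in the plan, and the two independent reads share the first wave.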
id_projection to avoid LLM re-resolution
When a chain step receives an entity ID from a prior step, id_projection tells the web-kernel which parameter field carries the target ID. The web-kernel injects the ID directly — the LLM does not need to re-resolve it. This eliminates one round of LLM inference per step that requires ID threading.
```python
from pydantic import BaseModel

from imperal_sdk import Extension, ChatExtension, ActionResult

ext = Extension(
    "folders-ext",
    display_name="Folders",
    description="Example showing id_projection for chain ID threading.",
    actions_explicit=True,
)

chat = ChatExtension(
    ext,
    tool_name="tool_folders_chat",
    description="AI chat interface for folder management.",
)


class DeleteFolderContentsParams(BaseModel):
    folder_id: str


# id_projection="folder_id" tells the web-kernel that the "folder_id" field
# carries the entity ID for this step. In a chain, the web-kernel injects the
# resolved folder_id from the prior step without asking the LLM to re-state it.
@chat.function(
    "delete_notes_from_folder",
    description="Delete all notes from a specified folder by folder ID.",
    action_type="destructive",
    chain_callable=True,
    effects=["delete:note"],
    id_projection="folder_id",
)
async def fn_delete_notes_from_folder(ctx, params: DeleteFolderContentsParams) -> ActionResult:
    if not params.folder_id:
        return ActionResult.error("folder_id is required")
    resp = await ctx.http.delete(
        f"/v1/folders/{params.folder_id}/contents",
        headers={"X-User": ctx.user.imperal_id},
    )
    if not resp.ok:
        return ActionResult.error("Could not delete folder contents")
    count = resp.json().get("deleted", 0)
    return ActionResult.success(
        data={"folder_id": params.folder_id, "deleted_count": count},
        summary=f"Deleted {count} notes from folder",
        refresh_panels=["sidebar"],
    )
```
Use id_projection for compound function names where the verb-prefix heuristic cannot derive the correct field name. For simple names like delete_note, the heuristic finds note_id automatically. For names like delete_notes_from_folder, you need id_projection="folder_id".
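The platform's actual heuristic is not specified here. The hypothetical sketch below only illustrates why compound names defeat a naive verb-prefix rule; it assumes the rule strips a known verb prefix and appends _id to whatever remains, and the verb list is invented for illustration:

```python
from __future__ import annotations

KNOWN_VERBS = ("get_", "update_", "delete_", "archive_")  # hypothetical verb list


def guess_id_field(function_name: str, id_projection: str | None = None) -> str | None:
    """Naive verb-prefix guess at the ID field, overridden by id_projection."""
    if id_projection:
        return id_projection  # an explicit declaration always wins
    for verb in KNOWN_VERBS:
        if function_name.startswith(verb):
            return function_name[len(verb):] + "_id"
    return None


print(guess_id_field("delete_note"))  # note_id
print(guess_id_field("delete_notes_from_folder"))  # notes_from_folder_id (wrong field)
print(guess_id_field("delete_notes_from_folder", id_projection="folder_id"))  # folder_id
```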
Storage tier choice
Three storage tiers are available; choosing the right one for each data type is a performance decision:
| Tier | API | Latency | TTL | Use for |
|---|---|---|---|---|
| ctx.cache | get/set/get_or_fetch | < 5 ms (Redis) | 5–300 s | Short-lived, derived, computed data |
| ctx.store | create/get/query/update/delete | 10–50 ms (DB) | Permanent | User-owned, persistent documents |
| ctx.db | acquire/session | 10–100 ms (DB) | Permanent | Complex SQL queries, joins, migrations |
For panel handlers, ctx.cache is the correct tier for pre-warmed or aggregated data. ctx.store is the correct tier for user-owned content (notes, tasks, contacts). ctx.db is for extensions that own raw SQL schemas.
Do not use ctx.store as a cache. Store documents persist indefinitely and are not evicted by TTL. If you are storing intermediate or derived data that should expire, use ctx.cache.
For a deeper discussion of when to use each tier, see Cache vs store.
Observability
Structured logging
Use logging.getLogger(__name__) (synchronous) or await ctx.log(...) (async) for structured event logs. Both routes are visible in the extension dashboard and in the platform's log aggregator.
Tag your log lines with user-relevant context so you can filter by user or operation:
```python
import logging

from pydantic import BaseModel

from imperal_sdk import Extension, ChatExtension, ActionResult

ext = Extension(
    "reports-ext",
    display_name="Reports",
    description="Example showing structured logging with context tags.",
    actions_explicit=True,
)

chat = ChatExtension(
    ext,
    tool_name="tool_reports_chat",
    description="AI chat interface for reports.",
)

log = logging.getLogger(__name__)


class GenerateReportParams(BaseModel):
    report_type: str


@chat.function(
    "generate_report",
    description="Generate a report of the specified type for the current user.",
    action_type="read",
)
async def fn_generate_report(ctx, params: GenerateReportParams) -> ActionResult:
    uid = ctx.user.imperal_id
    log.info(
        "generate_report start user=%s type=%s",
        uid,
        params.report_type,
    )
    resp = await ctx.http.post(
        "/v1/reports/generate",
        json={"user_id": uid, "type": params.report_type},
    )
    if not resp.ok:
        log.warning(
            "generate_report backend error user=%s status=%d",
            uid,
            resp.status_code,
        )
        return ActionResult.error("Report generation failed. Please try again.", retryable=True)
    log.info("generate_report success user=%s", uid)
    return ActionResult.success(
        data=resp.json(),
        summary=f"Report generated: {params.report_type}",
    )
```
ctx.log is async def and must be awaited. Standard logging calls are synchronous. Both are acceptable; standard logging is more common in production extensions.
Audit ledger
Every @chat.function invocation is recorded in the platform audit ledger with its action type, status, and timing. Latency hot-spots appear in the audit ledger automatically. You do not need to emit latency metrics manually.
Platform dashboards
High-level latency distributions and error rates are visible in the platform monitoring dashboard. Use these as the first signal that a handler has a latency regression. Drill into structured logs for per-invocation detail.
Common pitfalls
Pitfall 1: N+1 queries in a panel handler
The most common panel performance problem: rendering a list of items where each item requires its own HTTP call.
```python
from imperal_sdk import Extension, ui

ext = Extension(
    "tasks-ext",
    display_name="Tasks",
    description="Example showing N+1 query anti-pattern to avoid.",
    actions_explicit=True,
)


@ext.panel("tasks", slot="left", title="Tasks")
async def tasks_sidebar_bad(ctx) -> object:
    uid = ctx.user.imperal_id
    tasks_resp = await ctx.http.get("/v1/tasks", params={"user_id": uid})
    tasks = tasks_resp.json().get("tasks", []) if tasks_resp.ok else []
    items = []
    for task in tasks:
        # ❌ One HTTP call per task — 50 tasks = 50 HTTP calls
        detail_resp = await ctx.http.get(f"/v1/tasks/{task['id']}/detail")
        detail = detail_resp.json() if detail_resp.ok else {}
        items.append(ui.ListItem(
            id=task["id"],
            title=detail.get("title", task.get("title", "")),
        ))
    return ui.List(items=items)
```
Fix: fetch all detail in a single batch call, or include detail in the list endpoint response.
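Counting round trips makes the difference concrete. The sketch below uses a hypothetical fake backend that records each request instead of doing real I/O; the batch endpoint (one call that accepts several IDs) is an assumption about what your upstream API offers:

```python
import asyncio


class FakeTaskAPI:
    """Hypothetical backend that counts requests instead of doing real I/O."""

    def __init__(self, n_tasks: int) -> None:
        self.tasks = [{"id": f"t{i}", "title": f"Task {i}"} for i in range(n_tasks)]
        self.requests = 0

    async def list_tasks(self) -> list[dict]:
        self.requests += 1
        return [{"id": t["id"]} for t in self.tasks]

    async def task_detail(self, task_id: str) -> dict:
        self.requests += 1
        return next(t for t in self.tasks if t["id"] == task_id)

    async def task_details(self, task_ids: list[str]) -> list[dict]:
        self.requests += 1  # one round trip for the whole batch
        wanted = set(task_ids)
        return [t for t in self.tasks if t["id"] in wanted]


async def n_plus_one(api: FakeTaskAPI) -> list[str]:
    tasks = await api.list_tasks()
    return [(await api.task_detail(t["id"]))["title"] for t in tasks]


async def batched(api: FakeTaskAPI) -> list[str]:
    tasks = await api.list_tasks()
    details = await api.task_details([t["id"] for t in tasks])
    return [d["title"] for d in details]


slow, fast = FakeTaskAPI(50), FakeTaskAPI(50)
slow_titles = asyncio.run(n_plus_one(slow))
fast_titles = asyncio.run(batched(fast))
print(slow_titles == fast_titles, slow.requests, fast.requests)  # True 51 2
```

Same output, 51 round trips versus 2; at 50–300 ms per round trip that is the difference between seconds and a fraction of a second.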
Pitfall 2: short-TTL skeleton with expensive aggregation
```python
from imperal_sdk import Extension

ext = Extension(
    "crm-ext",
    display_name="CRM",
    description="Example showing a skeleton TTL mismatch to avoid.",
    actions_explicit=True,
)


# ❌ ttl=10 with a heavy aggregation query runs 6 times/minute per user.
@ext.skeleton(
    "crm_summary",
    ttl=10,
    description="CRM summary with full contact aggregation.",
)
async def skeleton_refresh_crm_bad(ctx) -> dict:
    # This call takes 800 ms — running it every 10 s is 6 calls/min per user.
    resp = await ctx.http.post("/v1/crm/aggregate-all", json={"user": ctx.user.imperal_id})
    return {"response": resp.json() if resp.ok else {}}
```
Fix: either make the aggregation cheaper (return counts, not full records), or use a longer TTL. If freshness is critical, use alert=True with a change-detection companion rather than a short TTL.
Pitfall 3: chain step that blocks on another extension synchronously
If your chain step calls ctx.extensions.call(app_id, ...) synchronously inside a handler, that call blocks the current activity. Long-running inter-extension calls in a chain context multiply: a 500 ms IPC call in step 2 of a 3-step chain adds 500 ms to the total chain latency.
Use ctx.extensions.call for data lookups, not for triggering side effects in other extensions. Side effects should be modeled as chain steps with their own @chat.function declarations, not hidden IPC calls.
Pitfall 4: ignoring cache hit rate
Writing a cache entry and never checking whether it is actually being hit is a common trap. If the cache key changes on every request (for example, including a timestamp or a nonce), the cache always misses and you are spending I/O on writes with no benefit.
Verify your cache keys are stable across requests for the same logical entity. A key like f"folders:{uid}" is stable — same user always hits the same key. A key like f"folders:{uid}:{time.time()}" is never stable.
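Tracking hits and misses is enough to catch an unstable key. A minimal sketch with a hypothetical dict-backed counter (entries never expire here, and the real ctx.cache may expose no such metrics); a random nonce plays the role of the unstable key component:

```python
import uuid


class CountingCache:
    """Hypothetical dict-backed cache that tracks hits and misses."""

    def __init__(self) -> None:
        self._data: dict[str, object] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key: str, compute) -> object:
        if key in self._data:
            self.hits += 1
            return self._data[key]
        self.misses += 1
        self._data[key] = compute()
        return self._data[key]

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


uid = "u1"
stable, unstable = CountingCache(), CountingCache()
for _ in range(10):
    stable.get_or_compute(f"folders:{uid}", lambda: ["Inbox", "Work"])
    unstable.get_or_compute(f"folders:{uid}:{uuid.uuid4().hex}", lambda: ["Inbox", "Work"])
print(stable.hit_rate, unstable.hit_rate)  # 0.9 0.0
```

A hit rate pinned at zero while writes keep happening is the signature of a key that changes on every request.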
Pitfall 5: ctx.ai in a panel handler
Calling ctx.ai.complete(...) from inside a panel handler adds 500–3 000 ms of LLM inference to what should be a fast UI render. Panel handlers should fetch data and build UI nodes — they should not call LLMs.
If you need LLM-generated content in a panel, pre-generate it in a @ext.schedule or @ext.skeleton handler and cache the result. The panel reads from cache.
Pitfall 6: large UINode trees without pagination
Returning a ui.List with 500 ui.ListItem nodes is slow to serialize, slow to transfer, and slow to render in the browser. The platform does not enforce a node count limit — it is your responsibility to paginate.
As a rule of thumb: keep panel responses under 100 items. Use on_end_reached for infinite scroll or page_size for explicit pagination.
See also
Cache vs store
When to use ctx.cache vs ctx.store vs ctx.db — tier semantics and TTL tradeoffs
cache_model reference
@ext.cache_model decorator — registration, model constraints, TTL rules
Skeleton reference
@ext.skeleton decorator — TTL, alert mode, return contract, auto-rotate
Chains guide
Multi-step actions, depends_on ordering, id_projection, [chain executor](/en/reference/glossary/) semantics