Long-running AI calls — background task pattern

Run a long AI generation in the background with ctx.background_task: chat unblocks instantly with an ack and the result is delivered later as a fresh bot turn.

When your extension needs to call an external AI service (OpenAI, Anthropic, retrieval-augmented pipelines) that takes longer than the 30-second ctx.http default, you have three options on Imperal Cloud — from simplest to most flexible.

TL;DR — pick by duration

Op duration	Easiest form	Chat experience
≤ 30s	`ctx.http.post(url)` (default)	Chat blocks until result.
30–180s, single HTTP call	`ctx.http.post(url, timeout=120)` per-call override	Chat blocks; fine for predictable single-call latency.
Any duration up to 30 min, want chat unblocked	`@chat.function(background=True)` sugar (v4.2.13+)	Chat unblocks instantly with ack; auto-delivers result as fresh bot turn.
Same, want custom ack summary or conditional dispatch	`ctx.background_task(coro)` explicit (v4.2.12+)	Same as above; you control the immediate ack.
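The table reads as a simple decision rule. The sketch below encodes it as a plain function so the thresholds are easy to see; `pick_pattern`, its parameters, and the returned labels are hypothetical names used for illustration and are not part of the SDK.

```python
def pick_pattern(expected_seconds: float,
                 unblock_chat: bool = False,
                 explicit_control: bool = False) -> str:
    """Map an expected duration onto the rows of the table above.

    explicit_control stands for "custom ack summary or conditional dispatch".
    """
    if expected_seconds > 1800:
        # Nothing in the table covers work longer than 30 minutes.
        raise ValueError("longer than the 30-minute background cap")
    if unblock_chat or expected_seconds > 180:
        # Background work: use the explicit form only when you need control.
        return "C: ctx.background_task" if explicit_control else "B: background=True"
    if expected_seconds > 30:
        return "A: per-call timeout"
    return "default ctx.http"
```

For example, a 120-second single call that may block chat maps to Pattern A, while the same call with `unblock_chat=True` maps to Pattern B.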

Pattern A — `ctx.http.post(..., timeout=N)` (≤ 180s)

Use timeout=N on the specific HTTP call. Chat stays in "thinking…" for the duration. Cleanest when one external call is the only slow step.

handlers_refine.py — Pattern A

from imperal_sdk import ActionResult
from pydantic import BaseModel, Field

# Assumes `chat` is imported from your extension's app module —
# typically: from .app import chat   (or from app import chat)

class RefineParams(BaseModel):
    input: str = Field(description="Text to refine")
    max_length: int = Field(default=1000)

@chat.function(
    "refine",
    description="Refine the given text via AI completion.",
    action_type="write",
    event="text_refined",
)
async def refine(ctx, params: RefineParams) -> ActionResult:
    api_key = await ctx.secrets.get("openai_api_key")
    if not api_key:
        return ActionResult.error("OpenAI key not connected", retryable=False)

    # Per-call timeout — federal cap 180s.
    resp = await ctx.http.post(
        "https://api.openai.com/v1/chat/completions",
        json={
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "You refine prose. Be concise."},
                {"role": "user", "content": params.input},
            ],
            "max_tokens": params.max_length,
        },
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=120,
    )

    if resp.status_code != 200:
        return ActionResult.error(f"OpenAI returned {resp.status_code}", retryable=True)

    text = resp.body["choices"][0]["message"]["content"]
    return ActionResult.success(
        summary="Refined text ready!",
        data={"text": text},
    )

The user sees "Refined text ready!" as the chat response once the handler returns.

Pattern B — `@chat.function(background=True)` decorator sugar (v4.2.13+, recommended)

The single-flag form. Add background=True to the decorator and the SDK auto-wraps your handler in ctx.background_task() under the hood. No inner _work() coroutine. No manual task_id plumbing. Use this whenever the entire handler body is the long work and you're happy with the platform's auto-generated acknowledgement.

handlers_refine.py — Pattern B (sugar)

from imperal_sdk import ActionResult
from pydantic import BaseModel, Field

class StartRefinementParams(BaseModel):
    input: str

@chat.function(
    "refine_output",
    description="Refine the given text via AI completion (long-running).",
    action_type="write",
    event="text_refined",
    background=True,        # ← auto-wrap in ctx.background_task
    long_running=False,     # False → 180s cap; True → 1800s cap (federal)
)
async def refine_output(ctx, params: StartRefinementParams) -> ActionResult:
    """The body runs DETACHED. Progress emissions and the final
    ActionResult are auto-delivered to chat by the platform."""
    api_key = await ctx.secrets.get("openai_api_key")
    if not api_key:
        return ActionResult.error("OpenAI key not connected", retryable=False)

    await ctx.progress(15, "Fetching context")
    # ... optional retrieval call ...

    await ctx.progress(45, "Generating with AI")
    resp = await ctx.http.post(
        "https://api.openai.com/v1/chat/completions",
        json={"model": "gpt-4o", "messages": [...], "max_tokens": 4000},
        headers={"Authorization": f"Bearer {api_key}"},
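        timeout=120,
    )
    # Surface upstream failures instead of indexing into an error body.
    if resp.status_code != 200:
        return ActionResult.error(f"OpenAI returned {resp.status_code}", retryable=True)

    await ctx.progress(90, "Saving")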
    text = resp.body["choices"][0]["message"]["content"]
    await ctx.store.set("last_refined", text)

    # This ActionResult is what the user sees when the background work finishes.
    return ActionResult.success(
        summary="Refined output ready! 🎉",
        data={"text": text},
    )

User chat experience:

User: "improve this text"
Bot (instant, auto-generated): "Started 'refine_output' in background — the result will be sent to chat when it finishes."
...~90 seconds later, the platform injects a fresh bot turn from your handler's return value...
Bot: "Refined output ready! 🎉" (data carries the refined text)

The user can keep chatting about other things between the instant acknowledgement and the final result.

Pattern C — `ctx.background_task(coro)` explicit (v4.2.12+)

Use the explicit form when you need any of:

Custom acknowledgement summary in the immediate reply (Pattern B's auto-generated summary is fine for most cases but isn't configurable).
Conditional dispatch — sometimes you want to run synchronously, sometimes spawn a background task. Make the decision at runtime.
Mixed sync + background — first part of the handler runs immediately, second part detaches.

handlers_refine.py — Pattern C (explicit)

from imperal_sdk import ActionResult
from pydantic import BaseModel

class StartParams(BaseModel):
    input: str
    fast_path: bool = False

@chat.function(
    "start_refinement",
    description="Refine text; chooses fast or background path at runtime.",
    action_type="write",
    event="refinement_started",
)
async def start_refinement(ctx, params: StartParams) -> ActionResult:
    api_key = await ctx.secrets.get("openai_api_key")
    if not api_key:
        return ActionResult.error("OpenAI key not connected", retryable=False)

    # Fast path — runs synchronously.
    if params.fast_path:
        resp = await ctx.http.post(
            "https://api.openai.com/v1/chat/completions",
            json={"model": "gpt-4o-mini", "messages": [...], "max_tokens": 500},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
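        if resp.status_code != 200:
            return ActionResult.error(f"OpenAI returned {resp.status_code}", retryable=True)
        return ActionResult.success(
            summary="Quick refine done.",
            data={"text": resp.body["choices"][0]["message"]["content"]},
        )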

    # Slow path — spawn detached.
    async def _work():
        await ctx.progress(50, "Generating with AI")
        resp = await ctx.http.post(
            "https://api.openai.com/v1/chat/completions",
            json={"model": "gpt-4o", "messages": [...], "max_tokens": 4000},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=120,
        )
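        # Surface upstream failures instead of indexing into an error body.
        if resp.status_code != 200:
            return ActionResult.error(f"OpenAI returned {resp.status_code}", retryable=True)
        text = resp.body["choices"][0]["message"]["content"]
        await ctx.store.set("last_refined", text)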
        return ActionResult.success(
            summary="Refined output ready! 🎉",
            data={"text": text},
        )

    task_id = await ctx.background_task(
        _work(),
        long_running=False,         # < 180s; True raises cap to 1800s
        name="AI refinement",
    )

    # YOUR custom acknowledgement — sent to chat immediately.
    return ActionResult.success(
        summary="Got it — refining (≈90s). I'll send the result here.",
        data={"task_id": task_id},
    )

Guarantees you get by using any of these

What the platform enforces for long-running work

180-second per-call HTTP cap — a per-call timeout= larger than 180 seconds raises ValueError.
Background handlers must return ActionResult — your handler (sugar or explicit coro) must return ActionResult. Returning anything else is recorded as a failure and the user receives a fallback error message.
Tasks are user-scoped — every background task is bound to your extension and the user who triggered it at creation; cross-user access is rejected.
Results land in the right chat — the delivered result message arrives in that user's chat, never another user's.
Every delivery is audited — each delivered message is recorded in the audit trail.
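Because a per-call timeout above the cap raises ValueError, it can help to clamp any timeout that comes from configuration or user input before passing it to the HTTP call. A minimal sketch, assuming the 180-second figure stated above; `clamp_timeout` and `HTTP_TIMEOUT_CAP_S` are hypothetical helper names, not SDK symbols.

```python
HTTP_TIMEOUT_CAP_S = 180  # per-call cap stated in the list above


def clamp_timeout(requested_s: float, cap_s: float = HTTP_TIMEOUT_CAP_S) -> float:
    """Return a timeout that never exceeds the per-call cap."""
    if requested_s <= 0:
        raise ValueError("timeout must be positive")
    # Values above the cap are lowered to it; smaller values pass through.
    return min(requested_s, cap_s)
```

A handler would then pass `timeout=clamp_timeout(configured_value)` instead of the raw value. Work that genuinely needs more than the cap belongs in a background task, not in a larger timeout.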

Progress emissions — keep them flowing

The web-kernel uses ctx.progress(...) calls as heartbeats. If your coroutine goes silent for too long the platform may reclaim the task. Emit at every coarse milestone (fetching, generating, saving):

await ctx.progress(15, "Fetching context")
# ... some work ...
await ctx.progress(45, "Generating with AI")
# ... more work ...
await ctx.progress(90, "Saving")
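When a single awaited call can stay silent for a long time, one possible approach is to re-emit the current milestone on a timer while waiting. The sketch below is an illustration only: `with_heartbeat` is a hypothetical helper, and it assumes that repeating a `ctx.progress(percent, message)` call with unchanged values is acceptable to the platform, which this recipe does not confirm.

```python
import asyncio


async def with_heartbeat(ctx, awaitable, percent: int, message: str,
                         every_s: float = 10.0):
    """Await `awaitable`, calling ctx.progress every `every_s` seconds meanwhile."""
    task = asyncio.ensure_future(awaitable)
    try:
        while True:
            # Wait for the work to finish, but wake up after every_s regardless.
            done, _ = await asyncio.wait({task}, timeout=every_s)
            if done:
                return task.result()  # re-raises if the work failed
            await ctx.progress(percent, message)
    finally:
        # If ctx.progress raised (for example on cancellation), stop the work too.
        if not task.done():
            task.cancel()
```

Usage inside a handler might look like `resp = await with_heartbeat(ctx, ctx.http.post(...), 45, "Generating with AI")`.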

ctx.progress() also raises TaskCancelled if the user cancels — let it propagate; the platform handles the cancellation delivery to chat.
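Letting the cancellation propagate does not rule out cleanup: a `try`/`finally` block runs on both success and cancellation without swallowing the exception. The sketch below uses stand-ins so it runs on its own; the local `TaskCancelled` class and `FakeCtx` are hypothetical substitutes for the SDK's exception and context, whose import paths this recipe does not show.

```python
import asyncio


class TaskCancelled(Exception):
    """Stand-in for the SDK's cancellation exception."""


class FakeCtx:
    """Hypothetical test double: raises TaskCancelled on the Nth progress call."""

    def __init__(self, cancel_on_call: int):
        self.calls = 0
        self.cancel_on_call = cancel_on_call

    async def progress(self, percent: int, message: str) -> None:
        self.calls += 1
        if self.calls >= self.cancel_on_call:
            raise TaskCancelled()


async def work(ctx, cleanup_log: list) -> str:
    try:
        await ctx.progress(15, "Fetching context")
        await ctx.progress(45, "Generating with AI")
        return "done"
    finally:
        # Runs on success and on cancellation; there is no `except`,
        # so TaskCancelled still reaches the caller.
        cleanup_log.append("temp resources released")
```

The design point is the absence of a broad `except Exception` around the progress calls, which would hide the cancellation from the platform.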

When not to use a background task

Reads under 1 second — overhead isn't worth it. Just return ActionResult synchronously.
Pure CPU work — the platform's heartbeat-based liveness check assumes the coroutine yields. Wrap CPU-bound chunks with await asyncio.sleep(0) between iterations.
Streaming partial output to chat — ctx.background_task delivers a single final result. For per-token streaming, build the streaming directly into your handler (different problem; out of scope for this recipe).
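The `await asyncio.sleep(0)` advice for CPU-bound work can be sketched as follows. The checksum is a placeholder for whatever computation your handler does; `checksum_in_chunks` and its chunk size are hypothetical and chosen only for illustration.

```python
import asyncio


async def checksum_in_chunks(data: bytes, chunk_size: int = 4096) -> int:
    """Sum the bytes modulo 65536, yielding to the event loop after each chunk."""
    total = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        total = (total + sum(chunk)) % 65536
        # Hand control back to the event loop so other coroutines can run.
        await asyncio.sleep(0)
    return total
```

Smaller chunks yield more often at the cost of more scheduling overhead, so the right size depends on how heavy each iteration is.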