
Pydantic feedback loop

How v4.1.0 closes the runtime arg-quality hallucination class

The Pydantic feedback loop is the runtime quality guarantee added in SDK v4.1.0. When the LLM emits a tool_use whose arguments fail Pydantic validation, the SDK gives the LLM a structured second chance instead of failing the call outright with a VALIDATION_MISSING_FIELD error.

This closes roughly 75% of arg-quality hallucinations observed in production traffic — the largest single hallucination class in the 2026-04-30 audit.

What it closes

| Class | Severity | After v4.1.0 |
| --- | --- | --- |
| LLM omits required Pydantic fields (title, project_id) | 🔴 high | ✅ closed — bounded retry with prose feedback |
| LLM passes wrong-shape args (plural list vs singular string) | 🟠 medium | ✅ closed — Pydantic type errors translate to "expected X, got Y" |
| LLM passes ISO-incompatible date strings ("tomorrow") | 🟠 medium | ✅ closed — prose includes ISO format example |
| LLM emits unknown extra fields | 🟡 low | ✅ closed — extra_forbidden translates to "unknown field — remove it" |
| LLM hallucinates ID slugs in retry round | 🔴 critical | ✅ closed — I-AH-1 re-runs on every retry input |

Architecture

LLM emits tool_use(create_task, {description: "..."})
  ↓
outer for-loop in handle_message
  ↓
_execute_function(retry_ctx={...})

(after pre-guards UNKNOWN_SUB_FUNCTION + I-AH-1):
  retry_count = 0; current_tu = tu
  while True:
    try:
      # Validate the LLM-supplied arguments, then run the handler.
      _model_instance = _func_def._pydantic_model(**current_tu.input)
      raw_result = await _func_def.func(ctx, **{...})
      return content                          # SUCCESS
    except PydanticValidationError as e:
      # Stop when the handler is not retry-eligible or the budget is spent.
      if not _retry_eligible or retry_count >= _RETRY_BUDGET:
        return validation_missing_field(...)  # exhausted
      # Turn the validation errors into prose and ask the LLM for corrected args.
      prose = format_pydantic_for_llm(e)
      retry_resp = await client.create_message(...)
      new_tu = first_tool_use_with_same_name(retry_resp, current_tu.name)
      # Re-run the fabricated-ID guard on the retry input.
      if check_id_shape_fabrication(new_tu.input):  # I-AH-1 on retry
        return validation_missing_field(...)
      current_tu = new_tu
      retry_count += 1
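
The control flow above can be sketched as a small standalone loop. This is an illustration under stated assumptions, not the SDK's code: `ValidationFailure` stands in for pydantic.ValidationError so the sketch needs only the standard library, and `validate`, `ask_llm_to_fix`, and `looks_fabricated` are hypothetical callables standing in for the Pydantic model, the retry LLM call, and the I-AH-1 guard. The early return when the LLM stops emitting a tool_use is also an assumption, inferred from the llm_gave_up outcome.

```python
from typing import Any, Callable, Optional

RETRY_BUDGET = 2  # mirrors _RETRY_BUDGET from the text


class ValidationFailure(Exception):
    """Stand-in for pydantic.ValidationError (assumption; keeps the sketch stdlib-only)."""


def run_with_retry(
    args: dict,
    validate: Callable[[dict], dict],
    ask_llm_to_fix: Callable[[dict, str], Optional[dict]],
    looks_fabricated: Callable[[dict], bool],
) -> tuple[str, Any, int]:
    """Return (outcome, validated_args_or_None, retry_count)."""
    retry_count = 0
    current = args
    while True:
        try:
            validated = validate(current)
            outcome = "success" if retry_count else "no_retry"
            return outcome, validated, retry_count
        except ValidationFailure as exc:
            # Budget check happens before any further LLM call.
            if retry_count >= RETRY_BUDGET:
                return "exhausted", None, retry_count
            # Hypothetical retry call: feed the error prose back to the LLM.
            new_args = ask_llm_to_fix(current, str(exc))
            if new_args is None:
                return "llm_gave_up", None, retry_count
            # The fabricated-ID guard runs again on every retry input.
            if looks_fabricated(new_args):
                return "fabricated_id_on_retry", None, retry_count
            current = new_args
            retry_count += 1
```

Only `ValidationFailure` is caught, so any other exception raised by `validate` propagates without a retry.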

How feedback is formatted

format_pydantic_for_llm(e) translates each Pydantic error into one human-readable line:

| Pydantic error type | Output line |
| --- | --- |
| missing | - '{loc}': required field is missing — provide a value |
| string_* | - '{loc}': expected string, got {input_type} |
| int_* | - '{loc}': expected integer, got {input!r} |
| datetime_* | - '{loc}': expected ISO datetime (e.g. '2026-05-03T00:00:00'), got {input!r} |
| list_type | - '{loc}': expected list/array, got {input_type} |
| extra_forbidden | - '{loc}': unknown field — remove it |
| (other) | - '{loc}': {msg} (Pydantic's own message verbatim) |

The full prose includes a header and a retry instruction so the LLM understands what to fix.
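
A minimal sketch of this mapping follows. It is not the SDK's `format_pydantic_for_llm`: it takes a list of error dicts shaped like the output of Pydantic v2's `ValidationError.errors()` (keys `type`, `loc`, `msg`, `input`) so that it runs without Pydantic installed. The function names and the header and footer wording are hypothetical, since the source does not give the exact text.

```python
def format_error_line(err: dict) -> str:
    """Translate one Pydantic-style error dict into one feedback line."""
    loc = ".".join(str(part) for part in err.get("loc", ()))
    kind = err.get("type", "")
    value = err.get("input")
    input_type = type(value).__name__
    if kind == "missing":
        return f"- '{loc}': required field is missing — provide a value"
    if kind.startswith("string_"):
        return f"- '{loc}': expected string, got {input_type}"
    if kind.startswith("int_"):
        return f"- '{loc}': expected integer, got {value!r}"
    if kind.startswith("datetime_"):
        return (
            f"- '{loc}': expected ISO datetime "
            f"(e.g. '2026-05-03T00:00:00'), got {value!r}"
        )
    if kind == "list_type":
        return f"- '{loc}': expected list/array, got {input_type}"
    if kind == "extra_forbidden":
        return f"- '{loc}': unknown field — remove it"
    # Fallback: pass Pydantic's own message through unchanged.
    return f"- '{loc}': {err.get('msg', '')}"


def format_errors_for_llm(errors: list[dict]) -> str:
    """Wrap the per-error lines in a header and a retry instruction."""
    header = "Your tool call arguments failed validation:"  # hypothetical wording
    footer = "Call the same tool again with corrected arguments."  # hypothetical wording
    return "\n".join([header, *(format_error_line(e) for e in errors), footer])
```

With real Pydantic, the same functions could be fed `e.errors()` from a caught `ValidationError`.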

Federal invariants (5 new)

Federal invariants are runtime contracts that block PR merge if weakened.

  • I-PYDANTIC-RETRY-BUDGET — at most _RETRY_BUDGET = 2 retries per tool_use. Beyond that → existing failure path with VALIDATION_MISSING_FIELD.

  • I-PYDANTIC-RETRY-SCOPE — retry triggers ONLY on pydantic.ValidationError. MUST NOT retry on FABRICATED_ID_SHAPE, UNKNOWN_SUB_FUNCTION, generic Exception, or TaskCancelled.

  • I-PYDANTIC-FEEDBACK-STRUCTURED — feedback is structured prose generated from e.errors(), not raw JSON or freeform text.

  • I-PYDANTIC-FC-SINGLE-APPEND — each logical tool_use produces exactly ONE entry in _functions_called, regardless of retry count.

  • I-PYDANTIC-WIRE-FROZEN — retry feature does NOT add new fields to FunctionCall, FunctionCallModel, or ChatResult.to_dict(). Observability lives in SigNoz log-derived metrics only.
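
Two of these contracts, retry scope and single append, can be illustrated together. The sketch below is a simplified assumption of how they interact, not the SDK's implementation: `ValidationFailure` stands in for pydantic.ValidationError, and `execute_logical_call` and the shape of the recorded entry are hypothetical.

```python
class ValidationFailure(Exception):
    """Stand-in for pydantic.ValidationError (assumption)."""


def execute_logical_call(name, attempt, functions_called, budget=2):
    """Run attempt(retry_index) with bounded retries and record one entry."""
    retries = 0
    while True:
        try:
            result, status = attempt(retries), "ok"
            break
        except ValidationFailure:
            # The only exception type that triggers a retry; anything else
            # propagates to the caller untouched.
            if retries >= budget:
                result, status = None, "VALIDATION_MISSING_FIELD"
                break
            retries += 1
    # Appended once, after the loop, so the retry count never multiplies entries.
    # The entry carries no retry-specific fields.
    functions_called.append({"name": name, "status": status})
    return result
```

Because the append sits outside the loop, a call that succeeds on its third attempt and a call that succeeds immediately leave the same single entry.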

Observability

Each tool_use handled by the retry layer emits a structured log line (including first-attempt successes, logged as no_retry) that SigNoz turns into a metric:

validation_retry_outcome tool=<name> ext=<name> outcome=<value> retry_count=<N>

| Outcome | Meaning | Log level |
| --- | --- | --- |
| no_retry | First attempt succeeded | DEBUG |
| success | Retry produced valid args | INFO |
| redundant | LLM repeated the same wrong args | WARNING |
| exhausted | Hit retry budget without success | WARNING |
| llm_gave_up | LLM stopped emitting tool_use | INFO |
| fabricated_id_on_retry | I-AH-1 caught fabrication on retry input | WARNING (security alert at >0) |
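
As a rough sketch, a log line in this format could be produced with the standard `logging` module. The logger name and both helper functions are hypothetical. Only the line format and the outcome-to-level mapping come from the text above.

```python
import logging

# Outcome -> log level, as listed in the table above.
OUTCOME_LEVELS = {
    "no_retry": logging.DEBUG,
    "success": logging.INFO,
    "redundant": logging.WARNING,
    "exhausted": logging.WARNING,
    "llm_gave_up": logging.INFO,
    "fabricated_id_on_retry": logging.WARNING,
}

logger = logging.getLogger("validation_retry")  # hypothetical logger name


def format_outcome(tool: str, ext: str, outcome: str, retry_count: int) -> str:
    """Build the key=value line; reject outcomes that are not in the table."""
    if outcome not in OUTCOME_LEVELS:
        raise ValueError(f"unknown outcome: {outcome}")
    return (
        f"validation_retry_outcome tool={tool} ext={ext} "
        f"outcome={outcome} retry_count={retry_count}"
    )


def log_outcome(tool: str, ext: str, outcome: str, retry_count: int) -> str:
    """Emit the line at the level mapped to its outcome and return it."""
    line = format_outcome(tool, ext, outcome, retry_count)
    logger.log(OUTCOME_LEVELS[outcome], line)
    return line
```

Keeping the outcome set closed makes a misspelled outcome fail loudly instead of creating a stray metric series.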

What you need to do as an extension author

For Pydantic-typed @chat.function handlers, nothing. The retry loop activates automatically.

For legacy **kwargs handlers, the retry layer is a no-op. Migrate to typed parameters to opt in:

from pydantic import BaseModel, Field

class CreateTaskParams(BaseModel):
    title: str = Field(description="Task title")
    project_id: str = Field(description="Project UUID")
    due_date: str | None = Field(None, description="ISO datetime, e.g. 2026-06-15T09:00:00")

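# `chat` is the extension's chat object from the Imperal SDK (its import is not shown here).
@chat.function(
    description="Create a task in a project.",
)
async def create_task(ctx, params: CreateTaskParams):
    # Handler body elided; `...` keeps the example syntactically valid.
    ...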

Pydantic models for @chat.function must be defined at module scope. Function-local models silently disable the retry loop because auto-detection runs via func.__globals__. See the federal feedback memo on this for context.
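
The following sketch shows why a lookup that relies only on `func.__globals__` cannot see a function-local class. It is an illustration, not the SDK's detector: `detect_params_type` is hypothetical, plain classes stand in for Pydantic models, and the annotations are written as strings so that name resolution is explicit.

```python
import typing


class ModuleScopeParams:
    """Stand-in for a Pydantic model defined at module scope."""
    title: str


async def good_handler(ctx, params: "ModuleScopeParams"):
    ...


def make_local_handler():
    class LocalParams:
        """Stand-in for a model defined inside a function."""
        title: str

    async def handler(ctx, params: "LocalParams"):
        ...

    return handler


def detect_params_type(func):
    """Hypothetical detector: resolve the 'params' annotation via func.__globals__ only."""
    try:
        hints = typing.get_type_hints(func, globalns=func.__globals__, localns={})
    except NameError:
        # The annotated name is not a module-level global, so detection fails quietly.
        return None
    return hints.get("params")
```

Here `detect_params_type(good_handler)` resolves to `ModuleScopeParams`, while the handler returned by `make_local_handler()` yields `None`, because `LocalParams` exists only in the enclosing function's scope.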

Cost

A Sonnet 4.6 retry call uses ≈ 700 input + 150 output tokens, about $0.0044 per retry. At the 7-day baseline rate (12 rejected calls per 24 h × at most 2 retries × $0.0044 ≈ $0.11/day), worst-case additional spend is about $3/month per fleet, which is negligible compared to baseline chain LLM cost.
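
The worst-case figure can be checked with a few lines of arithmetic. The per-retry cost, rejection rate, and retry budget are taken from the paragraph above. The 30-day month is an assumption.

```python
COST_PER_RETRY_USD = 0.0044   # from the text
REJECTED_PER_DAY = 12         # 7-day baseline rate, from the text
MAX_RETRIES = 2               # _RETRY_BUDGET
DAYS_PER_MONTH = 30           # assumption

# Worst case: every rejected call uses its full retry budget.
daily_usd = REJECTED_PER_DAY * MAX_RETRIES * COST_PER_RETRY_USD
monthly_usd = daily_usd * DAYS_PER_MONTH

print(f"daily: ${daily_usd:.4f}, monthly: ${monthly_usd:.2f}")
```

This comes to roughly $0.11 per day and a little over $3 per month.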
