Pydantic feedback loop
How v4.1.0 closes the runtime arg-quality hallucination class
The Pydantic feedback loop is the runtime quality guarantee added in SDK v4.1.0. When the LLM emits a tool_use whose arguments fail Pydantic validation, the SDK gives the LLM a structured second chance instead of failing the call outright with VALIDATION_MISSING_FIELD.
This closes roughly 75% of arg-quality hallucinations observed in production traffic — the largest single hallucination class in the 2026-04-30 audit.
What it closes
| Class | Severity | After v4.1.0 |
|---|---|---|
| LLM omits required Pydantic fields (title, project_id) | 🔴 high | ✅ closed — bounded retry with prose feedback |
| LLM passes wrong-shape args (plural list vs singular string) | 🟠 medium | ✅ closed — Pydantic type errors translate to "expected X, got Y" |
| LLM passes ISO-incompatible date strings ("tomorrow") | 🟠 medium | ✅ closed — prose includes ISO format example |
| LLM emits unknown extra fields | 🟡 low | ✅ closed — extra_forbidden translates to "unknown field — remove it" |
| LLM hallucinates ID slugs in retry round | 🔴 critical | ✅ closed — I-AH-1 re-runs on every retry input |
Architecture
    LLM emits tool_use(create_task, {description: "..."})
      ↓
    outer for-loop in handle_message
      ↓
    _execute_function(retry_ctx={...})
      ↓
    (after pre-guards UNKNOWN_SUB_FUNCTION + I-AH-1):
      retry_count = 0; current_tu = tu
      while True:
          try:
              _model_instance = _func_def._pydantic_model(**current_tu.input)
              raw_result = await _func_def.func(ctx, **{...})
              return content                                   # SUCCESS
          except PydanticValidationError as e:
              if not _retry_eligible or retry_count >= _RETRY_BUDGET:
                  return validation_missing_field(...)         # exhausted
              prose = format_pydantic_for_llm(e)
              retry_resp = await client.create_message(...)
              new_tu = first_tool_use_with_same_name(retry_resp, current_tu.name)
              if check_id_shape_fabrication(new_tu.input):     # I-AH-1 on retry
                  return validation_missing_field(...)
              current_tu = new_tu
              retry_count += 1

How feedback is formatted
format_pydantic_for_llm(e) translates each Pydantic error into one human-readable line:
| Pydantic error type | Output line |
|---|---|
| missing | - '{loc}': required field is missing — provide a value |
| string_* | - '{loc}': expected string, got {input_type} |
| int_* | - '{loc}': expected integer, got {input!r} |
| datetime_* | - '{loc}': expected ISO datetime (e.g. '2026-05-03T00:00:00'), got {input!r} |
| list_type | - '{loc}': expected list/array, got {input_type} |
| extra_forbidden | - '{loc}': unknown field — remove it |
| (other) | - '{loc}': {msg} (Pydantic's own message verbatim) |
The full prose includes a header and a retry instruction so the LLM understands what to fix.
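As a rough illustration of the mapping above, the sketch below formats a list of error dicts shaped like the output of Pydantic v2's e.errors() (keys type, loc, input, msg). It is not the SDK's implementation: the name format_errors_sketch, the header text, and the closing retry instruction are assumptions made for this example, and it takes plain dicts so it runs without Pydantic installed.

```python
from typing import Any


def format_errors_sketch(errors: list[dict[str, Any]]) -> str:
    """Turn Pydantic-style error dicts into one prose line per error."""
    lines = []
    for err in errors:
        # loc is a tuple such as ("title",) or ("tags", 0); join it into a dotted path.
        loc = ".".join(str(part) for part in err.get("loc", ()))
        kind = str(err.get("type", ""))
        value = err.get("input")
        input_type = type(value).__name__

        # Line wording is copied from the table above.
        if kind == "missing":
            detail = "required field is missing — provide a value"
        elif kind.startswith("string_"):
            detail = f"expected string, got {input_type}"
        elif kind.startswith("int_"):
            detail = f"expected integer, got {value!r}"
        elif kind.startswith("datetime_"):
            detail = f"expected ISO datetime (e.g. '2026-05-03T00:00:00'), got {value!r}"
        elif kind == "list_type":
            detail = f"expected list/array, got {input_type}"
        elif kind == "extra_forbidden":
            detail = "unknown field — remove it"
        else:
            # Fallback: pass the validator's own message through verbatim.
            detail = str(err.get("msg", "invalid value"))
        lines.append(f"- '{loc}': {detail}")

    header = "The tool arguments failed validation:"          # assumed wording
    footer = "Call the tool again with corrected arguments."  # assumed wording
    return "\n".join([header, *lines, footer])
```

With Pydantic v2 installed, the input list would come from e.errors() inside an except pydantic.ValidationError as e block.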
Federal invariants (5 new)
Federal invariants are runtime contracts that block PR merge if weakened.
- I-PYDANTIC-RETRY-BUDGET: at most _RETRY_BUDGET = 2 retries per tool_use. Beyond that → existing failure path with VALIDATION_MISSING_FIELD.
- I-PYDANTIC-RETRY-SCOPE: retry triggers ONLY on pydantic.ValidationError. MUST NOT retry on FABRICATED_ID_SHAPE, UNKNOWN_SUB_FUNCTION, generic Exception, or TaskCancelled.
- I-PYDANTIC-FEEDBACK-STRUCTURED: feedback is structured prose generated from e.errors(), not raw JSON or freeform text.
- I-PYDANTIC-FC-SINGLE-APPEND: each logical tool_use produces exactly ONE entry in _functions_called, regardless of retry count.
- I-PYDANTIC-WIRE-FROZEN: retry feature does NOT add new fields to FunctionCall, FunctionCallModel, or ChatResult.to_dict(). Observability lives in SigNoz log-derived metrics only.
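The budget, scope, and single-append invariants can be sketched in a few lines. This is a simplified, synchronous stand-in and not the SDK's code: ArgValidationError substitutes for pydantic.ValidationError so the example runs on the standard library alone, and run_with_retry, validate, and ask_llm_again are hypothetical names introduced for illustration.

```python
from typing import Any, Callable

RETRY_BUDGET = 2  # mirrors _RETRY_BUDGET from I-PYDANTIC-RETRY-BUDGET


class ArgValidationError(Exception):
    """Stand-in for pydantic.ValidationError (hypothetical, stdlib-only)."""


def run_with_retry(
    name: str,
    args: dict[str, Any],
    validate: Callable[[dict[str, Any]], None],
    ask_llm_again: Callable[[str], dict[str, Any]],
    functions_called: list[str],
) -> tuple[str, int]:
    """Validate args, re-asking the LLM at most RETRY_BUDGET times."""
    retry_count = 0
    while True:
        try:
            validate(args)
            outcome = "success" if retry_count else "no_retry"
            break
        except ArgValidationError as exc:
            # RETRY-SCOPE: only validation errors are caught; anything else propagates.
            if retry_count >= RETRY_BUDGET:
                outcome = "exhausted"
                break
            # Feed the error prose back and take the LLM's corrected arguments.
            args = ask_llm_again(str(exc))
            retry_count += 1
    # FC-SINGLE-APPEND: one entry per logical call, however many retries ran.
    functions_called.append(name)
    return outcome, retry_count
```

Keeping the append outside the loop is what makes the single-append property hold regardless of how many retries ran.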
Observability
Each retry emits a structured log line that SigNoz turns into a metric:
    validation_retry_outcome tool=<name> ext=<name> outcome=<value> retry_count=<N>

| Outcome | Meaning | Log level |
|---|---|---|
| no_retry | First attempt succeeded | DEBUG |
| success | Retry produced valid args | INFO |
| redundant | LLM repeated the same wrong args | WARNING |
| exhausted | Hit retry budget without success | WARNING |
| llm_gave_up | LLM stopped emitting tool_use | INFO |
| fabricated_id_on_retry | I-AH-1 caught fabrication on retry input | WARNING (security alert at >0) |
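For readers wiring up their own dashboards, the sketch below shows how a log line in the format above could be built and paired with the level from the table, using Python's standard logging module. The helper name retry_log_record and the logger name are assumptions for illustration; the SDK's actual logging call is not shown in this document.

```python
import logging

# Outcome -> log level, taken from the table above.
OUTCOME_LEVELS = {
    "no_retry": logging.DEBUG,
    "success": logging.INFO,
    "redundant": logging.WARNING,
    "exhausted": logging.WARNING,
    "llm_gave_up": logging.INFO,
    "fabricated_id_on_retry": logging.WARNING,
}


def retry_log_record(tool: str, ext: str, outcome: str, retry_count: int) -> tuple[int, str]:
    """Return the (level, message) pair for one retry outcome."""
    level = OUTCOME_LEVELS[outcome]  # KeyError on an unknown outcome is intentional
    message = (
        f"validation_retry_outcome tool={tool} ext={ext} "
        f"outcome={outcome} retry_count={retry_count}"
    )
    return level, message


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    logger = logging.getLogger("pydantic_retry_sketch")  # hypothetical logger name
    logger.log(*retry_log_record("create_task", "tasks", "exhausted", 2))
```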
What you need to do as an extension author
For Pydantic-typed @chat.function handlers, nothing. The retry loop activates automatically.
For legacy **kwargs handlers, the retry layer is a no-op. Migrate to typed parameters to opt in:
    from pydantic import BaseModel, Field

    class CreateTaskParams(BaseModel):
        title: str = Field(description="Task title")
        project_id: str = Field(description="Project UUID")
        due_date: str | None = Field(None, description="ISO datetime, e.g. 2026-06-15T09:00:00")

    @chat.function(
        description="Create a task in a project.",
    )
    async def create_task(ctx, params: CreateTaskParams):
        # ...

Pydantic models for @chat.function must be defined at module scope. Function-local models silently disable the retry loop because auto-detection runs via func.__globals__. See the federal feedback memo on this for context.
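To see why a globals-based lookup misses function-local classes, consider the standard-library analogue below. It is only a sketch of the general Python behaviour, not the SDK's detection code: it assumes the annotation is stored as a string (as happens under from __future__ import annotations), and detect_params_model, ModuleParams, and LocalParams are hypothetical names. typing.get_type_hints resolves string annotations through func.__globals__, so a class that exists only inside an enclosing function cannot be found.

```python
import typing


class ModuleParams:
    """Stand-in for a Pydantic model defined at module scope."""


def module_handler(ctx, params: "ModuleParams"):
    return params


def make_local_handler():
    class LocalParams:
        """Stand-in for a model defined inside a function."""

    def local_handler(ctx, params: "LocalParams"):
        return params

    return local_handler


def detect_params_model(func):
    """Resolve the 'params' annotation via the function's globals, or None."""
    try:
        hints = typing.get_type_hints(func)  # looks names up in func.__globals__
    except NameError:
        # The annotated class is not visible at module scope.
        return None
    return hints.get("params")
```

Here detect_params_model(module_handler) resolves to ModuleParams, while the handler returned by make_local_handler() yields None, which mirrors the silent opt-out described above.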
Cost
A Sonnet 4.6 retry call uses ≈ 700 input + 150 output tokens, about $0.0044 per retry. At the 7-day baseline rate (12 rejected/24h × max 2 retries × $0.0044 ≈ $0.11/day), worst-case additional spend is about **$3/month** per fleet, negligible compared to baseline chain LLM cost.
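The arithmetic can be reproduced with a short calculation. The per-token prices below ($3 per million input tokens, $15 per million output tokens) and the 30-day month are assumptions chosen because they reproduce the $0.0044 figure; substitute your actual contract pricing.

```python
INPUT_PRICE_PER_MTOK = 3.00    # assumed USD per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed USD per million output tokens


def retry_cost(input_tokens: int = 700, output_tokens: int = 150) -> float:
    """Cost in USD of a single retry call."""
    total = input_tokens * INPUT_PRICE_PER_MTOK + output_tokens * OUTPUT_PRICE_PER_MTOK
    return total / 1_000_000


def monthly_worst_case(rejected_per_day: int = 12, max_retries: int = 2, days: int = 30) -> float:
    """Worst case: every rejected call consumes the full retry budget."""
    return rejected_per_day * max_retries * retry_cost() * days


if __name__ == "__main__":
    print(f"per retry: ${retry_cost():.5f}")
    print(f"per month: ${monthly_worst_case():.2f}")
```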