If you can’t trace it, you can’t debug it.
A single user request can trigger multiple model calls plus multiple tool calls (APIs, DB queries, file writes). That’s a workflow, not a single response.
Without run IDs / trace IDs, it’s difficult to debug which tool call came from which model step — especially under concurrency.
To reproduce failures, teams must capture prompts, tool inputs/outputs, and key state — not just the final response.
Capture: run ID, model call inputs/outputs, tool inputs/outputs, timestamps, and the state used for retrieval. That’s the baseline for debugging.