Federico Iezzi, Google Cloud,
model serving, gcp, agents, inference
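This is useful because it connects agent adoption to inference architecture. Agents do not make one call; they fan out across planning, retrieval, tool use, retries, and evaluation, so a single user task can multiply into many model calls, and serving cost and latency scale with that multiplier rather than with the number of tasks.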
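A rough way to see the fan-out effect is to count expected model calls and tokens per agent task. The sketch below is only an illustration: the stage names follow the list above, but every number (calls per stage, tokens per call, retry rate) is a made-up assumption, as is the simple model in which each call is retried independently with a fixed probability.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    calls: float          # expected model calls per task (hypothetical value)
    tokens_per_call: int  # prompt + completion tokens per call (hypothetical value)


def task_totals(stages, retry_rate):
    """Return (expected calls, expected tokens) for one agent task.

    Assumption: each call fails and is retried independently with
    probability retry_rate, so expected attempts per call is
    1 / (1 - retry_rate).
    """
    if not 0.0 <= retry_rate < 1.0:
        raise ValueError("retry_rate must be in [0, 1)")
    attempts = 1.0 / (1.0 - retry_rate)
    calls = sum(s.calls for s in stages) * attempts
    tokens = sum(s.calls * s.tokens_per_call for s in stages) * attempts
    return calls, tokens


# Hypothetical per-task profile; not taken from the source.
STAGES = [
    Stage("planning", 1, 2000),
    Stage("retrieval", 3, 1500),
    Stage("tool_use", 4, 1000),
    Stage("evaluation", 2, 1200),
]

calls, tokens = task_totals(STAGES, retry_rate=0.2)
print(f"expected calls per task:  {calls:.1f}")
print(f"expected tokens per task: {tokens:.0f}")
```

Under these assumed numbers one task costs roughly an order of magnitude more calls than a single chat completion, which is the sense in which the serving math changes. Swapping in measured per-stage counts would turn the sketch into a usable capacity estimate.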
Worth keeping as a scale reference for Google Cloud AI work. The product question is where inference cost becomes a feature constraint rather than a backend detail.