Deliver with AI

Working with AI agents

An AI agent uses tools and data to carry out a task with limited human steering. Building one is new ground at Defra, so this page shares what teams have learned so far.

How agents handle identity and data

An agent never holds credentials or secrets. Identity is managed outside the agent, and only data is passed to it during a task, never keys.

An agent can only reach data the signed-in user is allowed to see. Access is enforced by the user's own identity.

For example, a user without permission to access a SharePoint site cannot retrieve its content through an agent, while a user with permission can. This is least privilege by design.
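The access rule above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular platform implements it. The `User` type, the `SITE_CONTENT` store and both function names are hypothetical, and a real agent would rely on the platform's own identity service rather than an in-memory check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """The signed-in user (hypothetical type for illustration)."""
    name: str
    permitted_sites: frozenset[str]


# Hypothetical in-memory store standing in for SharePoint sites.
SITE_CONTENT = {
    "flood-policy": "Draft flood policy notes",
    "finance": "Quarterly budget figures",
}


def fetch_site_content(user: User, site: str) -> str:
    """Return content only if the user's own identity permits it."""
    if site not in user.permitted_sites:
        raise PermissionError(f"{user.name} cannot read {site}")
    return SITE_CONTENT[site]


def agent_retrieve(user: User, site: str) -> str:
    """The agent holds no credentials: every read uses the caller's identity."""
    try:
        return fetch_site_content(user, site)
    except PermissionError:
        return "You do not have access to that content."
```

The agent function takes the user as an argument and has no access path of its own, so it cannot return more than the user could already see.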

Conversation state can be held for a single session, or kept for longer if you configure it. Decide what you actually need before you build.

Connecting an agent to data and tools

An agent is only useful if it can reach external data or tools. There is more than one way to connect it.

Model Context Protocol (MCP). A standard way to expose tools and data to an agent. See Model Context Protocol.
Direct function calls. The agent calls a function you define.
Structured prompts or payloads. You pass the agent the data it needs in a fixed shape.
Code execution. The agent runs code, for example a Python tool, to fetch or process data.

You do not always need MCP. Pick the simplest approach that does the job.
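As an illustration of the simpler options, the sketch below combines a direct function call with a structured payload. It assumes the model returns a tool request as JSON with `name` and `arguments` fields. That shape, the `get_river_level` tool, the `TOOLS` registry and the station data are all hypothetical, so check the format your chosen platform actually uses.

```python
import json
from typing import Callable


def get_river_level(station_id: str) -> dict:
    """Hypothetical tool: returns a fixed reading in a fixed shape."""
    readings = {"station-1": 1.42}  # made-up data for illustration
    return {"station_id": station_id, "level_m": readings.get(station_id)}


# Registry of the functions the agent is allowed to call.
TOOLS: dict[str, Callable[..., dict]] = {"get_river_level": get_river_level}


def dispatch(tool_call_json: str) -> dict:
    """Run the function named in a JSON tool request (assumed payload shape)."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        # Refuse anything that is not explicitly registered.
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])
```

Keeping an explicit registry means the agent can only call functions you have chosen to expose.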

Whatever you choose, the data rules still apply. Check Using data with AI and Keeping data safe before an agent touches real content.

An agent that can act on its own output raises the stakes, so keep a human approval step before anything writes to a system. See Security.
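One way to picture the approval step is a gate that every write has to pass through. The sketch below is an assumption about how you might structure it, not a feature of any named platform. `ProposedAction` and `run_with_approval` are hypothetical names, and the `approve` callback stands in for whatever review screen or workflow a person would actually use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """A write the agent wants to make (hypothetical type)."""
    description: str
    execute: Callable[[], str]


def run_with_approval(
    action: ProposedAction,
    approve: Callable[[ProposedAction], bool],
) -> str:
    """Only execute the write if a human reviewer approves it."""
    if not approve(action):
        return "rejected: nothing was written"
    return action.execute()
```

The agent proposes an action and a person decides. Nothing reaches the target system unless `approve` returns `True`.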

Check your agent against expected answers

Test an agent with evaluations: structured tests that compare its output to answers your subject matter experts agree are correct. A large language model (LLM) can act as the judge. Run the tests across many cases to set a baseline you can compare later changes against.

Two kinds of evaluation are worth running:

Answer correctness. Is the output right, compared to the expected answer?
Behaviour validation. Did the agent use the right tools and data to get there?

Evaluations let you see whether a change improved the agent or quietly broke something else. Without them, you are guessing.
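A small evaluation harness might look like the sketch below. It scores both kinds of evaluation across a set of cases. All the names here (`EvalCase`, `AgentResult`, `evaluate`, `exact_match_judge`) are hypothetical. The judge is a plain string comparison standing in for an LLM judge, and the behaviour check assumes your platform can report which tools the agent used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One test case agreed with subject matter experts (hypothetical type)."""
    question: str
    expected_answer: str
    expected_tools: set[str]


@dataclass
class AgentResult:
    """What the agent returned, plus the tools it used (hypothetical type)."""
    answer: str
    tools_used: set[str]


def exact_match_judge(answer: str, expected: str) -> bool:
    """Placeholder judge. An LLM judge would replace this comparison."""
    return answer.strip().casefold() == expected.strip().casefold()


def evaluate(
    cases: list[EvalCase],
    agent: Callable[[str], AgentResult],
    judge: Callable[[str, str], bool] = exact_match_judge,
) -> dict[str, float]:
    """Return the share of cases passing each kind of evaluation."""
    if not cases:
        raise ValueError("At least one evaluation case is needed")
    correct = 0
    behaved = 0
    for case in cases:
        result = agent(case.question)
        if judge(result.answer, case.expected_answer):
            correct += 1  # answer correctness
        if result.tools_used == case.expected_tools:
            behaved += 1  # behaviour validation
    total = len(cases)
    return {
        "answer_correctness": correct / total,
        "behaviour_validation": behaved / total,
    }
```

Run it once to record a baseline, then again after each change to the prompt, tools or design, and compare the two sets of scores.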

See what your agent is doing

When an agent gives a wrong answer, you need to see how it got there. Agent platforms provide logs and traces that show each step the agent took.

Use them to find where it went wrong, then improve the prompt, the tools or the design. Treat this as a normal part of building, not an afterthought.
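Agent platforms record traces for you, and each one uses its own format. The sketch below is only a simplified picture of the idea: a hypothetical `Trace` class that records each step and finds the first one that failed. The step kinds and field names are assumptions, not any platform's schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Trace:
    """An ordered record of the steps an agent took (hypothetical type)."""
    steps: list[dict] = field(default_factory=list)

    def record(self, kind: str, detail: str, ok: bool = True) -> None:
        """Append one step, such as a prompt, a tool call or an answer."""
        self.steps.append({"kind": kind, "detail": detail, "ok": ok})

    def first_failure(self) -> Optional[dict]:
        """Return the earliest step marked as failed, or None if all passed."""
        for step in self.steps:
            if not step["ok"]:
                return step
        return None
```

Starting from the earliest failed step, rather than the final wrong answer, usually points more directly at whether the prompt, a tool or the design needs to change.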

Before you pick a platform

The AI Capability and Enablement team (AICE) is evaluating agent platforms, including AWS AgentCore and Microsoft Foundry, alongside extensions to the Core Delivery Platform. No platform is settled yet, and what is available changes often.

Talk to AICE before you commit to a platform. We can tell you what works today and what is not yet ready.

Ask AICE about agents

Talk to us before you build an agent. We can share what other Defra teams have learned.

Email: AICapabilityAndEnablement@defra.gov.uk
