Role: Lead Product Designer & Facilitator
Context: Part of a £400m UK Government digital transformation programme
Focus: AI-assisted email triage and draft generation at scale. This project has been white-labelled.
Government staff were manually reviewing, researching, and responding to thousands of complex public enquiries every week. Each response required interpreting technical policy documents, cross-referencing internal systems, and maintaining strict consistency with official guidance.
Response times were slow, quality varied by individual expertise, and the only available solution was more headcount. I led the design of an AI-assisted system to change that without removing human accountability from the process.

Volume pressure: Thousands of emails per week with no viable path to keeping up without proportional staffing increases.
Policy complexity: Responses required navigating evolving technical and regulatory documentation that no individual could hold in full.
Consistency requirements: Tone, language, and accuracy had to align with official policy at all times, regardless of who was writing the response.
Staff needed to understand why a draft had been generated, to see which documents it was grounded in, and to edit confidently without losing compliance alignment. Retaining clear ownership and transparency of every communication that left the organisation was essential.
I facilitated a two-day workshop with the government team. We mapped the as-is process for email writing and policy interpretation, identified where failure and friction occurred, and explored how staff wanted AI to support their individual roles. Tone of voice and clarity of answer were persistent themes throughout, all circling one question: how do I keep the outputs consistent, governable, and factual?

A system that augmented expertise without replacing it
End-to-end workflow
Ingest: Emails arrive and are automatically parsed by the system
Classify: Intent and policy domain identified, queue prioritised
Retrieve: Relevant policy sections surfaced with confidence signals
Draft: RAG-grounded response generated with source traceability
Review & Send: Human approval required; full edit control preserved (see the pipeline sketch below)
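
To make the workflow concrete, here is a minimal Python skeleton of how the five stages could compose. All names and the stubbed logic are illustrative assumptions; the production system is white-labelled and its internals are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class TriageResult:
    intent: str        # e.g. "information_request"
    domain: str        # policy domain, used for queue routing
    priority: int      # lower = handled sooner

@dataclass
class Draft:
    body: str
    sources: list[str]      # documents the draft is grounded in
    confidence: float       # retrieval confidence signal shown to staff
    approved: bool = False  # nothing is sent without explicit approval

def ingest(raw: str) -> str:
    """Parse the incoming email into normalised plain text (stubbed)."""
    return raw.strip()

def classify(text: str) -> TriageResult:
    """Identify intent and policy domain (a model call in production)."""
    return TriageResult(intent="information_request", domain="general", priority=1)

def retrieve(text: str, domain: str) -> list[tuple[str, float]]:
    """Surface relevant policy sections with confidence scores (stubbed)."""
    return [("Illustrative policy handbook, section 4.2", 0.91)]

def generate_draft(text: str, passages: list[tuple[str, float]]) -> Draft:
    """Generate a RAG-grounded response; sources and confidence travel with it."""
    sources = [src for src, _ in passages]
    confidence = min(score for _, score in passages)
    return Draft(body="(generated reply text)", sources=sources, confidence=confidence)

def triage(raw_email: str) -> Draft:
    text = ingest(raw_email)
    result = classify(text)
    passages = retrieve(text, result.domain)
    return generate_draft(text, passages)  # held for human review, never auto-sent
```

The structural point is the return type: every path through the pipeline ends in a Draft held for review, never an email that has been sent.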

Intelligent email classification. Incoming emails were automatically categorised by intent and policy domain, enabling prioritisation and routing without manual triage. Staff saw organised queues rather than undifferentiated volume.
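
As an illustration of the categorisation step, the sketch below uses simple keyword rules. The deployed classifier was model-based and the intents shown are hypothetical, but the routing behaviour is the same: every email lands in a named queue, and anything unrecognised falls back to manual triage rather than a guess.

```python
import re

# Hypothetical intent taxonomy; the real categories are white-labelled.
INTENT_PATTERNS = {
    "complaint": re.compile(r"\b(complain|unhappy|dissatisfied)\b", re.I),
    "status_update": re.compile(r"\b(status|progress|update on)\b", re.I),
    "information_request": re.compile(r"\b(how do i|what is|where can i)\b", re.I),
}

def classify_intent(email_text: str) -> str:
    """Return the first matching intent, or route to a human by default."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(email_text):
            return intent
    return "needs_manual_triage"  # unknown intents stay with staff, not the model
```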
Retrieval-augmented draft generation. Users could upload policy documentation, technical guides, and reference links. The system retrieved relevant sections and used them to generate draft responses grounded in official sources. Each draft surfaced the referenced documents, relevant policy excerpts, and confidence indicators — making the reasoning traceable rather than opaque.
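
A simplified sketch of the retrieval step, assuming policy documents have already been chunked and embedded (the embedding model and chunking strategy are assumptions, not specified here). The similarity score doubles as the confidence indicator surfaced with each draft, and a floor keeps weak matches out of the citations:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve_with_confidence(query_vec, indexed_chunks, k=3, floor=0.75):
    """Top-k policy excerpts with similarity as a visible confidence signal.

    indexed_chunks: list of (doc_id, excerpt, embedding) triples built from
    the uploaded policy documentation. Chunks scoring below `floor` are
    dropped, so a thinly grounded draft looks thinly grounded to the reviewer.
    """
    scored = sorted(
        ((cosine(query_vec, emb), doc_id, excerpt)
         for doc_id, excerpt, emb in indexed_chunks),
        reverse=True,
    )
    return [(doc_id, excerpt, round(score, 2))
            for score, doc_id, excerpt in scored[:k]
            if score >= floor]
```

Each returned triple maps onto what the reviewer sees: the source document, the excerpt the draft leaned on, and how confident the retrieval was.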
Human-in-the-loop review workflow. Drafts required explicit user approval before sending. The interface preserved full edit control, maintained clear separation between generated and human-written content, and captured feedback to improve future responses. The system augmented expertise. It did not replace it.
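
That review gate can be expressed structurally. This sketch (illustrative names, not the production code) captures the three properties described above: generated and human-written text are held separately, reviewer feedback is captured, and sending is impossible without explicit approval.

```python
from dataclasses import dataclass, field

@dataclass
class ReviewableDraft:
    generated_body: str                 # model output, never mutated in place
    human_body: str | None = None       # reviewer's edit, kept clearly separate
    sources: list[str] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)
    approved: bool = False

    def edit(self, new_body: str, note: str = "") -> None:
        """Record a human revision without overwriting the generated text."""
        self.human_body = new_body
        if note:
            self.feedback.append(note)  # fed back to improve future drafts

    def approve(self) -> None:
        self.approved = True

def send(draft: ReviewableDraft) -> str:
    """Hard gate: a draft cannot leave the organisation unapproved."""
    if not draft.approved:
        raise PermissionError("draft has not been approved by a reviewer")
    return draft.human_body or draft.generated_body
```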
Measurable change across every dimension that mattered
Hours → Minutes: Response times for standard queries reduced dramatically
Volume scaled: Teams handled significantly increased volume without proportional staffing increases
Consistency up: Policy alignment improved across all domains and staff members
The hardest design problems in AI are not about the model
Working at the scale and accountability requirements of government made one thing clear that is easy to obscure in faster-moving product contexts: accuracy and accountability are not the same constraint, and when they conflict, accountability wins. A system can produce correct outputs and still be unusable if the people responsible for those outputs cannot explain, defend, or take ownership of them.
This matters beyond government. It matters for any deployment of AI in contexts where decisions have consequences — where an output isn't just consumed but acted upon, attributed, potentially audited. The question "is this correct?" is necessary but not sufficient. The operative question is "can the person sending this stand behind it, and do they have everything they need to do so?" Those are design questions as much as model questions, and they require different answers.

The system we built was structured around this distinction. Confidence indicators, inline citations, visible source attribution, mandatory human review before sending — these weren't features added for comfort. They were the conditions that made the system usable at all in this context. Strip any one of them and you don't have a slightly worse product; you have a product that the organisation cannot responsibly deploy.