1. The question a client answered for us
We didn't realise we'd built a pattern instead of a feature until a client told us. The global energy & engineering workforce firm we shipped a seven-agent timesheet pipeline to came back this quarter asking us to apply the same shape to two more domains — invoice processing and contractor onboarding — that had nothing to do with the original build. They weren't asking for the timesheet product. They were asking for the shape underneath it.
That request put a name on something we had been circling internally for a year. Three separate systems built at HONO — contractor timesheet automation, payroll anomaly explanation, and a customer support helpdesk now in development — kept ending at the same architecture, even though they were started by different squads, in different quarters, for different users. Nobody decreed a standard. The shape kept winning on its own.
This article names the shape: what it is in the abstract, what it looks like in each of the three domains, which parts never change, which parts are pure configuration — and the honest story of how badly we got it wrong the first time.
2. The shape, in abstract
Seven stages. Work arrives, gets identified, gets cleared, gets read, gets checked, gets acted on, gets recorded.
Ingest, and never describe it as email-first. Work enters on four channels: email with attachments (a candidate's onboarding documents, a client's invoice or timesheet), SFTP document drops from third-party systems on their own schedules, webhook triggers from integrated platforms, and scheduled API pulls for sources that only answer when polled. Any pipeline that assumes email as the front door breaks the day a client's system of record starts dropping files at 2 a.m. instead.
Classify. What is this, what type, which category — and therefore which section of the pipeline owns it. The trigger decision for everything downstream.
Secure. An explicit stage, not a detail: attachment validity and safety checks before anything opens the payload. Most agentic-pipeline write-ups skip this stage entirely, which tells you they haven't run one against inboxes the public can reach. Nothing downstream touches a file the gate hasn't cleared.
Extract. The AI identifies the content and pulls the fields the use case needs — hours and rates from a timesheet, pay-head movements from a payroll run, the actual question buried in a support ticket. Extraction is per-use-case by design; the machinery around it is not.
Validate + confidence. Correctness is double-checked and a confidence level is attached to every claim the pipeline makes. The gate question is asked explicitly, on every item: is this okay to be actioned by AI?
Route. High confidence: call the API, write the tables, act. Failed validation: templated return to the source. Anything in between goes to a human. Ambiguity never defaults to the model.
Audit. Every decision — action, escalation, refusal — logged with PII scrubbed at write time. A stage in its own right, because the trail is what converts a stream of model decisions into a system an enterprise can trust.
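The scrub-at-write-time rule in the audit stage can be sketched in a few lines. This is an illustration under stated assumptions, not the production trail: the field names in `PII_KEYS`, the email pattern, the `[REDACTED]` markers, and the list standing in for the audit store are all hypothetical.

```python
import json
import re

# Hypothetical PII field names; a real deployment would own this list per domain.
PII_KEYS = {"name", "email", "bank_account"}
# Simple email pattern used to catch addresses embedded in free text.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub(value):
    """Recursively redact known PII fields and email addresses in free text."""
    if isinstance(value, dict):
        return {k: "[REDACTED]" if k in PII_KEYS else scrub(v) for k, v in value.items()}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        return EMAIL.sub("[REDACTED_EMAIL]", value)
    return value


def write_audit(trail: list, stage: str, decision: str, payload: dict) -> None:
    """Scrub first, then serialise, so raw PII never reaches the stored record."""
    record = {"stage": stage, "decision": decision, "payload": scrub(payload)}
    trail.append(json.dumps(record, sort_keys=True))


trail = []
write_audit(
    trail,
    stage="route",
    decision="escalate",
    payload={"name": "Jane Doe", "note": "contact jane@example.com", "hours": 40},
)
```

The design point is the ordering: scrubbing happens inside the write path, so no caller can log an unscrubbed record by forgetting a step.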
1 · Ingest (multi-channel)
Work arrives on four channels: email + attachments, SFTP document drops, webhook triggers, scheduled API pulls.
Never email-first. Third-party systems drop files on schedules; integrated platforms fire webhooks; some sources only answer when polled.
2 · Classify
What is this, and which section of the pipeline owns it?
Document type, domain category, sender context: the trigger decision for everything downstream.
3 · Secure (gate)
Attachment validity and safety, before anything opens it.
An explicit stage, not a detail buried in ingestion. Nothing downstream touches a payload the gate hasn't cleared.
4 · Extract
Identify the content, pull the fields the use case needs.
Extraction is per-use-case: hours and rates for a timesheet, pay-head movements for a payslip, the actual question inside a ticket.
5 · Validate + confidence (gate)
Is this okay to be actioned by AI?
Correctness double-checked, a confidence level attached to every claim. The gate question is asked explicitly, on every item.
6 · Route (decision)
Act, escalate, or return.
High confidence → call the API or write the tables. Fails validation → templated return to the source. Anything ambiguous → a human. Ambiguity never defaults to the model.
7 · Audit (always-on)
Every decision logged, PII scrubbed at write time.
A stage, not a log line. The trail is what turns a stream of model decisions into a system an enterprise can trust.
The same seven stages, three domains
Contractor timesheets (Production)
- Classify: which client, which country format, which timesheet type
- Auto-execute means: approve and post the timesheet
- Where the human sits: exception review, only what confidence routing flags
Payslip explanations (Production)
- Classify: which pay heads moved outside the employee's baseline
- Auto-execute means: publish the plain-English explanation on the payslip
- Where the human sits: async, the employee acknowledges, nobody approves
Support helpdesk (In development)
- Classify: ticket category, urgency, which knowledge base owns the answer
- Auto-execute means: send the drafted reply
- Where the human sits: review-every, mandatory agent sign-off until trust is earned
The stages are invariant. What each stage means — the classifier vocabulary, the validation engine, what auto-execute does, where the human sits — is configuration per domain.
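One way to picture "invariant stages, per-domain configuration" is a fixed stage tuple plus a small config record per domain. The sketch below is hypothetical throughout: `DomainConfig`, `HumanPlacement`, `needs_sign_off`, and the string identifiers are illustrative names, not the framework's actual schema.

```python
from dataclasses import dataclass
from enum import Enum

# The invariant part: the same seven stages for every domain.
STAGES = ("ingest", "classify", "secure", "extract", "validate", "route", "audit")


class HumanPlacement(Enum):
    REVIEW_EVERY = "review_every"
    EXCEPTION_ONLY = "exception_only"
    ASYNC_ACKNOWLEDGE = "async_acknowledge"


@dataclass(frozen=True)
class DomainConfig:
    classifier_vocabulary: tuple  # what the classify stage decides
    auto_execute: str             # what "act" means in this domain
    human_placement: HumanPlacement


# The variable part: one record per domain, no forked code.
DOMAINS = {
    "timesheets": DomainConfig(
        ("client", "country_format", "timesheet_type"),
        "approve_and_post",
        HumanPlacement.EXCEPTION_ONLY,
    ),
    "payslips": DomainConfig(
        ("pay_head_movement",),
        "publish_explanation",
        HumanPlacement.ASYNC_ACKNOWLEDGE,
    ),
    "helpdesk": DomainConfig(
        ("category", "urgency", "knowledge_base"),
        "send_reply",
        HumanPlacement.REVIEW_EVERY,
    ),
}


def needs_sign_off(cfg: DomainConfig, cleared_confidence_bar: bool) -> bool:
    """Must a person decide before the action ships?"""
    if not cleared_confidence_bar:
        return True  # ambiguity goes to a human in every domain
    return cfg.human_placement is HumanPlacement.REVIEW_EVERY
```

Under this reading, adding a domain means adding an entry to `DOMAINS`, while `STAGES` and the sign-off rule stay untouched.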
3. Domain one — contractor timesheets
The first build, covered in full in “Seven agents, one timesheet”: contractor timesheets for a workforce firm operating in 45+ countries, arriving as email attachments in every format a client's back office can invent.
Mapped onto the seven stages — classify decides which client, which country format, which timesheet type. Extract is OCR with rotation correction, then structured parsing of hours and rates. Validation is deterministic where it can be (contractor resolution is a database lookup, not an LLM guess) and rule-based where it must be, with a confidence score attached at each step. Route is three-way: auto-approve, manual review, or reject with a templated reply to the sender. The human sits at exception review — they see only what confidence routing flags, with a projected $1M+ in annual savings coming precisely from how much never needs them.
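The three-way route described above can be sketched as a small pure function. The threshold value, the rule that the weakest field decides, and every name here are assumptions made for illustration; the source does not state how the real confidence bar is set.

```python
from enum import Enum


class Route(Enum):
    AUTO_APPROVE = "auto_approve"
    MANUAL_REVIEW = "manual_review"
    REJECT = "reject"  # templated reply to the sender


# Hypothetical bar; the real threshold is not given in the source.
AUTO_APPROVE_THRESHOLD = 0.95


def route_timesheet(
    validation_passed: bool,
    field_confidences: dict,
    threshold: float = AUTO_APPROVE_THRESHOLD,
) -> Route:
    """Pick one of three routes from the validation result and per-field confidence."""
    if not validation_passed:
        return Route.REJECT
    # Assumption: the weakest extracted field decides for the whole timesheet.
    if field_confidences and min(field_confidences.values()) >= threshold:
        return Route.AUTO_APPROVE
    # Anything in between, including no confidence evidence at all, goes to a person.
    return Route.MANUAL_REVIEW
```

For example, `route_timesheet(True, {"hours": 0.99, "rate": 0.80})` lands in manual review because one field falls below the bar, even though validation passed.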
4. Domain two — explaining payslips
The Intelligent Payslip layer inside Zero Touch Payroll answers a different question: not “is this document valid?” but “why did this number change?” Anomaly detection runs on per-employee statistical baselines; when a pay head moves outside its band, the pipeline generates a plain-English explanation of the variance and publishes it on the payslip itself.
Same seven stages, different meanings. Ingest is continuous payroll input, not documents. Classify picks out which pay heads moved. Extract-and-reason works against a time-series baseline instead of an image. And the human-in-the-loop is asymmetric: the explanation appears, the employee acknowledges it — nobody approves it before it ships. Auto-execute here means publish, and the safety case is different: a wrong explanation is embarrassing and correctable; a wrong timesheet approval moves money.
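A per-employee baseline check can be as simple as a z-score against that employee's own history. The sketch below assumes exactly that technique (sample mean and standard deviation, with a hypothetical threshold of three standard deviations); the source says only "per-employee statistical baselines", so the actual method may differ.

```python
from statistics import mean, stdev


def outside_baseline(history: list, current: float, z_threshold: float = 3.0) -> bool:
    """Flag a pay head whose current value falls outside the employee's own band."""
    if len(history) < 2:
        # Assumption: too little history to form a band, so nothing is flagged here.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any movement at all is outside the band.
        return current != mu
    return abs(current - mu) / sigma > z_threshold


# Hypothetical history of one pay head for one employee.
basic_pay_history = [1000, 1010, 990, 1005, 995]
```

With this history, a current value of 1300 is flagged and 1004 is not. Only flagged pay heads would go on to the explanation step.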
5. Domain three — the helpdesk, in development
The third build is a customer support helpdesk — in development, not in production, and worth including precisely because of how it is going: it has been the fastest of the three to stand up, which is the entire point of this article.
Tickets and support emails come in; classify assigns category and urgency; extract isolates the actual question and the entities it touches; validation scores the drafted answer against a vector-similarity knowledge base; route sends the draft to a support agent who must sign off on every reply — review-every, the most conservative human placement in the family, because the system hasn't earned trust yet. The dial exists to be turned later, exception-only, once the audit trail justifies it. The same pipeline that needed a year of production hardening in the timesheet domain arrived here as configuration.
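Scoring a drafted answer against a vector-similarity knowledge base can be sketched with plain cosine similarity. The sketch assumes embeddings are computed elsewhere and passed in as lists of floats, and it treats the best similarity as the confidence signal; both are assumptions, and `score_draft` and the article ids are hypothetical.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_draft(draft_vec, kb_vecs: dict) -> tuple:
    """Return (best matching article id or None, its similarity to the draft)."""
    best_id, best_score = None, 0.0
    for article_id, vec in kb_vecs.items():
        score = cosine_similarity(draft_vec, vec)
        if score > best_score:
            best_id, best_score = article_id, score
    return best_id, best_score


# Toy two-dimensional embeddings, for illustration only.
knowledge_base = {"refunds": [1.0, 0.0], "login": [0.0, 1.0]}
best_article, similarity = score_draft([0.9, 0.1], knowledge_base)
```

In review-every mode the score does not decide whether a human sees the draft, since every draft is reviewed. It becomes the routing signal only once the dial moves to exception-only.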
6. What never changes — five invariants
Across all three domains, five things have survived every transfer untouched. These are the pattern.
- Confidence attaches to every claim. Not to the pipeline run, but to each extracted field, each validation, each drafted action. The question “is this okay to be actioned by AI?” is asked explicitly, per item, every time.
- Ambiguity defaults to a human. The route stage has no “probably fine” path. If the confidence doesn't clear the bar, a person decides, in every domain, at whatever position the human holds there.
- The security gate sits before extraction. Attachments are validated and safety-checked before anything opens them. This stage has never been relaxed for any domain, including the ones where the sources are internal.
- Queue-based orchestration, replayable per stage. Every stage reads from a queue and writes to one. A failed extraction retries without re-ingesting; a corrected validation replays without re-extracting. Debugging a pipeline is reading its queues.
- Audit is a stage, not a log line. Every decision recorded, PII scrubbed at write time, in the same trail shape across domains, so one security review, and one auditor's education, covers all of them.
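The replay property in the queue-based invariant can be illustrated with in-memory queues. This is a toy under stated assumptions: `collections.deque` stands in for a real message queue, and the `Stage` class, the `failed` holding queue, and the flaky handler are all hypothetical.

```python
from collections import deque


class Stage:
    """A stage that reads from its own inbox and writes to the next stage's inbox."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.inbox = deque()
        self.failed = deque()  # items parked for replay at this stage only
        self.runs = 0

    def drain(self, outbox: deque) -> None:
        while self.inbox:
            item = self.inbox.popleft()
            self.runs += 1
            try:
                outbox.append(self.handler(item))
            except Exception:
                self.failed.append(item)

    def replay(self) -> None:
        """Requeue failed items here; upstream stages are not re-run."""
        while self.failed:
            self.inbox.append(self.failed.popleft())


attempts = {"n": 0}


def flaky_extract(item):
    # Simulated transient failure on the first attempt only.
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient extraction failure")
    return {**item, "hours": 40}


ingest = Stage("ingest", lambda raw: {"doc": raw})
extract = Stage("extract", flaky_extract)
done = deque()

ingest.inbox.append("timesheet.pdf")
ingest.drain(extract.inbox)  # ingest runs once
extract.drain(done)          # first attempt fails; item is parked in extract.failed
extract.replay()
extract.drain(done)          # retry succeeds without re-ingesting
```

After the run, `ingest.runs` is 1 and `extract.runs` is 2: the retry cost one extra extraction and no extra ingestion, which is the behaviour the invariant describes.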
7. What changes — five axes of configuration
Everything else varies — and the discipline is that it varies as configuration, not as forked code.
- Where the human sits. Review-every (helpdesk today), exception-only (timesheets), async-acknowledge (payslips). The same dial, three positions.
- What auto-execute means. Approve and post. Publish an explanation. Send a reply. The verb changes; the gate in front of it doesn't.
- The classifier vocabulary. Client-and-format for timesheets, pay-head movement for payroll, category-and-urgency for support. Categories are the domain's property; the classify stage is not.
- The validation engine. Business rules and rate codes; statistical baselines; policy checks against a knowledge base. Three engines behind one contract: a verdict and a confidence.
- The shape of the knowledge. Relational lookups for timesheets, per-employee time-series for payroll, vector similarity for the helpdesk. The pipeline doesn't care what the knowledge looks like, only that the validate stage can lean on it.
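"Three engines behind one contract" maps naturally onto a structural interface. The sketch below uses `typing.Protocol` to show two toy engines returning the same verdict-plus-confidence shape. The class names, the rate codes, and the way `BandValidator` derives confidence from distance to the band edge are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Verdict:
    ok: bool
    confidence: float  # confidence in the verdict itself, 0.0 to 1.0


class Validator(Protocol):
    def validate(self, item: dict) -> Verdict: ...


class RateCodeValidator:
    """Business-rule engine: a rate code is either known or it is not."""

    def __init__(self, known_codes: set):
        self.known_codes = known_codes

    def validate(self, item: dict) -> Verdict:
        return Verdict(ok=item.get("rate_code") in self.known_codes, confidence=1.0)


class BandValidator:
    """Statistical engine: is the amount within a tolerance band of a baseline?"""

    def __init__(self, baseline: float, tolerance: float):
        if tolerance <= 0:
            raise ValueError("tolerance must be positive")
        self.baseline = baseline
        self.tolerance = tolerance

    def validate(self, item: dict) -> Verdict:
        deviation = abs(item["amount"] - self.baseline)
        # Assumption: confidence grows with distance from the band edge.
        margin = abs(deviation - self.tolerance) / self.tolerance
        return Verdict(ok=deviation <= self.tolerance, confidence=min(1.0, margin))


def run_validate_stage(validator: Validator, item: dict) -> Verdict:
    """The stage depends only on the contract, never on a concrete engine."""
    return validator.validate(item)
```

A value just inside the band, such as 1090 against a baseline of 1000 with tolerance 100, passes with low confidence, which is exactly the kind of item the route stage would hand to a human.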
8. What we got wrong — the honest version
The clean seven-stage story above is written with hindsight. Here is what actually happened.
When we started the timesheet build, we went entirely by what the team asked for. Requirements came in from the people processing timesheets; we built exactly those. It was fast, it made the users happy, and it shipped. It was also, we would later admit, a custom project wearing a platform's costume. Nobody had built it as a framework, because nobody had asked for a framework. I should have. That one is on me.
Then two things arrived in the same season. The invoice automation conversations started — same client, entirely different document type. And new countries came onto the timesheet pipeline with formats, and client-specific layouts, the original build had never seen. Neither fit. Not because the architecture was wrong — because it had never been asked to be an architecture.
The fix was not a rewrite. It was a stage-by-stage interrogation of the pipeline: for each step, either make it configurable, enhance it to cover the new cases, or rebuild it as multiple configurable pipeline features in LangGraph. Two concepts that are now core to the framework were added in exactly this pass. Model configuration and model routing became explicit steps — retrofitted, not designed in. And human-in-the-loop was elevated from an operational detail — a review queue the timesheet build happened to have — to a first-class design principle. Invoice processing forced that one: when the payload is money, you do not let the model assume which action it should take. A hallucinated timesheet field gets caught by a reviewer; a hallucinated choice about what an invoice means compounds downstream. The human stopped being a queue and became an architectural position every domain must consciously choose.
The lesson we took: the pattern was not designed. It was extracted, from a bespoke system, under the pressure of the second and third domains, one configurable stage at a time. The framework is what's left after you subtract the domain. You only find out what's left when the next domain arrives.
9. Why this pays — three operational wins
New domains ship faster. The third system was faster to build than the first by a wide margin — the helpdesk inherited ingestion, security, confidence routing, and audit as configuration rather than engineering. The cost of a new domain is converging on the cost of its extraction logic and its validation engine, which is where the domain actually lives.
Safety is uniform. One security-gate implementation, one audit-trail shape, one confidence contract. A security review of the pattern covers every domain running on it; a compliance auditor who has read one trail has read them all.
Engineers move without re-learning. An engineer who has shipped a timesheet stage can pick up a helpdesk stage the same week: the vocabulary (ingest, classify, secure, extract, validate, route, audit) is shared, and only the domain meaning changes. For an engineering organisation of 40+ spread across many client engagements, that mobility is worth as much as the shipping speed.
10. What's next — the fourth and fifth domains
The next two domains are not our idea, which is the most interesting fact about them. Invoice processing and contractor onboarding are both client-requested — the same firm that received the first pipeline asking for the shape again, on work the original build was never meant to touch. Onboarding, in particular, stretches the pattern in a new direction: long-running, multi-document, compliance-heavy journeys where the pipeline runs for days per case rather than seconds per document.
And the honest boundary: not every problem fits this shape. The pipeline is for document-and-event-driven work — things that arrive, get understood, and get actioned. Conversational surfaces like Zero UI are a different animal with different routing problems. Forcing interactive work into a pipeline shape would be the same mistake as the original custom build, made in the opposite direction.
11. The open question we haven't solved
When the timesheet domain gets a better extraction approach, does the payroll domain inherit it automatically? Today the honest answer is: not always. A shared framework with per-domain configuration creates a new tension — every improvement shipped in one domain is either forced on all of them (and now a helpdesk release is gated on payroll regression testing) or adopted selectively (and now the domains drift, which is the beginning of the three-forks problem we just escaped). Versioning the framework and letting domains upgrade deliberately is where we've landed for now. It is a compromise, not an answer, and we expect to write about it again when it breaks.
The third system was faster to build than the first. The fourth and fifth are someone else's idea now. That's how you know you stopped shipping features.
The pattern got named the day a client asked for it by shape rather than by product. Everything before that was three teams independently discovering the same seven stages — and one expensive lesson about the difference between building what was asked and building what the asking implied.