Architecture · Long-form

Two migrations

Rewriting an HRMS from PHP, and what came after.

Published · May 2026 · ~12 min read

1. The PHP we inherited

When I joined HONO in May 2023, the platform was a PHP monolith that had carried the company for years. It had earned its scars. Three hundred-plus enterprise clients. Payroll runs across a dozen countries. Recruitment cycles, attendance, leave, the entire HR transactional surface. The fact that it was still in production at that scale was a testament to how well the original engineers had built it.

But the ceiling was visible. The codebase had grown faster than its conventions. New integrations cost more than they should have. Observability was patchy. The operations team knew when something was on fire, but not always why. And the most expensive form of debt was the one that did not show up on any ticket: the platform was not on the modern stack the rest of the industry had moved to. Candidates asked about it in the first interview round.

A monolith you cannot recruit for is a monolith with a clock on it.

2. Two migrations, not one

What people picture when you say “we rewrote our monolith” is the technical migration. Old code on one side, new code on the other, a cutover date. That part is finite. You can plan it, budget it, and ship it.

The work that is not finite is the second migration. Moving the existing client base off the old platform onto the new one. Net-new clients are easy. They sign and they land on the new stack. Existing clients have processes, configurations, integrations, training, audit trails, and a daily reliance on the system that you are about to ask them to relearn. They are not migrating code. They are migrating the way their company works.

Most “we rewrote our monolith” articles stop at the first migration. That is where the reality starts.

3. What we built

The backend is Node.js, Express, and Apollo GraphQL on Sequelize, talking to a per-tenant transactional database (MSSQL) and an optional analytics database (MySQL). Redis sits in front for sessions and queueing; Bull handles background jobs. A shared master registry maps every subdomain to its tenant's database connections; that lookup happens on the request hot path and is cached per process, with connection-level throttling so a burst from one tenant cannot starve another.
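The throttling idea in the paragraph above can be sketched as a per-tenant concurrency cap. This is a minimal illustration, not the production code: the `TenantLimiter` class, its `tryAcquire`/`release` methods, and the cap of two are all assumptions made for the example.

```typescript
// Hypothetical per-tenant concurrency cap. Each tenant has its own counter,
// so a burst from one tenant is turned away (or queued by the caller)
// without consuming capacity that belongs to another tenant.
class TenantLimiter {
  private inFlight = new Map<string, number>();
  private maxPerTenant: number;

  constructor(maxPerTenant: number) {
    this.maxPerTenant = maxPerTenant;
  }

  // Reserves a slot and returns true if the tenant is under its cap.
  tryAcquire(tenant: string): boolean {
    const current = this.inFlight.get(tenant) ?? 0;
    if (current >= this.maxPerTenant) return false;
    this.inFlight.set(tenant, current + 1);
    return true;
  }

  // Frees a slot; intended to be called in a finally block after the request.
  release(tenant: string): void {
    const current = this.inFlight.get(tenant) ?? 0;
    if (current <= 1) this.inFlight.delete(tenant);
    else this.inFlight.set(tenant, current - 1);
  }
}

const limiter = new TenantLimiter(2);
const acmeFirst = limiter.tryAcquire("acme");
const acmeSecond = limiter.tryAcquire("acme");
const acmeThird = limiter.tryAcquire("acme");     // over the cap for acme
const globexFirst = limiter.tryAcquire("globex"); // unaffected by acme's burst
limiter.release("acme");
const acmeRetry = limiter.tryAcquire("acme");     // a slot is free again
```

A rejected `tryAcquire` leaves the decision to the caller: return a retryable error, or park the request in a queue until `release` frees a slot.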

The frontend is React 18 with Apollo Client for real-time reads, Redux for the longer-lived state of payroll and recruitment workflows, and i18next for localisation across English, Spanish, French, Hindi, and a growing list of others. A bespoke design system wraps a small set of vetted Material UI primitives. The team writes against the wrapper, the wrapper enforces the design grammar, and individual screens stay short.

The deployment layer runs containers across multiple nodes with rolling updates and integrated health checks. Error telemetry goes to Sentry; structured logs go to Winston with daily rotation. Prometheus and Grafana are wired in for the next layer of metrics work. None of this is exotic. That is the point.

We chose the boring, mature pieces of the modern stack on purpose. The novelty in this product belongs above this layer, not at it.

4. Multi-tenancy, the boring part done right

HRMS data is the most sensitive operational data inside an enterprise. Salaries. Bank accounts. Tax IDs. Performance reviews. We chose database-per-tenant isolation over schema-per-tenant for the same reason auditors prefer it: a misconfigured WHERE clause cannot cross a tenant boundary when the connection string only ever points at one tenant's database.

It costs us more in operations. Every tenant is its own backup, its own migration window, its own monitor. The cost is worth paying. When an enterprise security review asks us how we guarantee that one client's data cannot leak into another's, the answer is shorter than the question.

The master registry that holds the tenant directory is the only shared piece. A subdomain resolves to a row in the registry, which resolves to a set of database connections. Models load without sync in production, schema changes are gated through migrations the operations team owns, and a small connection pool per process keeps the per-tenant connection cost bounded.
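The subdomain-to-connection lookup described above can be sketched as a small resolver with a per-process cache. Treat it as an illustration under assumptions: the `TenantConnections` shape, the `TenantResolver` class, the host format, and the connection string are hypothetical, and the registry query is injected as a plain function rather than a real database call.

```typescript
// Hypothetical shape of one row in the master registry.
interface TenantConnections {
  tenantId: string;
  transactionalDb: string; // connection string for the tenant's transactional database
  analyticsDb?: string;    // optional analytics database
}

type RegistryLookup = (subdomain: string) => Promise<TenantConnections>;

// Caches the pending or resolved lookup per subdomain for the life of the
// process, so the registry is queried once per tenant, not once per request.
class TenantResolver {
  private cache = new Map<string, Promise<TenantConnections>>();
  private lookup: RegistryLookup;

  constructor(lookup: RegistryLookup) {
    this.lookup = lookup;
  }

  resolve(host: string): Promise<TenantConnections> {
    // "acme.example.com" -> "acme"; this host format is an assumption.
    const subdomain = (host.split(".")[0] ?? "").toLowerCase();
    let entry = this.cache.get(subdomain);
    if (!entry) {
      entry = this.lookup(subdomain);
      this.cache.set(subdomain, entry);
      // Failures are not cached: a transient registry error should be retried.
      entry.catch(() => this.cache.delete(subdomain));
    }
    return entry;
  }
}

let registryHits = 0;
const resolver = new TenantResolver(async (subdomain) => {
  registryHits += 1; // stands in for a query against the shared registry
  return { tenantId: subdomain, transactionalDb: `mssql://db-host/${subdomain}` };
});

const firstLookup = resolver.resolve("acme.example.com");
const secondLookup = resolver.resolve("ACME.example.com"); // served from the cache
```

Caching the promise rather than the resolved value means concurrent first requests for the same tenant share one registry query instead of each issuing their own.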

Before the rest, here is the picture for a single tenant in the middle of migration. The sections that follow unpack each layer.

Figure: One tenant · two application layers · one bridge in the config

  Tenant: a customer in the middle of migration. The subdomain resolves to the new app's load balancer; legacy URLs resolve to the old app.

  Config plane (tenant config / route map): which modules live on which stack, for this tenant, today. The React shell asks the new backend for the route map at session start. Modules that have moved to the new stack appear in the map and are lazy-loaded. Modules still on the old stack don't appear; the user reaches them via the legacy host.

  Modules in this example:
    Payroll: new stack (React + Node/Apollo)
    Recruitment: legacy stack (PHP)
    Attendance: new stack (React + Node/Apollo)

  Shared session: a small SSO bridge, the only place the two apps acknowledge each other, so a user moving between a new-stack module and a legacy-stack module inside the same tenant doesn't get a second login prompt. Deliberately thin.

  Data layer (shared): one database, both apps. Schema kept compatible enough that data flows correctly either way. Database-per-tenant; a shared master registry resolves subdomain to connection at request time.

The two application layers never call each other. There is no gateway, no proxy, no compatibility shim in the code. The compatibility we paid for is at the data layer. The migration switchboard is at the config layer.

5. The route map is the migration switch

Every time a user logs in, the React shell asks the backend a question that sounds simple: which screens does this tenant have? The backend returns a route map, module names paired with the component paths that should render them. The shell lazy-loads exactly those components. If a module is not in the map, the bundle never downloads it.
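On the frontend side, the idea can be sketched as a filter over a table of known loaders. This is an illustration only: the module names, component paths, and the `knownLoaders` table are hypothetical, and the loaders return placeholder objects where a real bundle would use dynamic `import()` calls.

```typescript
// Route map as returned by the backend: module name -> component path.
type RouteMap = Record<string, string>;

// A loader defers fetching the screen until the route is actually visited.
type Loader = () => Promise<unknown>;

// Hypothetical table of every screen the shell knows how to load.
// In a real bundle each entry would be a dynamic import, for example
// () => import("./modules/payroll"), which the bundler splits into its own chunk.
const knownLoaders: Record<string, Loader> = {
  "modules/payroll": async () => ({ name: "PayrollScreen" }),
  "modules/attendance": async () => ({ name: "AttendanceScreen" }),
  "modules/recruitment": async () => ({ name: "RecruitmentScreen" }),
};

// Keep only the modules present in this tenant's route map.
function buildRoutes(routeMap: RouteMap): Map<string, Loader> {
  const routes = new Map<string, Loader>();
  for (const [moduleName, componentPath] of Object.entries(routeMap)) {
    const loader = knownLoaders[componentPath];
    if (loader) routes.set(moduleName, loader); // unknown paths are skipped, not rendered
  }
  return routes;
}

// A tenant whose map contains Payroll and Attendance but not Recruitment.
const routes = buildRoutes({
  Payroll: "modules/payroll",
  Attendance: "modules/attendance",
});
```

In a React shell each loader would typically be handed to `React.lazy`, which expects a function returning a promise of a module with a default export, so a module absent from the map is never requested.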

This started as a feature-gating mechanism. Pay-per-module pricing meant we needed to render the modules a tenant had paid for and nothing else. Then it became the migration switch. A module that has been moved to the new stack appears in the route map. A module that is still on the legacy PHP application does not appear; the user reaches it through a separate URL on the legacy host. The new frontend does not try to render what it does not know about. The new backend does not try to serve what it does not own.

The same mechanism handles per-tenant rollout. The same customer can have Payroll on the new stack, Recruitment still on the old, Attendance moved last week, and Onboarding moving next month. The product team flips a config record, the customer's next session picks up the new map, the route resolves. There is no deploy involved.
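On the backend side, the route map can be thought of as a pure function of the tenant's config record. The sketch below assumes a hypothetical `ModuleConfig` shape with an `entitled` flag and a `stack` field; the real record is not shown in this article.

```typescript
type Stack = "new" | "legacy";

// Hypothetical per-tenant admin record: one entry per module.
interface ModuleConfig {
  module: string;
  entitled: boolean;     // has the tenant paid for this module?
  stack: Stack;          // which application serves it today
  componentPath: string; // where the new frontend finds the screen
}

// Only modules that are both paid for and served by the new stack
// appear in the route map handed to the React shell.
function computeRouteMap(config: ModuleConfig[]): Record<string, string> {
  const map: Record<string, string> = {};
  for (const entry of config) {
    if (entry.entitled && entry.stack === "new") {
      map[entry.module] = entry.componentPath;
    }
  }
  return map;
}

const tenantConfig: ModuleConfig[] = [
  { module: "Payroll", entitled: true, stack: "new", componentPath: "modules/payroll" },
  { module: "Recruitment", entitled: true, stack: "legacy", componentPath: "modules/recruitment" },
  { module: "Attendance", entitled: true, stack: "new", componentPath: "modules/attendance" },
  { module: "Onboarding", entitled: true, stack: "legacy", componentPath: "modules/onboarding" },
];

const mapBefore = computeRouteMap(tenantConfig);

// "Flipping a config record": Onboarding moves to the new stack.
const flipped = tenantConfig.map((entry) =>
  entry.module === "Onboarding" ? { ...entry, stack: "new" as Stack } : entry
);
const mapAfter = computeRouteMap(flipped);
```

Because the map is recomputed from data on each session start, changing one record changes what the next session sees without touching the deployed code.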

6. The bridge is in the config, not in the code

Most teams I have spoken to about this assume there is a compatibility layer somewhere. A gateway that proxies requests between the old app and the new. An adapter that translates calls in one direction. A queue that synchronises state between them. There is not. Neither application layer knows the other exists.

The bridge lives in three places, none of them in the application code:

  1. The tenant configuration plane. Per-tenant, per-module, an admin record names which stack serves which module. The route map is computed from that record.
  2. A small shared-session token. So a user who has logged into the new app does not get a second login prompt when they cross into a legacy module inside the same tenant, and vice versa. This is the only place the two systems acknowledge each other, and it is deliberately thin.
  3. A shared underlying database. Both stacks read and write the same data. The schema is kept compatible enough that whichever application surfaces a given record, the record is correct.

The compatibility we paid for is at the data layer. The application layers are sovereign.

That choice has costs. Two teams have to think about the same data shape from two different angles. A field rename in the schema is a coordination event, not a one-sided refactor. But the alternative was a compatibility codebase that nobody would own a year in, and we have all seen what those become.

7. Migrating the platform was the easy part

Once the platform was stable and the bridge was clean, the question stopped being “did we build the new system right?” and became “can we move every existing customer onto it without breaking the way they work?”

That second question is bigger than the first. New customers in the last twelve months landed directly on the new stack. Sign the contract, point the subdomain, configure the modules, train the admins, go live. Forty-plus of them are on the new stack today, with a combined user base above a hundred thousand. The platform works.

The existing customers are the harder problem. Every one of them has years of configuration, dozens of integrations, payroll rules tuned to a country's specific compliance regime, a training history their managers rely on, and an audit trail their CFO is going to want preserved. Moving them is not a flip of a switch. It is a project, per customer.

8. The pilot

So we picked ten clients for a deliberate pilot. Varied by size, by geography, by which modules they use most. The goal was not to migrate them quickly. The goal was to migrate them carefully enough that we could see the seams.

Five are done. Five are in progress. Each migration teaches us something we did not know before. A UX expectation we had not anticipated. A configuration shape the new admin screen cannot yet express. An integration we had to write a small bridge for because the legacy hook was not replayable. Each finding becomes either a code change in the new app, a refinement to the migration runbook, or a sharpening of the trainer's playbook.

The first migration teaches you the script. The next two hundred run the script.

9. What the pilot is teaching us

Three patterns repeat across the pilot.

Configuration drift. A customer's setup on the old platform has, over years, accreted exceptions that nobody documented and that the user manuals do not mention. We are building a config-diff tool that compares old and new tenant configurations and surfaces the gaps to the migration team before the cutover, not after.
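The core of such a config-diff can be sketched in a few lines. The tool itself is still being built, so everything here is an assumption for illustration: configurations are treated as flat key-value maps, and the keys and the `ConfigGap` shape are hypothetical.

```typescript
type ConfigValue = string | number | boolean;
type TenantConfig = Record<string, ConfigValue>;

interface ConfigGap {
  key: string;
  kind: "missing-in-new" | "value-differs" | "only-in-new";
  oldValue?: ConfigValue;
  newValue?: ConfigValue;
}

// Compares the legacy tenant configuration with the new one and lists every gap.
function diffConfig(oldCfg: TenantConfig, newCfg: TenantConfig): ConfigGap[] {
  const gaps: ConfigGap[] = [];
  for (const key of Object.keys(oldCfg).sort()) {
    if (!(key in newCfg)) {
      gaps.push({ key, kind: "missing-in-new", oldValue: oldCfg[key] });
    } else if (oldCfg[key] !== newCfg[key]) {
      gaps.push({ key, kind: "value-differs", oldValue: oldCfg[key], newValue: newCfg[key] });
    }
  }
  for (const key of Object.keys(newCfg).sort()) {
    if (!(key in oldCfg)) gaps.push({ key, kind: "only-in-new", newValue: newCfg[key] });
  }
  return gaps;
}

// Hypothetical settings for one tenant on each platform.
const legacyConfig: TenantConfig = {
  "leave.carryForwardDays": 12,
  "payroll.roundingMode": "half-up",
  "attendance.graceMinutes": 10,
};
const newConfig: TenantConfig = {
  "leave.carryForwardDays": 12,
  "payroll.roundingMode": "half-even",
  "attendance.shiftOverlap": true,
};

const gaps = diffConfig(legacyConfig, newConfig);
```

Each `missing-in-new` entry is a candidate for the undocumented exceptions described above: a setting the old platform honoured that the new tenant has no place for yet.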

The integration surface area. Most clients have at least one homegrown integration into the HRMS. An export to their finance ERP, a payroll-day file drop, an HR-IT identity sync. The new app's API contract is cleaner than the legacy one, which means most integrations need small client-side adjustments. We consolidate those into a per-tenant integration sheet that the customer's IT team signs off on before cutover.

Training. The new app is a different application. Flows are shorter, labels are different, shortcuts that experienced users had learned have new keystrokes. Day-one productivity dips. Day-fourteen productivity recovers. That window has to be planned for, and we now plan for it explicitly. Trainer hours are staffed up for the two weeks after a cutover, then back down.

10. What comes after the pilot

Once the pilot concludes, the consolidated learnings become a scripted migration. Configuration export from the old tenant, automated diff, generated runbook for the migration manager, integration test pack, training schedule, cutover window, rollback envelope. All of it codified into something the team can run dozens of times a quarter instead of once a month.

The discipline that makes this work is unglamorous. No tenant moves until the script can guarantee a same-day cutover. No script is trusted until it has passed a dry run on the previous customer's configuration. No customer goes live on the new stack without a documented rollback path back to the old one. Every migration is reversible until the day after the customer's first successful month-end run.
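Those rules translate naturally into a preflight check. The sketch below is one possible encoding, not the team's runbook: the `MigrationState` fields are hypothetical, and the rollback window is interpreted, as an assumption, to close at the start of the day after the first successful month-end run.

```typescript
// Hypothetical readiness record for one tenant's migration.
interface MigrationState {
  sameDayCutoverVerified: boolean;         // the script can complete the cutover within a day
  dryRunPassedOnPreviousCustomer: boolean; // the script passed a dry run on the previous customer's configuration
  rollbackPathDocumented: boolean;         // a documented path back to the old stack exists
  firstMonthEndRunOn?: string;             // ISO date "YYYY-MM-DD" of the first successful month-end run
}

// Returns the reasons a cutover must not proceed; an empty list means go.
function cutoverBlockers(state: MigrationState): string[] {
  const blockers: string[] = [];
  if (!state.sameDayCutoverVerified) blockers.push("same-day cutover not verified");
  if (!state.dryRunPassedOnPreviousCustomer) blockers.push("dry run on previous customer's configuration not passed");
  if (!state.rollbackPathDocumented) blockers.push("rollback path not documented");
  return blockers;
}

// Assumed interpretation: rollback is allowed up to and including the day of
// the first successful month-end run. ISO dates compare correctly as strings.
function isReversible(state: MigrationState, today: string): boolean {
  const monthEnd = state.firstMonthEndRunOn;
  return monthEnd === undefined || today <= monthEnd;
}

const ready: MigrationState = {
  sameDayCutoverVerified: true,
  dryRunPassedOnPreviousCustomer: true,
  rollbackPathDocumented: true,
};
const notReady: MigrationState = { ...ready, dryRunPassedOnPreviousCustomer: false, rollbackPathDocumented: false };
const afterMonthEnd: MigrationState = { ...ready, firstMonthEndRunOn: "2026-05-31" };

const readyBlockers = cutoverBlockers(ready);
const notReadyBlockers = cutoverBlockers(notReady);
```

Returning the list of blockers rather than a single boolean gives the migration manager something to act on when a cutover is refused.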

If that sounds slow, it is. Slowness in a migration of this kind is a feature. The cost of a fast cutover that loses a client's trust is higher than the cost of a careful one that takes a quarter longer.

11. The lesson

If you are about to rewrite a monolith, here is what I would tell you. Assume the rewrite is one quarter of the work. The other three quarters are configuration migration, change management, and the operational pace you have to hold over the next twelve months while customers slowly move across.

Architecture is a finite problem. Customers are an unbounded one. The teams that win the long game are the ones that treat the second migration as the real one.

The platform migration was the easy part.

Everything above is in production. The pilot is in flight. Numbers and learnings will update as the rest of the book of clients moves across.

Building or scaling a multi-tenant SaaS through a rewrite? Let's talk.
