Faced with a monolith to microservices move: my thought process

This is essentially a list: a thought process which I personally find useful when faced with pulling chunks out of a larger codebase. I like to think the approach is fairly practical and 'real world'.

I'm primarily targeting the difficult areas: the parts of the domain which seem complex and time-consuming to move. The approach makes the most sense when multiple applications consume the same data.

I hope you get some ideas about where to start breaking down a large codebase, assuming the decision has already been made to do so.

Each context area (potential service) would go through the phases below individually. Progress should be made steadily. You wouldn't want to tackle splitting out a large codebase all at once; embrace the structure of the company and the software the teams work with, and go from there.

Making a start (phase 1)

  1. Refactor within the monolith first to make the boundaries more distinct. This will define what the scope of the service is.
  2. Define and model the bounded context for the service internally. This is important! You want to get to a point where pulling the code out of process largely only involves sprinkling in networking.
  3. Where trade-offs appear, don't be scared to make sacrifices for the sake of portability, separation and ease of removal later down the line.
  4. Document all touch points for the data, especially where the data is changed. Understanding how other consumers use the data might be important later on, so that is worth documenting too.
  5. Own all writes to the data, but it's fine to leave the data exactly where it is for now. This saves us from simultaneously having to change everywhere the data is read, which would increase the scope of the first phase.
  6. Think about how this change can be toggled, or run in tandem with the existing code temporarily. It should be as easy as possible to turn on and off in case of issues. Consider this early: it is important, and may influence how you refactor during this phase.
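
To make points 5 and 6 a little more concrete, here is a minimal sketch of a single write entry point with a toggle. Everything in it is hypothetical: the `OrderWrites` name, the 'order' data, the `orders.new_write_path` flag and the in-memory dict standing in for the shared table are all assumptions for illustration, not a prescribed design.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class OrderWrites:
    """Hypothetical single entry point for every write to the 'order' data."""

    legacy_write: Callable[[str, dict], None]  # the existing in-monolith write
    new_write: Callable[[str, dict], None]     # the refactored path (later, a network call)
    use_new_path: Callable[[], bool]           # toggle, evaluated on every call

    def save(self, order_id: str, data: dict) -> str:
        """Write through whichever path the toggle selects; return the path used."""
        if self.use_new_path():
            self.new_write(order_id, data)
            return "new"
        self.legacy_write(order_id, data)
        return "legacy"


# Usage: an in-memory dict stands in for the shared table and the flag store.
table: Dict[str, dict] = {}
flags = {"orders.new_write_path": False}

writes = OrderWrites(
    legacy_write=lambda oid, data: table.update({oid: {**data, "via": "legacy"}}),
    new_write=lambda oid, data: table.update({oid: {**data, "via": "new"}}),
    use_new_path=lambda: flags["orders.new_write_path"],
)

writes.save("1", {"total": 10})         # goes through the legacy path
flags["orders.new_write_path"] = True   # flip the toggle, no redeploy needed
writes.save("2", {"total": 20})         # goes through the new path
```

Because the toggle is checked on every call, switching back is just a matter of flipping the flag. Readers of the data are untouched, since both paths still write to the same place.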

Making the separation (phase 2)

  1. KISS - Keep it simple, stupid. In the beginning, the goal is to address deployment, scaling and team ownership concerns. You don't also need the weight of adopting a bunch of new tools and technologies, which could complicate the move and introduce bugs into previously battle-tested functionality. If it makes sense, separate out the code to a comparable stack at first. Once service portability is gained and only the contracts matter, stack modernisation/optimisation can happen later, when you're in a much better position to manoeuvre.

  2. Every step of the way you need to consider communication failure scenarios. Because they happen. A lot. Be they transient or persistent.

  3. Minimise the number of dependencies you require clients to have to consume your service. Package management is hard, and you don't want a dozen different version requirements floating around for common packages. Think about whether or not you absolutely need to include a package before depending on it.

  4. Where possible, have your team own the intricacies of consuming the service. You could achieve that in many ways; a few would be:

    • Create an SDK for clients to use, which wraps your API/stream.
    • Use tech like gRPC, or a schema-based binary encoding such as protobuf, to enforce contracts and enable code generation.
    • Use an event bus which supports strict schemas, like AWS EventBridge, so that clients can consume the structured events which your service emits.
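
As a rough illustration of points 2 and 4 together, here is a sketch of a tiny SDK which hides the transport and retries transient failures with exponential backoff and jitter. The `OrdersClient` and `TransientError` names, the `/orders/{id}` path and the injected `transport` callable are all assumptions for the example; a real SDK would wrap whatever HTTP or gRPC client you actually use. It only uses the standard library, in keeping with point 3.

```python
import random
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical: raised by the transport for timeouts, dropped connections, etc."""


class OrdersClient:
    """Hypothetical SDK owned by the service team; callers never see the transport."""

    def __init__(
        self,
        transport: Callable[[str], dict],
        max_attempts: int = 3,
        base_delay: float = 0.1,
        sleep: Callable[[float], None] = time.sleep,
    ):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._transport = transport
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep  # injectable so tests don't have to wait

    def get_order(self, order_id: str) -> dict:
        return self._call(f"/orders/{order_id}")

    def _call(self, path: str) -> dict:
        for attempt in range(1, self._max_attempts + 1):
            try:
                return self._transport(path)
            except TransientError:
                if attempt == self._max_attempts:
                    raise  # persistent failure: let the caller decide what to do
                # Exponential backoff with full jitter before the next attempt.
                upper = self._base_delay * 2 ** (attempt - 1)
                self._sleep(random.uniform(0, upper))
        raise AssertionError("unreachable")


# Usage: a fake transport which fails twice, then succeeds.
calls = []

def flaky_transport(path: str) -> dict:
    calls.append(path)
    if len(calls) < 3:
        raise TransientError("timeout")
    return {"path": path}

slept = []
client = OrdersClient(flaky_transport, max_attempts=3, sleep=slept.append)
order = client.get_order("42")
```

Only errors the transport marks as transient are retried; anything else surfaces immediately. Retrying like this is only safe for reads or idempotent writes, which is worth bearing in mind when designing the contract.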

Post separation (phase 3)

  1. Control all data reads. Previous steps required you to control data writes only. However, to be in a position to move the data to another store, all reads need to come through your service. Depending upon the domain context your service works in, this could be a big job, but at least you're tackling this issue in isolation now.

  2. (optional) Move data stores. Now that your service entirely owns the reading and writing of its data, you can begin to move that data to wherever fits best, without worrying about the consumers.

  3. (optional) Change/modernise your stack, ensuring your contracts remain. As your service has been in production for some time at this point, you know much more about how it is used and maintained, and you should have a better idea about the tech stack which fits your problem space best.
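
One way to picture points 1 and 2: once every read goes through the service, the store behind it becomes an implementation detail. The sketch below is an assumption-heavy illustration rather than a migration recipe. `OrderStore`, `InMemoryStore` and `OrderReadService` are hypothetical names, and the "try the new store, then fall back to the old one" behaviour is just one possible way to run two stores side by side during a move.

```python
from typing import Dict, Optional, Protocol


class OrderStore(Protocol):
    """Hypothetical storage contract; any backend with a matching get() fits."""

    def get(self, order_id: str) -> Optional[dict]: ...


class InMemoryStore:
    """Stand-in for a real database table or document store."""

    def __init__(self, rows: Dict[str, dict]):
        self._rows = rows

    def get(self, order_id: str) -> Optional[dict]:
        return self._rows.get(order_id)


class OrderReadService:
    """The only read path consumers see; the stores behind it can change freely."""

    def __init__(self, primary: OrderStore, fallback: Optional[OrderStore] = None):
        self._primary = primary
        self._fallback = fallback

    def get_order(self, order_id: str) -> Optional[dict]:
        row = self._primary.get(order_id)
        if row is None and self._fallback is not None:
            # Not migrated yet: serve it from the old store.
            row = self._fallback.get(order_id)
        return row


# Usage: order "1" still lives in the old store, order "2" has been moved.
old_store = InMemoryStore({"1": {"total": 10}})
new_store = InMemoryStore({"2": {"total": 20}})

during_move = OrderReadService(primary=new_store, fallback=old_store)
after_move = OrderReadService(primary=new_store)
```

Consumers call `get_order` in both cases and cannot tell which store answered, which is the position the read ownership in point 1 is meant to buy you.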

Other service considerations

  1. GDPR - (if applicable) As your new service will own all the writes to the data, it will also need to consider deletion according to GDPR. You probably want to come up with a common approach to handling this, but a couple of ideas might be:

    • Listen on an event stream for a GDPR request.
    • Create a secure /gdpr DELETE endpoint on your service.

  2. Observability is hard. Rather than spraying logs throughout your codebase, first consider what you want to know about a given request or workflow, and note down the dashboards and visualisations you want to see before putting the logging/metric statements in. If a workflow fails, think about the data you need in order to correlate the logs and diagnose the issue. Consider using correlation tokens across multiple requests to tie whole journeys together, and adding context globally to all logs, e.g. user UUID, client details (resolved from an API key perhaps) or RequestId, to name a few. Tracing becomes even more important (and challenging) if your stack involves multiple separate processes such as AWS Lambda functions. Being able to quickly pin down an issue across a distributed system is difficult, but absolutely worthwhile.
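
As a small sketch of the correlation token idea, the example below uses Python's standard `contextvars` and `logging` modules to stamp every log line in a request with the same ID. The logger name `orders`, the `handle_request` function and the log format are assumptions made up for the illustration; how the incoming ID arrives (a header, an event field) depends on your stack.

```python
import contextvars
import io
import logging
import uuid
from typing import Optional

# Holds the current request's correlation ID; "-" when outside a request.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Copies the current correlation ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id.get()
        return True


def build_logger(stream) -> logging.Logger:
    logger = logging.getLogger("orders")  # hypothetical service name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s correlation_id=%(correlation_id)s %(message)s")
    )
    handler.addFilter(CorrelationFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger: logging.Logger, incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was passed along, otherwise start a new journey."""
    token = correlation_id.set(incoming_id or str(uuid.uuid4()))
    try:
        logger.info("order received")
        logger.info("order stored")
        return correlation_id.get()
    finally:
        correlation_id.reset(token)  # don't leak the ID into the next request


# Usage: capture the output in memory so it can be inspected.
buffer = io.StringIO()
log = build_logger(buffer)
used_id = handle_request(log, incoming_id="abc-123")
lines = buffer.getvalue().splitlines()
```

The code doing the logging never has to pass the ID around by hand. Passing the same ID on to any downstream calls is what ties the whole journey together across services.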