Data models for partial update messages

Feb 3, 2021

During a recent code review, I had to think about alternative designs for the data model of a simple message processing flow.

For the sake of this article, let’s assume that we’re talking about some webshop customer data: address and payment method.

A system of record contains this data and publishes events every time that a section of it is updated.

So, every time that a customer updates their address, a CustomerAddressChanged message is posted. Likewise, a CustomerPaymentMethodChanged message is posted when the customer edits their payment options.

Message models are already defined by the system of record. The payload of each message is the data relevant to the event and the unique customer identifier, so consumers can relate the event data to the customer profile.

The piece of software under review needs to process both events.

If I were the one writing the message processing code, I’d use the following structure.

My reasoning is that this model mirrors the event model of the system of record exactly, so future readers of the code can understand the events quickly, without further documentation. Because this code belongs to an adapter layer, there is no problem in having the model of an external system there: that's what an adapter is for.
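As a rough sketch of this first option, assuming Java and hypothetical field names (the article only states that each payload carries the event data plus the customer identifier), each message class could mirror exactly one event:

```java
// Hypothetical value types; the real fields are defined by the system of record.
final class Address {
    private final String street;
    private final String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    String getStreet() { return street; }
    String getCity() { return city; }
}

final class PaymentMethod {
    private final String type;

    PaymentMethod(String type) { this.type = type; }

    String getType() { return type; }
}

// Mirrors the CustomerAddressChanged event: customer identifier plus the new address.
final class CustomerAddressChanged {
    private final String customerId;
    private final Address address;

    CustomerAddressChanged(String customerId, Address address) {
        this.customerId = customerId;
        this.address = address;
    }

    String getCustomerId() { return customerId; }
    Address getAddress() { return address; }
}

// Mirrors the CustomerPaymentMethodChanged event: there is no address field here at all.
final class CustomerPaymentMethodChanged {
    private final String customerId;
    private final PaymentMethod paymentMethod;

    CustomerPaymentMethodChanged(String customerId, PaymentMethod paymentMethod) {
        this.customerId = customerId;
        this.paymentMethod = paymentMethod;
    }

    String getCustomerId() { return customerId; }
    PaymentMethod getPaymentMethod() { return paymentMethod; }
}
```

With this shape, asking a payment method event for an address is not something the compiler will accept, because the method simply does not exist.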

However, my colleague chose the following implementation instead:

He argued that because the updates were related to the same conceptual entity — the customer — it made more sense to share the full entity model amongst notifications. Then some fields would be “optional” or “empty” depending on the context.

It is a defensible position. True, the entity does contain both fields from a holistic perspective, and we have materialized that in the internal data model for this application.

My concern was that, given the model as it was proposed, it’s possible to call both CustomerAddressChanged::getCustomer()::getAddress() and CustomerPaymentMethodChanged::getCustomer()::getAddress(). However, within the context of incoming events, only the former will yield valid data whereas the latter will always be invalid.
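To make the concern concrete, here is a minimal sketch of how the shared model might look. The field names are hypothetical, and representing an "empty" field as `null` is an assumption (an `Optional` or an empty object would have the same problem):

```java
// Hypothetical value types, reduced to one field each for brevity.
final class Address {
    private final String city;

    Address(String city) { this.city = city; }

    String getCity() { return city; }
}

final class PaymentMethod {
    private final String type;

    PaymentMethod(String type) { this.type = type; }

    String getType() { return type; }
}

// One shared entity; which fields are populated depends on the event.
final class Customer {
    private final String id;
    private final Address address;             // assumed null in payment method events
    private final PaymentMethod paymentMethod; // assumed null in address events

    Customer(String id, Address address, PaymentMethod paymentMethod) {
        this.id = id;
        this.address = address;
        this.paymentMethod = paymentMethod;
    }

    String getId() { return id; }
    Address getAddress() { return address; }
    PaymentMethod getPaymentMethod() { return paymentMethod; }
}

// Both notifications expose the whole customer, including the irrelevant field.
final class CustomerAddressChanged {
    private final Customer customer;

    CustomerAddressChanged(Customer customer) { this.customer = customer; }

    Customer getCustomer() { return customer; }
}

final class CustomerPaymentMethodChanged {
    private final Customer customer;

    CustomerPaymentMethodChanged(Customer customer) { this.customer = customer; }

    Customer getCustomer() { return customer; }
}
```

Given a `CustomerPaymentMethodChanged` built as `new CustomerPaymentMethodChanged(new Customer("c-1", null, new PaymentMethod("card")))`, the call `getCustomer().getAddress()` compiles without complaint, yet under these assumptions it can only ever return `null`.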

The responsibility for knowing which piece of data is valid in which context is thus shifted to the user of the model. That knowledge is not expressed in the model itself, which becomes extremely anemic.

If we want to keep the single Customer definition, a compromise can be reached. One possibility would be to make it fully private and expose the relevant public getters in the notification classes.

This way the model would make it clear what can be expected to be present in each context, and user code could no longer access fields that are always invalid for a specific message.
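One way this compromise might be sketched, again with hypothetical field names: the shared `Customer` stays behind a private field with no accessor, and each notification exposes only the getters that make sense for its event. In a real codebase `Customer` would also be hidden from callers, for example as a package-private or private nested class; that detail is an assumption and is only hinted at here.

```java
final class Address {
    private final String city;

    Address(String city) { this.city = city; }

    String getCity() { return city; }
}

final class PaymentMethod {
    private final String type;

    PaymentMethod(String type) { this.type = type; }

    String getType() { return type; }
}

// Shared entity, kept as an implementation detail of the notifications.
final class Customer {
    private final String id;
    private final Address address;
    private final PaymentMethod paymentMethod;

    Customer(String id, Address address, PaymentMethod paymentMethod) {
        this.id = id;
        this.address = address;
        this.paymentMethod = paymentMethod;
    }

    String getId() { return id; }
    Address getAddress() { return address; }
    PaymentMethod getPaymentMethod() { return paymentMethod; }
}

final class CustomerAddressChanged {
    private final Customer customer; // never handed out to callers

    CustomerAddressChanged(String customerId, Address address) {
        this.customer = new Customer(customerId, address, null);
    }

    // Only the data this event actually carries is reachable.
    public String getCustomerId() { return customer.getId(); }
    public Address getAddress() { return customer.getAddress(); }
}

final class CustomerPaymentMethodChanged {
    private final Customer customer; // never handed out to callers

    CustomerPaymentMethodChanged(String customerId, PaymentMethod paymentMethod) {
        this.customer = new Customer(customerId, null, paymentMethod);
    }

    public String getCustomerId() { return customer.getId(); }
    public PaymentMethod getPaymentMethod() { return customer.getPaymentMethod(); }
}
```

Here `CustomerPaymentMethodChanged` has no `getAddress()` at all, so the always-invalid access from the previous model no longer compiles.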

It is not the exact model of the incoming events. But that does not matter much, because the intent of each event and the data it carries are now clearly expressed.

The model as depicted is still anemic. We have also introduced message validation in the model implementation, so the model knows how to validate itself. That leaves less responsibility to the code using the model, which only needs to know how to deal with validation errors. This is not shown in the diagrams.
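The article does not show how that validation was implemented. Purely as an illustration, a self-validating message could look like the sketch below, where the `validate()` method, the rules, and the error strings are all hypothetical, and the `Customer` wrapper is left out to keep the example short:

```java
import java.util.ArrayList;
import java.util.List;

final class Address {
    private final String street;
    private final String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    // Returns an empty list when the address is valid.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (street == null || street.trim().isEmpty()) errors.add("address.street is missing");
        if (city == null || city.trim().isEmpty()) errors.add("address.city is missing");
        return errors;
    }
}

final class CustomerAddressChanged {
    private final String customerId;
    private final Address address;

    CustomerAddressChanged(String customerId, Address address) {
        this.customerId = customerId;
        this.address = address;
    }

    // The message checks itself; callers only decide what to do with the errors.
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (customerId == null || customerId.trim().isEmpty()) errors.add("customerId is missing");
        if (address == null) {
            errors.add("address is missing");
        } else {
            errors.addAll(address.validate());
        }
        return errors;
    }
}
```

The consuming code then reduces to something like "if `validate()` returns a non-empty list, reject or log the message; otherwise process it", without knowing any of the rules.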

My conclusion is that, as always, there are several ways to solve a problem. Sometimes a compromise between two different perspectives can produce a perfectly appropriate solution.

Will I use the latter pattern as a starting point for similar needs in the future? No. I still like my first option better. However, now we’re talking about personal preferences. These are always debatable…


Written by Vasco Veloso