NServiceBus Sagas, Simplified

Written by Chris Lowe and David Boike on September 2, 2016

This post is part of a series describing the improvements in NServiceBus 6.0.

In the 1960s, Shigeo Shingo, a Japanese manufacturing consultant from Saga City, Japan, pioneered the concept of poka-yoke, a Japanese term that means “mistake-proofing.” In a nutshell, poka-yoke involves saving time and cutting waste by reducing the possibility of defects. Although some number of mistakes will always occur, processes can be put in place to catch those mistakes before they turn into actual customer-facing defects.

This is a model we’ve been trying to follow with NServiceBus – not only with regard to our internal development processes, but also in our efforts to guide developers toward building message-driven systems. Through countless API design decisions over the years, we’ve been making it ever easier to use NServiceBus the right way and ever more difficult to use it wrongly. This way, developers naturally fall into a pit of success.¹

During the development of NServiceBus 6.0 (V6), we realized that there were some common mistakes that developers tend to make when working with sagas. Just for a bit of background, a saga is an NServiceBus pattern to model long-running processes. It does this by combining multiple message handlers with a shared “memory” that is retained between handling messages that are correlated in some way. For instance, in a typical e-commerce application, a saga might bring together message handlers that respond when an order is placed and when the credit card is processed, so that the order is shipped only when both events have been received.

In the Saga API, we found that some users were accidentally making mistakes in constructing their sagas, especially in defining how messages relate to the saga data. This was resulting in some hard-to-find bugs. By changing the API slightly in V6, we found we could prevent some of these mistakes from being made in the first place. Let’s take a look at some examples.

Mapping messages to saga data

In our theoretical e-commerce example, when a message related to a specific order arrives and that message should be handled by a saga, the saga relating to that order needs to be found and invoked.

If we take the OrderId from the message, then we can find the matching saga data by querying the data store for an instance with the matching OrderId. We could even say that there’s a kind of mapping between the message’s OrderId and the saga’s OrderId – or more generally, a property on the message and a property on the saga data.

protected override void ConfigureHowToFindSaga(SagaPropertyMapper<OrderSagaData> mapper)
{
    mapper.ConfigureMapping<OrderPlaced>(message => message.OrderId)
        .ToSaga(sagaData => sagaData.OrderId);
}

Back in NServiceBus 5.0, we made this a little more discoverable by switching the method signature on the saga base class from virtual to abstract, ensuring that the compiler would complain if the method did not exist.

However, developers often need to change sagas for existing business processes, like taking additional events into account. For example, while we were previously not shipping an order until it had been both placed and charged, we might have a new requirement to wait until the order had been approved by a customer support agent as well.

To fulfill this requirement, we would add a new message handling method to the saga for the new OrderApproved event. In this situation, it’s easy to forget that you need to add a mapping for that OrderApproved message. If forgotten, the saga infrastructure would not be able to find the correct saga data, and the saga wouldn’t end up handling the message.

Now, with V6, we’re making sure you can’t forget to add the mappings that messages need in order to be processed successfully. To do that, we added a startup check that verifies a mapping exists for every message that starts a saga, throwing a helpful exception right away if anything is missing.

For messages that don’t start a saga, it’s a little more complicated due to auto correlation.² But if it’s impossible to map an incoming message to a saga due to a missing mapping, we will throw a runtime exception. The unmappable message will go through automatic retries and arrive in your error queue so that once you fix the mapping issue according to the instructions in the exception, you can replay that message and pick up right where you left off.

The bottom line is this: instead of the saga behaving unexpectedly or incorrectly because of a missing mapping, an exception will provide an early warning and guidance on exactly what to do.
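As a rough illustration of the startup check described above, here is a small self-contained model. It is a sketch under stated assumptions and not a depiction of NServiceBus internals: `SagaDefinition`, `StartedBy`, `Mappings`, and `Validate` are hypothetical names invented for this example, and message types are represented as plain strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model only: these types are hypothetical and are not part of NServiceBus.
public sealed class SagaDefinition
{
    // Names of the message types that are allowed to start the saga.
    public HashSet<string> StartedBy { get; } = new HashSet<string>();

    // Message type name -> name of the saga data property it is mapped to.
    public Dictionary<string, string> Mappings { get; } = new Dictionary<string, string>();

    // Mimics a check run once at startup: every message that starts the saga
    // must have a mapping, otherwise fail immediately with a descriptive error.
    public void Validate()
    {
        var unmapped = StartedBy.Where(m => !Mappings.ContainsKey(m)).ToList();
        if (unmapped.Count > 0)
        {
            throw new InvalidOperationException(
                "Missing saga mapping for: " + string.Join(", ", unmapped));
        }
    }
}
```

In this model, a definition that lists OrderPlaced in `StartedBy` without a matching entry in `Mappings` fails in `Validate()` before any message is processed, which is the "fail early with guidance" behavior the text describes.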

Being `[Unique]`

One of the guarantees NServiceBus provides you with, through sagas, is the same level of consistency you’d expect from a database. Those guarantees need to hold up even when multiple messages are processed by the same saga in parallel. NServiceBus must make sure that no duplicate database records are ever created, as that would disrupt the correctness of the business process the saga was modeling.

In our example, that means the OrderId column needs to have a unique constraint on it to ensure that two different sets of data can’t be created for the same OrderId. The constraint guarantees that only one of the messages would end up creating the saga data. The other message would hit a unique constraint violation exception when it tried to create the same saga data, causing it to fail and retry. On that retry, the infrastructure would then be able to find the saga created by the first message, processing that second message as if the messages had arrived in sequence rather than in parallel.
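The violation-then-retry sequence can be sketched with an in-memory stand-in for the saga storage. Everything here is an assumption made for illustration: `SagaStore`, `UniqueConstraintException`, `MessageProcessor`, and the `Placed`/`Charged` flags are hypothetical, and a dictionary's `TryAdd` plays the role of the database's unique constraint. A real persister would also need to guard concurrent updates to existing saga data, which this sketch does not model.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical saga data for this sketch.
public sealed class OrderSagaData
{
    public string OrderId { get; set; }
    public bool Placed { get; set; }
    public bool Charged { get; set; }
}

public sealed class UniqueConstraintException : Exception
{
    public UniqueConstraintException(string message) : base(message) { }
}

// Toy model only: in-memory storage keyed by OrderId.
public sealed class SagaStore
{
    private readonly ConcurrentDictionary<string, OrderSagaData> byOrderId =
        new ConcurrentDictionary<string, OrderSagaData>();

    public OrderSagaData Find(string orderId)
    {
        OrderSagaData data;
        return byOrderId.TryGetValue(orderId, out data) ? data : null;
    }

    // TryAdd stands in for the unique constraint on the OrderId column.
    public void Insert(OrderSagaData data)
    {
        if (!byOrderId.TryAdd(data.OrderId, data))
        {
            throw new UniqueConstraintException("Duplicate OrderId: " + data.OrderId);
        }
    }
}

public static class MessageProcessor
{
    // Handles one message. If creating the saga data loses the race, the
    // message is retried and the retry finds the instance the winner created.
    public static OrderSagaData Process(
        SagaStore store, string orderId, Action<OrderSagaData> apply, int maxAttempts = 2)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                var data = store.Find(orderId);
                if (data == null)
                {
                    data = new OrderSagaData { OrderId = orderId };
                    store.Insert(data);
                }
                apply(data);
                return data;
            }
            catch (UniqueConstraintException) when (attempt < maxAttempts)
            {
                // Lost the race: loop around and try again.
            }
        }
    }
}
```

Calling `MessageProcessor.Process` for an order-placed message and then for a card-charged message with the same OrderId leaves a single `OrderSagaData` instance with both flags set, regardless of which call created it.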

Previous versions of NServiceBus enforced uniqueness by decorating one of the saga data properties with the [Unique] attribute. This attribute enabled the underlying data persister to create a unique constraint for the decorated property.

Unfortunately, it was far too easy to forget to do that:

public class OrderSagaData : ContainSagaData
{
    // Oops, forgot [Unique] !!
    public string OrderId { get; set; }
}

Omitting the [Unique] attribute would cause the saga data model to be created without the correct constraint. Then, if multiple messages were processed in parallel for that same saga, when each of the threads queried the saga storage and couldn’t find an existing instance, they would each go and create a new one. This would result in a defect: duplicate saga data and nondeterministic behavior.

Now, in V6, NServiceBus performs extra checks on the message mappings in the ConfigureHowToFindSaga method on endpoint startup. It ensures that all of the mappings point to a single property on the saga data class. Since this “correlation property” is used to identify the saga, by definition this property must be unique. And the saga infrastructure defaults to treating it as unique so you don’t need to put the attribute on it anymore.

Auto-population

With message mappings correctly defined, a developer’s attention will turn to the logic in the message handling methods. However, one important task related to message mappings still remains in order to have a correctly-functioning saga.

In earlier versions of NServiceBus, you needed to remember to populate the saga data with information from the message in the Handle method of every message that could start the saga:

// V5
public void Handle(OrderPlaced message)
{
    this.Data.OrderId = message.OrderId;
    
    // Continue with business logic
}

This is pure boilerplate and easy to forget. If omitted, Data.OrderId is left at its default value when saved to the database (null for the string property shown earlier, or 0 for a numeric ID), which means the saga won’t be found when querying with the OrderId later on.

However, based on the already established mappings, the saga infrastructure already knows that the OrderPlaced event should start the saga, and the OrderId value in the message must be the same as the OrderId value in the saga data. Requiring a line of code to initialize the saga data based on the value in the message shouldn’t really be necessary.

In V6, we automatically set the value of the correlation property for newly created sagas according to the mappings so that you don’t have to.

// V6
public async Task Handle(OrderPlaced message, IMessageHandlerContext context)
{
    // Data.OrderId has already been set. Go ahead with your business logic.
}
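Conceptually, auto-population amounts to reusing the mapping to copy the correlation value into freshly created saga data. The following is a minimal sketch of that idea and not the NServiceBus mapper: `CorrelationMapping` and `CreateFor` are hypothetical names, and the message and saga data classes are simplified stand-ins defined only for this example.

```csharp
using System;

// Hypothetical message and saga data types for this sketch.
public sealed class OrderPlaced { public string OrderId { get; set; } }
public sealed class OrderSagaData { public string OrderId { get; set; } }

// Toy model only: remembers how to read the correlation value from a
// message and how to write it to the saga data.
public sealed class CorrelationMapping<TMessage, TSagaData> where TSagaData : new()
{
    private readonly Func<TMessage, string> readFromMessage;
    private readonly Action<TSagaData, string> writeToSagaData;

    public CorrelationMapping(
        Func<TMessage, string> readFromMessage,
        Action<TSagaData, string> writeToSagaData)
    {
        this.readFromMessage = readFromMessage;
        this.writeToSagaData = writeToSagaData;
    }

    // Creates saga data for a starting message with the correlation
    // property already filled in, so the handler never has to set it.
    public TSagaData CreateFor(TMessage message)
    {
        var data = new TSagaData();
        writeToSagaData(data, readFromMessage(message));
        return data;
    }
}
```

With a mapping built from `m => m.OrderId` and `(d, id) => d.OrderId = id`, calling `CreateFor` on an OrderPlaced message yields saga data whose OrderId already matches the message, which is the state a V6 handler can rely on when it starts running.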

Summary

Forgetting to properly map saga data, forgetting a [Unique] attribute, or failing to populate the saga data from the mapped messages are all simple mistakes. Unfortunately, they were mistakes that were all too easy to make for a developer with a thousand other things on their mind.

True to the concept of poka-yoke, NServiceBus 6.0 makes it practically impossible to make these common mistakes, eliminating some hard-to-diagnose issues. We can’t prevent all bugs in code, but we’ll take these off the table so you won’t have to track them down ever again.

And, in case you were wondering, the saga infrastructure is fully backwards compatible, so you can take your existing sagas and start running them in V6 right away. So go ahead. Take NServiceBus 6.0 for a spin, and see if any of the other developers on your team (wink) made any of these mistakes.

Happy coding!

Footnotes

¹ Falling Into The Pit of Success by Jeff Atwood
² Auto Correlation is a feature that embeds the SagaId in messages sent from the saga, which is then reflected back to the saga when a handler processing that message replies. As a result, the reply message also carries the SagaId and does not require a property mapping in order to find the saga.
