More powerful Cosmos DB persistence

Written by Bob Langley, Daniel Marbach, and Aleksandr Samila on June 21, 2022

The key to a successful Cosmos DB system is its data partitioning strategy. Like the rows of shrubs in a hedge maze, the logical partitions that divide data must be carefully planned, because the partitioning scheme determines how well the system scales and defines the boundaries for logical transactions.

In version 1.1 of our Cosmos DB persistence package, we’ve made defining the partition key for each message processed by NServiceBus much more straightforward, without needing a custom pipeline behavior. We’ve also added pessimistic concurrency support for more reliable processing of sagas with high contention patterns.

Let’s take a closer look at these two new features.

Partition configuration API

We made it a lot easier to specify the container partition to use for each message, which is essential to make Cosmos DB transactions work.

NServiceBus uses Cosmos DB transactions to keep NServiceBus outbox and saga data consistent with whatever business data you modify in your message handlers. Cosmos DB supports transactions through its TransactionalBatch API in the .NET SDK, and NServiceBus gives you access to the TransactionalBatch so that you can use it for your business data.

There’s just one catch: all the operations in the transaction must take place in the same partition within a container. So, for each incoming message, you must tell NServiceBus which container partition to use so that the NServiceBus data and your business data can be stored together.
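As a rough illustration of what sharing the batch looks like inside a message handler, the sketch below queues a business document on the same TransactionalBatch that NServiceBus uses for its own data. The `OrderPlaced` message, the `OrderDocument` class, and their properties are hypothetical names invented for this example. It assumes NServiceBus 7 with the Cosmos DB persistence package, and a container partitioned on a `/CustomerId` path whose value matches the partition key resolved for the incoming message.

```csharp
using System.Threading.Tasks;
using Newtonsoft.Json;
using NServiceBus;

// Hypothetical message, used only for this illustration.
public class OrderPlaced : IMessage
{
    public string OrderId { get; set; }
    public string CustomerId { get; set; }
}

// Hypothetical business document stored next to the NServiceBus data.
public class OrderDocument
{
    // Cosmos DB requires every item to have a lowercase "id" property.
    [JsonProperty("id")]
    public string Id { get; set; }

    // Assumption: the container is partitioned on this property.
    public string CustomerId { get; set; }
}

public class OrderPlacedHandler : IHandleMessages<OrderPlaced>
{
    public Task Handle(OrderPlaced message, IMessageHandlerContext context)
    {
        // Get the Cosmos DB session that NServiceBus manages for this message.
        var session = context.SynchronizedStorageSession.CosmosPersistenceSession();

        // Queue the write on the shared batch. The handler does not execute
        // the batch itself; NServiceBus commits it together with its own data.
        session.Batch.CreateItem(new OrderDocument
        {
            Id = message.OrderId,
            CustomerId = message.CustomerId
        });

        return Task.CompletedTask;
    }
}
```

Because every operation in the batch must target one partition, the `CustomerId` on the document has to resolve to the same partition key that NServiceBus was told to use for the message.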

Previously, specifying the partition key required implementing a custom pipeline behavior to provide the information needed for the transaction. Pipeline behaviors are an advanced and very powerful NServiceBus API,¹ and there are plenty of good reasons to use one, but you shouldn’t have to create one just to use Cosmos DB.
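For comparison, a custom behavior of that kind might have looked roughly like the following sketch. The class names and the `PartitionKeyHeader` header name are illustrative, not taken from the package. The sketch assumes the behavior runs in the transport receive stage, where only the message headers are available, and that the persistence picks up a `PartitionKey` placed in the context bag.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using NServiceBus;
using NServiceBus.Pipeline;

// Hypothetical behavior that reads the partition key from a header.
public class PartitionKeyFromHeaderBehavior : Behavior<ITransportReceiveContext>
{
    public override Task Invoke(ITransportReceiveContext context, Func<Task> next)
    {
        // "PartitionKeyHeader" is an illustrative header name.
        if (context.Message.Headers.TryGetValue("PartitionKeyHeader", out var value))
        {
            // Make the partition key available to later pipeline stages.
            context.Extensions.Set(new PartitionKey(value));
        }

        // Continue with the rest of the pipeline.
        return next();
    }
}

public static class PartitionKeyBehaviorRegistration
{
    public static void Register(EndpointConfiguration endpointConfiguration)
    {
        // The behavior has to be registered explicitly on every endpoint.
        endpointConfiguration.Pipeline.Register(
            new PartitionKeyFromHeaderBehavior(),
            "Sets the Cosmos DB partition key from a message header");
    }
}
```

Even in this small form, the behavior has to be written, placed in the right pipeline stage, and registered by hand, and that is the ceremony the new API removes.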

We made this process more straightforward with a new transaction information API that allows you to provide NServiceBus with the necessary information without poking under the hood.

Here are a few examples of how to use the new API:

// Get the configuration objects we need
var persistence = endpointConfiguration.UsePersistence<CosmosDBPersistence>();
var transactionsInfo = persistence.TransactionInformation();

// The partition to use is always located in a message header
transactionsInfo.ExtractPartitionKeyFromHeader("PartitionKeyHeader");

// OR you can use multiple headers
transactionsInfo.ExtractPartitionKeyFromHeaders(headers => new PartitionKey(…));

// OR get the partition key from the message
transactionsInfo.ExtractPartitionKeyFromMessage<MyMessage>(message => new PartitionKey(message.PartitionKey));

// OR use a custom class that implements IPartitionKeyFromHeadersExtractor
transactionsInfo.ExtractPartitionKeyFromHeaders(new CustomPartitionKeyFromHeadersExtractor());

There are a lot of options to cover a variety of use cases, all of which are much easier than defining your own NServiceBus pipeline behavior. Check out the documentation for more API options for advanced scenarios.
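The last option in the list above refers to a class that is not shown. A minimal sketch of such an extractor could look like this, assuming the interface exposes a single `TryExtract` method with the signature used here and lives in the `NServiceBus.Persistence.CosmosDB` namespace; check both against the package documentation. The `TenantId` header name is hypothetical.

```csharp
using System.Collections.Generic;
using Microsoft.Azure.Cosmos;
using NServiceBus.Persistence.CosmosDB; // assumed namespace of the interface

public class CustomPartitionKeyFromHeadersExtractor : IPartitionKeyFromHeadersExtractor
{
    // Assumed signature: return true and set the key when it can be determined.
    public bool TryExtract(IReadOnlyDictionary<string, string> headers, out PartitionKey? partitionKey)
    {
        // "TenantId" is an illustrative header name, not one defined by NServiceBus.
        if (headers.TryGetValue("TenantId", out var tenantId) && !string.IsNullOrEmpty(tenantId))
        {
            partitionKey = new PartitionKey(tenantId);
            return true;
        }

        // No usable header found, so report that no key could be extracted.
        partitionKey = null;
        return false;
    }
}
```

A dedicated class like this is mostly useful when the extraction logic needs its own dependencies or unit tests; for simple cases the lambda-based overloads are shorter.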

This is a much easier way to configure NServiceBus to use your tenant-per-container or tenant-per-partition scheme. Even if you aren’t building multi-tenant systems, the new configuration API makes it easier to align your NServiceBus processing with your chosen partitioning scheme. No more tinkering with the internals of NServiceBus.²

To learn more about building multi-tenant systems with NServiceBus and Cosmos DB and how to design your data partitioning strategy to fit your requirements, check out our recent webinar:

Watch Building multi-tenant systems using NServiceBus and Cosmos DB now

Pessimistic concurrency support

One of the most powerful features of an NServiceBus saga is how it handles multiple messages trying to modify the same data simultaneously. No matter what, the saga will ensure that two concurrent messages can’t make conflicting changes to the stored saga data that would result in a corrupted state.

However, the way the saga controls access affects both the performance of the system and, under certain conditions, the cost of running it.

The original version of Cosmos DB persistence supported only optimistic concurrency. In this strategy, message handlers for multiple messages can start processing concurrently, but the first one to commit its changes wins. When the other message handlers try to commit, they get a concurrency exception (because the underlying data has changed) and are forced to retry.
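To make the "first commit wins" idea concrete, here is a generic sketch of optimistic concurrency written directly against the Cosmos DB .NET SDK using ETags. It is not the persister's actual implementation, only an illustration of the technique. The `CounterDocument` class and the `TryIncrementAsync` helper are hypothetical.

```csharp
using System.Net;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;
using Newtonsoft.Json;

// Hypothetical document used only for this illustration.
public class CounterDocument
{
    [JsonProperty("id")]
    public string Id { get; set; }

    public int Count { get; set; }
}

public static class OptimisticUpdate
{
    // Returns true if the update was committed, false if another writer got there first.
    public static async Task<bool> TryIncrementAsync(Container container, string id, PartitionKey partitionKey)
    {
        // Read the current state together with its ETag (the version marker).
        ItemResponse<CounterDocument> read =
            await container.ReadItemAsync<CounterDocument>(id, partitionKey);

        CounterDocument document = read.Resource;
        document.Count++;

        try
        {
            // Only replace the item if nobody has changed it since it was read.
            await container.ReplaceItemAsync(
                document,
                id,
                partitionKey,
                new ItemRequestOptions { IfMatchEtag = read.ETag });
            return true;
        }
        catch (CosmosException ex) when (ex.StatusCode == HttpStatusCode.PreconditionFailed)
        {
            // The ETag no longer matches: a concurrent writer committed first.
            return false;
        }
    }
}
```

When the helper returns `false`, the caller has to reload the document and try again, which is the retry that a message handler goes through after a concurrency exception.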

This works well for sagas with little or no contention, and the performance is good. From the Cosmos DB perspective, this is also the cheapest option because you don’t have to perform any database operations (which cost money) to determine if it’s safe to proceed.

However, some sagas, such as those that implement the scatter-gather pattern, have much higher contention, and that’s when optimistic concurrency starts to break down. Many competing messages cause many concurrency exceptions to be thrown when the first message commits, resulting in floods of retries that increase the overall load, decrease message throughput, and may result in many failed messages in the error queue.³

For sagas with high contention, pessimistic concurrency is a better approach. In this mode, we don’t try to process the message until a lock has been acquired so that we’re sure when starting the message handler that we’ll be able to commit the changes later. Every other message that needs access to the same saga data must wait until the lock is released. Then, it can obtain a new lock and proceed with processing.

This method results in fewer failures and eases contention, especially in scatter-gather scenarios, but comes at a cost. Because Cosmos DB charges for each storage operation, there is increased cost associated with checking for and obtaining the lock before a message is processed. Additionally, sagas normally unaffected by contention issues will now process more slowly due to the extra locking behavior.

Because of the extra cost associated with pessimistic concurrency, it’s not enabled by default. To enable it:

var persistence = endpointConfiguration.UsePersistence<CosmosDBPersistence>();
persistence.Sagas().UsePessimisticLocking();

We recommend only enabling pessimistic locking in endpoints that contain sagas prone to contention issues. All other endpoints can use the default optimistic locking strategy.

Check out the Cosmos DB persistence documentation page for saga concurrency for more details on how to use and tune pessimistic locking to get the best out of your endpoints with high-contention sagas.
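As a starting point for tuning, the sketch below adjusts the lock lease settings. It assumes that `UsePessimisticLocking()` returns a configuration object exposing `SetLeaseLockTime` and `SetLeaseLockAcquisitionTimeout`; verify the method names and defaults against the saga concurrency documentation before relying on them. The durations are arbitrary illustrative values, not recommendations.

```csharp
using System;
using NServiceBus;

public static class SagaLockingConfiguration
{
    public static void Configure(EndpointConfiguration endpointConfiguration)
    {
        var persistence = endpointConfiguration.UsePersistence<CosmosDBPersistence>();

        // Assumption: enabling pessimistic locking returns a tuning object.
        var locking = persistence.Sagas().UsePessimisticLocking();

        // How long a lock is held before it expires (illustrative value).
        locking.SetLeaseLockTime(TimeSpan.FromSeconds(30));

        // How long a message waits to acquire the lock before giving up (illustrative value).
        locking.SetLeaseLockAcquisitionTimeout(TimeSpan.FromSeconds(60));
    }
}
```

The lease time would normally be chosen to comfortably exceed the longest expected handler execution, so that a lock does not expire while its message is still being processed.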

Summary

With Cosmos DB persistence version 1.1, it’s even easier to create a Cosmos DB system, align it to your partitioning scheme, and then manage its performance.

To learn more about Cosmos DB and NServiceBus, check out our Cosmos DB persistence documentation. If you’re currently using Azure Table Storage in your system, check out how to migrate from Azure Table Storage to Cosmos DB. We’ve also got several code samples showing how to use Cosmos DB with NServiceBus.


About the authors

Bob Langley is a developer at Particular Software with years of NServiceBus and Azure experience, which he loves to share, but he can't unless they are both in the same partition.

Daniel is an optimistic software engineer at Particular Software with a pessimistic attitude towards wasting precious garbage collection cycles.

Aleks Samila is a software engineer at Particular Software. He's the backend mechanic who keeps everything at Particular running smoothly, whether they like it or not.

¹ Maybe a little too powerful in this case, as there's a risk that the behavior for identifying the Cosmos DB partition could break the outbox feature.
² Unless you want to.
³ For more details, see Optimizations to scatter-gather sagas.