
Exactly-once: the holy grail of distributed systems

About this video

This session was presented at dotnetdays 2024.

Imagine a world in which you could build a distributed system that guarantees that every individual user action travels throughout a multitude of services without ever being lost or duplicated. No lost orders. No duplicate entries that require reviews. No persistent exceptions due to data inconsistencies.

If you’re imagining rainbows and unicorns right now, this session is for you!

Before going into specifics, we need to stop thinking about message delivery and focus on message processing instead. We’ll explore an end-to-end approach to message processing, starting from the human-to-machine interaction in the front end of the system and traversing component-to-component message-based communication throughout the system.

As a cherry on top, I’ll present a fully operable example showcasing cross-system, exactly-once, HTTP-based communication.

Ready to explore what’s over the rainbow?

đź”—Transcription

00:00 Szymon Pobiega
Okay, let me start. Thank you all for coming here. I can't see you all because the light is in my face, but I can see that there is a lot of you. Let me first introduce myself: my name is Szymon Pobiega. I was running the workshop yesterday and the day before, so some of you, I guess, might know me. It's my pleasure to be here. It's a great pleasure. I was supposed to be at this conference before, but then COVID started, so I'm super happy to finally arrive here and maybe share some of my experience with you. My background is: I spent the first 10 years of my career, let's say, building distributed systems for customers, and then I switched, and the next more or less 10 years I spent helping others build distributed systems by developing, or co-developing with my colleagues, a framework for building distributed systems.
01:03 Szymon Pobiega
So that's basically the history of my career. You can find me on the blog that I run with my colleague Tomek, Exactly Once on GitHub.io, or on LinkedIn or on GitHub. So I would like to start by sharing with you the two books that influenced this talk and that influenced my career the most. And these are Domain-Driven Design, by the author of the Domain-Driven Design idea, Eric Evans, and then Enterprise Integration Patterns. And my career so far, for the last 20 years, has been this blue-black combo. Part of it is focusing on the business problem, how to solve the business problem so that the businesses are successful, and trying to think in business terms. And then the other part, the black part, is: okay, how do I run that business code so that it is reliable and is delivering what it's supposed to do?
02:07 Szymon Pobiega
So the first book, the blue book, the subtitle of the book is Tackling Complexity in the Heart of Software. What does it mean? It means that Eric forces us to think about what the business problem that we are solving is, finding the most important piece of the software that we are developing, and focusing on that most important part. The other thing that Eric introduces in the book, in the second part of the book that only few people read, is the idea of the bounded context and that diagram that is called the context diagram. And one of the jokes that was circulating early on in the Domain-Driven Design community was that the context diagram looks like a potato diagram. So I would like to start with a potato. We can have multiple potatoes. The idea is that if you put potatoes on a design surface, then you get a context diagram, and it's something that you sketch pretty quickly.
03:02 Szymon Pobiega
So it looks like potatoes. It doesn't have any concrete shapes like in UML. And each potato defines a part of the software that is autonomous; that's called a bounded context. And in Domain-Driven Design lingo, that bounded context is autonomous because the terms and the names that are used are defined within the scope of that boundary. Later on, Domain-Driven Design practitioners extended this idea so that the bounded context is also autonomous in terms of deployment, in terms of hosting, but that's a long story. So let's try to imagine the simplest possible system that we can have these days. It of course consists of multiple components, because that's what responsible engineers do. One of them is billing, one of them is shipping, one of them is orders and delivery. That can be any e-commerce application. One example would be an e-commerce application that is selling pierogi for delivery.
04:02 Szymon Pobiega
But let's imagine it's an e-commerce application. How does it work? Let's look at those potatoes in detail. So the first one, the one that takes orders, has two interfaces: the web interface for the retail customers and another interface for some business-to-business communication. And then there is another one that generates invoices, and the communication between these two is asynchronous. So the first potato publishes an event and that event is delivered asynchronously to that second potato. Who has ever used asynchronous communication for events? Okay, lots of hands up, that's good. Another event is published to that third potato, where the order processing is done. There is a business process denoted by these funny arrows, and it does something important. As part of that important business logic, it sends commands. Commands are another kind of asynchronous communication, telling some other components within that boundary to do something. And the last thing is, when we are done with processing inside our system, frequently we need to reach out to another system over the system boundary to tell that other system that they need to do something.
05:23 Szymon Pobiega
An example would be: we need to schedule a shipment of the package that somebody ordered, and we need to do it through an HTTP API that is hosted by someone else. So to summarize, there are signals that are coming to our system from outside, from users or from other businesses that communicate with us through the system boundary on the left-hand side. Then there is our system, consisting of multiple components communicating with each other, mostly asynchronously. And then there is yet another system beyond that boundary; we need to communicate with some other system. That's more or less how most systems are built today. The problem is captured in this list of distributed computing fallacies. And I must admit I had problems with that list. That list was put together around 1994, if I remember correctly, mostly by Peter Deutsch, with one addition by the famous James Gosling, famous because he's the author of Java.
06:26 Szymon Pobiega
And I have a problem because when I read the list for the first time, I thought, well, some of these things look like they're not true. Bandwidth is infinite? Yeah, we have gigabit Ethernet, but it's still not infinite. But the concept that I didn't understand is the label for these eight things: the fallacies of distributed computing. And I'm from Poland, so English is not my native language, and in Polish we don't have a term for that, for fallacy. So I had to Google it and then something clicked: oh, that's a false or mistaken idea. So I did myself a favor and inserted 'not' here, so that now it more or less aligns with the way I think about things. So these things have been known to be false for distributed computing for the last 30 years now. 30 years, yes.
07:27 Szymon Pobiega
And what we are going to focus on in this talk today is that the network is not reliable. We're going to ignore the rest for some other talks that might happen today or on other occasions, but we're focused on 'the network is not reliable'. So we have our distributed system that is fairly elaborate, consists of multiple components, and these components even interact with external third parties, and now we have to build it in such a way that it is going to work while the network is not reliable. That means that the system as a whole is only as good as the weakest component of that system. And what is the weakest component of our system? Well, let's go back for a moment to Domain-Driven Design. So, the blue book. We were looking at the black side of things; now we are back to the blue side of things.
08:18 Szymon Pobiega
So the business stuff. Eric Evans in his book has this very important insight that you should focus on the most important part of your software first. He calls it the core domain, and then the rest is supporting domains. So in our case, the core domain is the thing that brings us money, the competitive advantage. But usually that thing that brings us money, that competitive-advantage piece of code, doesn't interact with the external world. It's something abstract. And then, in order to interact with the external world, we need supporting domains. We need to interact with third parties, we need to do stuff. And in that book he prescribes that we should protect these components by isolating them from each other through the concept that is called anti-corruption layers. So we protect these; we care more about the thing that brings us money, so we protect it more than the supporting sub-domains, right?
09:15 Szymon Pobiega
We focus our efforts, the time, the money, the brightest people on that core domain. All right, that's good. And now, because it can't do anything on its own, it basically has to communicate with all the supporting sub-domains, has to send messages to external systems, to third parties, to the user interface, to web APIs. And now the problem is: well, if the system is only as good as the weakest component, the weakest component here is the communication layer. So it's very important, while building the system, to focus on the heart of the software, on the most important component, the one that brings us money. But if our system is distributed, and that means that the network in our system is not reliable, then it's equally important to focus on tackling the complexity at the borders of the system. So the heart, fine, but the borders are also important, and in order to tackle that complexity at the borders of a component, or at the border of a service, or the border of the whole system, we need to ensure one thing.
10:29 Szymon Pobiega
We need to ensure that any signal that is generated by our users, by our cooperating third parties, is processed by the system exactly once, because if we can't ensure that, we will end up duplicating data, duplicating signals. If we are in e-commerce land, well, we might deliver two pieces of food or two packages of food to our customer, but if we are selling cars, we'll deliver two Porsches to our customers, and that's probably not good. So we want to ensure that every logical signal that is sent to our system is processed exactly once, and that every signal generated for external third parties is delivered exactly once as well, so that those third parties can process it correctly, and we'll be tackling that in this talk. We'll start with the heart of the software, or the core of the software, which is the asynchronous part that communicates through commands and events.
11:33 Szymon Pobiega
And if some of you have not used commands and events, the idea there is that commands and events are sent through messaging infrastructure, like message queues or message topics. Some of these technologies you might recall are, for example, RabbitMQ on-premises, or Azure Service Bus and Azure Storage Queues, or SQS in Amazon AWS. The thing that these technologies share is that they store the signal that you send to them until that signal is processed. That signal might be a message, it's usually a message, an event. So you send something to that piece of technology, it'll store it on disk, or on multiple disks, because these days these technologies are usually clustered and replicated, so they store it securely somewhere, and then they are going to try to deliver it to the destination multiple times. They will try to deliver that message to the destination processing code multiple times until it is acknowledged.
12:45 Szymon Pobiega
So the good thing about all these technologies for asynchronous communication is that they solve the 'lost' problem. If we need to deliver the signal exactly once, that means we can't deliver it zero times and we can't deliver it more than one time. Zero times means a message is lost, and that is solved by these technologies; it's basically a given, as opposed to HTTP communication, where HTTP requests might get lost because they're sent at most once. So that's done. But the problem that we have here is how to ensure that they are not duplicated. So we solved one problem: they're not going to be delivered zero times, they might be delivered one time, but we really, really don't want to multiply these signals and deliver them more than once. And the advice you can find on the internet that makes me really angry, and I try to be a peaceful person, is: just make it idempotent.
13:45 Szymon Pobiega
That was very popular on the internet a few years back: okay, you have to deal with the fact that the messaging infrastructure is giving you that message multiple times. So just structure your code so that when you get it for the second time or for the third time, you will find out that it has been delivered multiple times and figure out how to deal with it; make it idempotent. That's not really a great piece of advice, because it's not constructive; it doesn't give you any idea of how to deal with that. And the problem with that advice, the biggest problem I have with it, is that it's plain wrong. The assumption behind 'just make it idempotent' is that there is one thing that your code is interacting with, like a set of values, and it's very easy to make code that interacts with one thing idempotent.
14:45 Szymon Pobiega
The issue here, though, is that any non-trivial piece of software in that distributed system is interacting with two resources. The problem is that if you want to build a distributed business process, it has to be a chain of code that is executing. So you send a message, there is a piece of code that executes, stores something in a database, and if it doesn't send another message somewhere else, that's it. You don't have a distributed business process because it's just one piece of code. In order to have a distributed process, which does one thing here and one thing there and another thing here, you need to send follow-up messages. So the actual simplest possible problem you have to solve is: you have a message that is coming in, you need to store something in the database on the left-hand side, and you need to send a follow-up message. The database here and the message queue here are two resources, and making code that interacts with two resources idempotent, that's a big deal.
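To make that concrete, here is a minimal sketch of such a handler, using made-up in-memory stand-ins for the database and the queue (none of these names come from a real framework). The point is only that there are several durable operations, and a crash between any two of them either loses the follow-up message or causes duplicated work on redelivery:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for a real database and a real message queue.
public class NaiveHandler
{
    private readonly Dictionary<string, string> database = new();  // durable resource #1
    private readonly List<string> outgoingQueue = new();           // durable resource #2

    // The "simplest possible problem": one incoming message, a data change,
    // a follow-up message, and an acknowledgement back to the broker.
    public void Handle(string incomingMessageId, string orderId)
    {
        database[orderId] = "Placed";                       // side effect in resource #1
        outgoingQueue.Add($"ProcessOrder:{orderId}");       // side effect in resource #2; may never run
        Console.WriteLine($"ACK {incomingMessageId}");      // ack may be lost -> the broker redelivers
    }
}
```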
15:58 Szymon Pobiega
That's not an easy problem. And to make it worse, if you actually look at the diagram, at how it behaves in reality, that arrow here splits into two arrows, because in order for the messaging infrastructure to be able to deliver the same message to you multiple times, until you acknowledge you processed it, you need to split 'receive a message' into two interactions. One is non-destructive, meaning: hey, here's the message, process it. And the other goes in the opposite direction, meaning: I processed that message, you, the messaging infrastructure, can forget about it now. So you have three operations, each of them has side effects, each of them is stored somewhere durably, and you need to figure out how to structure these operations so that they seem to be executed exactly once, atomically. Here is how it can look. We receive a message, and then, in order to make sure that we don't modify the data multiple times, what we can do, the simplest thing we can do, is remember that we processed that given signal.
17:17 Szymon Pobiega
So we receive a message, we modify the data and we store some more information: I processed that message. The problem is, though, that if we store that information somewhere like a separate database here, then we are back to square one or even worse, because now, as you can see on the diagram, we have four transactional resources. So it's even worse. So the thing that we actually need to do is to store the information that we processed the given message inside the same database, in the same transaction. So now it's slightly better. We are receiving the signal from the outside, from the messaging infrastructure, then we are processing it. If we have a duplicate signal, then we are able to de-duplicate it and we are not corrupting the data, but we still have a problem on the right-hand side here with that sending, and the solution to that problem is to make that sending a thing that can be retried multiple times.
18:22 Szymon Pobiega
So it's not, well, destructive in a way. And the way to do it is that we need to store the messages that we are about to send in a database, and that of course is the same database as here, because it all has to be done in one transaction. So what we did in fact here is we started by dealing with multiple transactional resources and executing operations that have side effects on those multiple transactional resources. Where we ended up is that we are actually dealing with just one resource, the transactional resource that we care about, because what we are doing here is basically receiving a signal from outside. The signal might be duplicated, we might get multiple copies of it, and then while processing it we modify the data, we remember that we processed that signal, and we store the side effects of that here. So there is a single resource where we store the state of our processing, in one transaction. So we basically took one of these resources, made it the primary resource, and did that trick, and that pattern is called the Outbox. That's the industry standard for ensuring non-duplicate processing.
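A minimal sketch of that Outbox idea, with an in-memory dictionary standing in for the business database, so that the business change, the "I processed this message" record, and the outgoing messages are all written together; the names are invented and not any particular framework's API:

```csharp
using System;
using System.Collections.Generic;

public class OutboxRecord
{
    public string IncomingMessageId = "";
    public List<string> OutgoingMessages = new();
}

public class OutboxEndpoint
{
    // One "database" holding both the business data and the outbox records.
    private readonly Dictionary<string, string> businessData = new();
    private readonly Dictionary<string, OutboxRecord> outbox = new(); // keyed by incoming message id

    public void Handle(string messageId, string orderId)
    {
        if (!outbox.ContainsKey(messageId))
        {
            // "Transaction": business change and outbox record committed together.
            businessData[orderId] = "Placed";
            outbox[messageId] = new OutboxRecord
            {
                IncomingMessageId = messageId,
                OutgoingMessages = { $"ProcessOrder:{orderId}" }
            };
        }
        // Duplicate or not, (re)dispatch whatever the record still holds.
        foreach (var outgoing in outbox[messageId].OutgoingMessages)
            Console.WriteLine($"send -> {outgoing}");        // hand over to the real queue here
        outbox[messageId].OutgoingMessages.Clear();           // "mark" phase: only the id remains
    }
}
```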
19:43 Szymon Pobiega
The pattern is fortunately quite well described and well documented on the internet, so you can read about it, and these days nobody would be surprised if you're using that pattern. There are some implementation gotchas with the pattern that I won't get into in detail, but what I wanted to show you now, or convince you of now, is that the pattern has some issues, and those issues are related to the fact that this pattern requires that we store data. We need to store data about processed messages for, well, for how long? The data about processed messages is fairly simple: for each message we process, we just need the ID of that message. But we need to store that in our business database, which is our precious resource. Business databases, or databases that are good at storing business data, are not cheap to maintain.
20:45 Szymon Pobiega
The longer we keep that information, the better, right? Ideally we would like to store the information that we processed each message forever, because we could get a duplicate message after five months or after five years. I know it's quite unlikely; duplicates are quite likely after five seconds or five milliseconds. The longer the time window, the less likely they are, but ideally we should keep them forever. Unfortunately we are not living in an unbounded reality, so we need to truncate that data. So at some point we need to forget that we processed given messages, because we don't have an unlimited-size database, and that creates multiple problems. One problem, logically, is that the algorithm is not a hundred percent bulletproof, because we are explicitly forgetting that we processed some messages in order to fit into the given space in a database. The other problem is that truncating this data is problematic in different technologies; it might be a problem because, how do you clean up a database?
21:55 Szymon Pobiega
In a relational database, cleaning up tables, removing data from tables in bulk, is not that easy and usually involves some technology-specific tricks like dropping partitions. Also, it's interesting: where do you put the boundary? Do I keep the data for a week, a day or a month? Those are questions that need to be answered, and that bothers me a bit. So I think we can do better as an industry. We can try to think harder, and here are some inspirations that can help us think harder. Who has ever seen that screen? A bunch of hands. That's MSMQ. If you have a Windows machine, a Windows laptop or a Windows desktop, you probably have it on your machine, or at least have the opportunity to install it through the Windows component management; it's a message queuing technology that is just there. I think since Windows XP or Windows 2000. It's quite an old technology, one of the oldest; it's an equivalent of Notepad or the old Paint.
23:02 Szymon Pobiega
It's a fairly reliable technology and it's a distributed messaging technology. That means you have MSMQ installed on your machine and each machine has its own MSMQ service. And when you want to send a message, a command or an event, from your machine to a different machine, then you first send it to a local MSMQ service, which is going to persist it on your disk. Then it's going to try to forward it to an MSMQ service on a distant machine and persist it in the MSMQ database on that distant machine, and only after it's persisted there is it going to deliver it to the actual code that is running there. So it's a distributed message queue, it requires multiple machines, and the way it works is what I just described. The thing is, I've been using MSMQ for 15 years now, first as a user, then as a developer, and I have never, ever in my career seen a duplicate message in MSMQ.
24:05 Szymon Pobiega
It is also fairly stable when it comes to the amount of disk space it consumes. I've never encountered MSMQ running out of space claiming: oh, I need more deduplication space. So what's going on there? It clearly has to have a very good deduplication algorithm that makes sure that no message is lost or duplicated. The problem, though, is that the code is closed, Microsoft is not going to share the source code of MSMQ, and the documentation is really sparse, so we can't really figure out how it works. What we can figure out is that there is an open standard called AMQP 1.0, and it might contain some hints on how MSMQ works internally. The AMQP standard contains a description of an interaction called exactly-once, which is conveniently related to the topic of this talk, and that specification says that in order to achieve exactly-once delivery in a forwarding scheme like this one, you need to have confirmations both ways, which is kind of unintuitive at first sight, but I'll try to explain it so that it becomes intuitive.
25:22 Szymon Pobiega
So first of all, you send a message over that network here, right? It can fail, but eventually it's going to succeed. You send a message here and store it in a database, and then you need to confirm to the other side, to the sender: hey, I received that message, right? Because otherwise, if you don't confirm it, then the sender will try to resend that message over and over again until it succeeds. That's fine. And at that point, this party has information about that message because it received it and confirmed its receipt, and this party still has the information about that message because it was prepared to retry sending it. And what it does later is it drops the information about that given message, erases it, and it calls this guy and says, "Hey, I forgot about this message. I promise to never ever deliver that message again because I don't have that message in my database anymore."
26:22 Szymon Pobiega
At that point, the receiver can actually drop its information for de-duplicating that message, because the sender promised to not redeliver that message again. And that, I think, is the secret of why the MSMQ service is so reliable that it almost never fails. But that's indirect information. The other inspiration that I think might be useful is the fries pattern. And I know that among the speakers there are French citizens and also Belgians, so I will not disclose whose specific fries these are. I care about my life. So that's the fries pattern, also known as state-based deduplication. The idea is that if you have a state machine, like preparing fries, and that state machine has no cycles, you cannot go back; there are no arrows pointing in the opposite direction. There is no cycle here. If I peel the potato, it's peeled. I can't un-peel it.
27:28 Szymon Pobiega
It's the same if I cut the potato: I can't uncut it. If I have a state machine like this one in my business logic, I can use it for de-duplicating messages. Imagine my message is 'peel', and that message called 'peel' arrives when the state of my potato is already 'peeled'. I already know that this message has been processed, because the state indicates so. So I can use that pattern to fairly reliably, well, very reliably actually, deduplicate my messages. We will find it useful in a second or in a minute. Okay, so let's get back to the drawing board. And that is the Outbox pattern as we left it before we went on that detour to look at how MSMQ and peeling potatoes work. It looks like this, and we can visualize it in a different way, and that visualization will help us understand what the actual purpose of the Outbox pattern is.
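A minimal sketch of that state-based ('fries') deduplication idea, assuming a made-up, cycle-free potato state machine; because the states only ever move forward, a command that would lead to a state we have already reached must be a duplicate:

```csharp
using System;

// A cycle-free state machine: Raw -> Peeled -> Cut -> Fried.
public enum PotatoState { Raw, Peeled, Cut, Fried }

public class Potato
{
    public PotatoState State { get; private set; } = PotatoState.Raw;

    public void Handle(string command)
    {
        switch (command)
        {
            // Duplicates: the state already says this command has been applied.
            case "Peel" when State >= PotatoState.Peeled:
            case "Cut" when State >= PotatoState.Cut:
            case "Fry" when State >= PotatoState.Fried:
                Console.WriteLine($"Duplicate '{command}' ignored (state: {State})");
                return;
            case "Peel": State = PotatoState.Peeled; break;
            case "Cut":  State = PotatoState.Cut;    break;
            case "Fry":  State = PotatoState.Fried;  break;
        }
        Console.WriteLine($"Processed '{command}', state is now {State}");
    }
}
```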
28:32 Szymon Pobiega
Here I show two transactions that kind of race against each other to try to process a message. So imagine there are two copies of the message, exact copies, and we have a multi-threaded system where one thread picked up one copy of the message and the other picked up the second copy. These are the same message. So in order for the message to be processed exactly once, one of these threads has to win and the other has to lose and back off. So they start by beginning the transaction, then they insert the Outbox record; the Outbox record is the information that says: I processed that message and the resulting outgoing messages are such and such, right? They insert it, but transaction number two finishes earlier; for some reason this one got delayed slightly. So one of them commits the state.
29:31 Szymon Pobiega
The other fails to commit. It fails to commit because there is a primary key constraint on that Outbox table. So I can only insert an Outbox record for a given incoming message once, because the primary key constraint in the database prevents me from inserting a second copy. So this one wins. This one ends with a primary key violation exception, or an equivalent in your favorite database, and basically has to drop that message. On the Outbox side of things, once we execute the insert and commit, well, sorry, once we execute the insert, we have a record in the database that contains the incoming message ID and, as I mentioned, the outgoing messages. Then we commit the transaction, and after committing we can take these outgoing messages and actually send them to their destinations, one by one, handing these messages to the message queue.
30:29 Szymon Pobiega
And once we are done with that, we can execute the mark operation. That means I mark this Outbox record as fully processed. I modified the data and I dispatched the messages to the destinations, so there's nothing else to do. The only purpose that Outbox record will be serving now is de-duplicating future invocations. So when I mark this record as processed, I can remove the outgoing messages from it. That's why I tried to draw a slightly thinner line here: the size of the Outbox record drops from kilobytes to bytes, because it's just the message ID now. And what you can see here is that the Outbox serves two purposes, depending on where we are in that flow. The first purpose it serves is mutual exclusion. We want only one thread to be able to successfully transition from one state to the other.
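A minimal sketch of just that mutual-exclusion part, approximating the primary key constraint with ConcurrentDictionary.TryAdd; in a real database the losing thread would instead get a key-violation exception and roll back:

```csharp
using System;
using System.Collections.Concurrent;

public class OutboxInsertRace
{
    // Stand-in for the Outbox table with a primary key on the incoming message id.
    private readonly ConcurrentDictionary<string, string[]> outbox = new();

    // Two threads holding two copies of the same message both call this.
    public void TryProcess(string incomingMessageId, string[] outgoingMessages)
    {
        bool wonTheRace = outbox.TryAdd(incomingMessageId, outgoingMessages);
        if (!wonTheRace)
        {
            Console.WriteLine($"{incomingMessageId}: record already exists, dropping my copy");
            return;
        }
        Console.WriteLine($"{incomingMessageId}: committed, dispatching {outgoingMessages.Length} message(s)");
    }
}
```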
31:34 Szymon Pobiega
And the state that is the most important piece of the Outbox is: have I processed the message? So that's the second purpose here, deduplication. The most important part is that for each message we need to keep a Boolean flag that says: have I processed that message? So that's the state. The mutual exclusion is there to make sure that only one thread is able to flip that processed flag, and that it is the same thread that is storing the data in the database and generating the outgoing messages. These two purposes, we can attempt to do something with them, to untangle them into two separate concerns, and I'll show you how we can do it. So as I mentioned, there is this Boolean flag that says: have I processed the message? And currently it's represented in such a way that if I have not processed the message, then the flag is not represented, it's not materialized.
32:38 Szymon Pobiega
There is no stored information that I have not processed the message. Only when I flip the flag and I say I have processed the message, then it materializes in the form of the Outbox record. But I can just as well flip that around and say: I materialize the information that I have not yet processed the message, and the information that I have processed it is not materialized, it's kind of inferred. So let me show you in a different way. This is the life cycle of a message. For some period of time, it doesn't exist. There is no message. Then it starts to exist, basically as a message that is sent. So we sent a message and it hops into existence and exists for some brief period of time, maybe seconds, maybe minutes, maybe hours at most. And then it is in the processed state for the rest of eternity; until the heat death of the universe, that message is going to be 'processed' and nothing is going to change that.
33:45 Szymon Pobiega
So that's the life cycle of the message, the subtle life of a message. It's like a butterfly: it exists as a thing only for a very short period of time. And the way we represent it in the classic Outbox pattern is that while the message doesn't exist, so from minus eternity, from the Big Bang until now, we don't represent it in any way. Then we send the message, and for a duration of seconds to minutes we don't represent its processing state; it is in the queue, but our database does not know about that message. So still nothing there. But then when we flip to the processed state, so the message has been processed by our Outbox-enabled endpoint, then for the duration of eternity, until the heat death of the universe, we need to keep the representation, that Boolean flag set to true. And that's the root cause of the problems of the Outbox pattern: we need to store a very tiny piece of information, just the ID, for eternity.
34:46 Szymon Pobiega
That means that we need an infinite amount of storage for the Outbox pattern to work. But we can flip that very easily, because it's just a Boolean flag. The good thing about a Boolean flag is that on average we just need half a bit to represent it, because we can take advantage of the fact that we can decide which state of the Boolean flag, true or false, is actually materialized. The other doesn't have to exist. So here what we say is: when the message has not been sent, same thing, it doesn't exist in any database; but when we send the message, then it starts to exist in a database. It occupies space because we need to keep the information that it still exists and still has not been processed, for minutes, seconds, maybe hours. And then when we process the message, we forget about it. So we flip the flag again, but we don't need to keep that information.
35:47 Szymon Pobiega
So the very big difference here is that we keep the information for minutes or hours, not for eternity. And the way it works is, again, that diagram here: we have the mutual exclusion side of the Outbox here, and the deduplication means that the state exists until the mark phase here. At this point we can forget about that state, because we processed the message. How do we do it in practice? The difference here, though, is that we need to synchronize, or we need cooperation between, whatever is sending the message and whatever is receiving it. Previously, with the Outbox pattern, everything was done by the party that is receiving messages. It flipped the flag between non-processed and processed. Now when we change that behavior, that flag has to be flipped in two separate places. The sending party has to flip the flag to 'not yet processed', so it materializes in some store; let's call it the token store.
36:54 Szymon Pobiega
We have a token for processing the message, and then the receiver, when it processes the message, flips the flag again, deleting the token, or claiming and deleting the token. And now the message is forgotten by both the sender and the receiver, similarly to what MSMQ probably does. The other way of looking at it is that the sender needs to create that token at most once; we have to create it at most once because we don't want to recreate tokens for processing messages, which could cause duplicate processing down the line. And then we have at-least-once here to send the message, because we want to reliably deliver that message, we don't want to lose it. And then the receiver is going to combine these two, and combine them with the fries pattern, to claim that token exactly once. The fries pattern will allow us to claim the token, to flip the flag, so that we ensure that the flipping of the flag is done atomically.
37:57 Szymon Pobiega
We need to use that pattern here because the flag, the Boolean flag, is a remote flag. It's not something that we have in our process that we can just flip. It's something that is stored in a remote store, and it's stored remotely because both the sender and the receiver need to be able to access it. On the sequence diagram it looks like this: we have a token here for processing a message. That token has been created prior to that interaction by the sender. It's there for the duration of the transaction. Well, basically the transaction for processing is going to take a bit longer, but while that transaction is active, we consume that token for processing this message and we create a token for processing the follow-up message.
38:50 Szymon Pobiega
Then we send the follow-up message, and at this point we can delete the token and then complete the transaction. So we forget the first token, we create the new one, we send the new message, and the same interaction can be repeated in the next component, and the next component, and the next component. And the amount of state that each of these components has to maintain is proportional to the number of messages that are in flight between the sender and the receiver. So it's bounded; we can bound it to megabytes, maybe gigabytes, but for sure it's bounded, which is a very, very good trait compared to the Outbox pattern.
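A minimal sketch of that token-based flow, with a shared in-memory dictionary standing in for the token store (in practice it would be durable storage reachable by both sender and receiver). All names are invented, and it glosses over the fact that claiming the token and committing the business transaction have to be coordinated, for example with the state-based pattern shown earlier:

```csharp
using System;
using System.Collections.Concurrent;

// Shared between sender and receiver; in reality a durable store both can reach.
public class TokenStore
{
    private readonly ConcurrentDictionary<string, bool> tokens = new();

    public void Create(string messageId) => tokens.TryAdd(messageId, true);        // at-most-once create
    public bool TryClaim(string messageId) => tokens.TryRemove(messageId, out _);  // only one claim succeeds
}

public class Sender
{
    public void Send(TokenStore store, string messageId)
    {
        store.Create(messageId);                    // 1. materialize "not yet processed"
        Console.WriteLine($"send {messageId}");     // 2. at-least-once send; safe to retry
    }
}

public class Receiver
{
    public void Handle(TokenStore store, string messageId)
    {
        // Claiming the token flips the flag; a duplicate delivery finds no token and is ignored.
        if (!store.TryClaim(messageId))
        {
            Console.WriteLine($"{messageId}: no token, duplicate, ignoring");
            return;
        }
        store.Create(messageId + "/followup");              // token for the next hop...
        Console.WriteLine($"send {messageId}/followup");    // ...then the follow-up message itself
    }
}
```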
39:34 Szymon Pobiega
But there is no free lunch, right? So there must be some negative consequences here, and there are two. One is that the pattern requires more I/O, because we are creating these tokens and deleting these tokens; processing each message on average requires one more operation compared to the Outbox pattern. The other, well, not problem, the other consequence of that pattern is that the sender and the receiver need to agree on the protocol: hey, you need to create tokens in order for me to process the messages. While in the Outbox pattern the sender could be ignorant of what the receiver is doing, here they need to agree on the protocol and they need to agree on where to store these tokens. So there is more coupling between the sender and the receiver. I'll skip over that slide because we are tight on time; that slide basically shows the failure modes of that pattern.
40:37 Szymon Pobiega
But I want to share with you two other patterns that are related to this one and that will allow us to achieve something much bigger than what we were able to do so far. Let's look at the left-hand side of our system, where the users interact with it. So a customer places an order. When a customer places an order, the interaction looks slightly different. We have a user interface here on the top, and the user wants to save data in a database. And at the same time, we probably want to send a message to some back-end components: let's say the user places the order, we store the order in a database, and we send a message for someone else to process that order. That's a very common interaction on the user interface side. And again, we want to ensure that these two things happen atomically.
41:35 Szymon Pobiega
We don't want to store the order but not send a message to process it. We don't want the opposite effect either, of sending a message to process a non-existing order. So what we can do is use a similar approach to the Outbox pattern. We can store both the data and the messages here in that database, so that the messages are then pushed out and everything is good. But that requires... the Outbox pattern requires something to drive it, to drive it when something fails. So we store the data, and suppose the web request fails; now something has to take those messages from the database and push them out to their destinations. That's the driver of the Outbox pattern. And there might be multiple drivers, at least three. The one that we've seen so far was in the messaging context: the most common driver for the Outbox pattern is the incoming message.
42:32 Szymon Pobiega
When we are processing messages, we always have an incoming message that the message queue is going to retry delivering over and over. So it can be used to drive the pushing out of the outgoing messages. Here in the context of the user interface, we don't have that incoming message, so we can't just directly use it to drive the Outbox pattern. Some implementations of the pattern rely on a background thread that is polling the database, asking: hey, are there any messages to send out? Are there any messages to send out? And if there are any, it pushes them out. The problem with that approach, though, is that for any store that is partitioned, like Cosmos DB or DynamoDB, the queries that drive that Outbox pattern are very expensive, because they need to be done across partitions. So that's not a good solution.
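For illustration, a hedged sketch of such a polling driver; the three delegates stand in for a "give me undispatched outbox records" query, a queue send, and a "mark as dispatched" update, and none of the names come from a real library:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class PollingOutboxDispatcher
{
    private readonly Func<IReadOnlyList<(string Id, string Payload)>> loadPending; // e.g. WHERE dispatched = false
    private readonly Action<string> sendToQueue;
    private readonly Action<string> markDispatched;

    public PollingOutboxDispatcher(
        Func<IReadOnlyList<(string Id, string Payload)>> loadPending,
        Action<string> sendToQueue,
        Action<string> markDispatched)
    {
        this.loadPending = loadPending;
        this.sendToQueue = sendToQueue;
        this.markDispatched = markDispatched;
    }

    // Background loop: find outbox records whose messages were never pushed out and push them.
    // On a partitioned store this load has to fan out across partitions, which is the expensive part.
    public async Task RunAsync(CancellationToken cancellation)
    {
        while (!cancellation.IsCancellationRequested)
        {
            foreach (var (id, payload) in loadPending())
            {
                sendToQueue(payload);       // repeating this is safe: the receiver deduplicates
                markDispatched(id);
            }
            await Task.Delay(TimeSpan.FromSeconds(5), cancellation); // poll interval
        }
    }
}
```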
43:32 Szymon Pobiega
So what else can we do? What we can do is introduce a queue into that system, in front of that interaction, and put the handling behind that queue. That means that I will be taking the HTTP request from the user and immediately putting it into the queue, so that now I have that message that can drive my Outbox pattern and ensure that messages are pushed out when the data is stored. The problem with that pattern, though, is that now everything is asynchronous. Before that, I had a really nice synchronous conversation with the user. The user clicked here, they stored the data, and they could immediately refresh their browser and see the data modified. Now I have introduced a delay and asynchrony here. The message can be processed immediately or almost immediately, so it can sit here for milliseconds, but if something is stuck, the message can be here for seconds.
44:34 Szymon Pobiega
And that's not ideal for the user. So that's not something we would like to implement. The ideal solution would be that the user interface is directly changing the database and storing the messages it would like to send in that database, and then there is a message here on that queue, a tiny little control message that just says: oh, I want to push the messages from transaction X out to the external systems. And that pattern is called the transactional session pattern. The other name for that pattern is atomic save-and-send; the pattern is, I guess, two years old, not even two years old, so I have no idea which name is going to stick. So just be aware of the two names, and I would like to show you how that works in practice by putting an alphabet soup here. These letters denote the order in which things happen.
45:30 Szymon Pobiega
We know that we first need to store the messages and the data that we want to change in the database before we can push the messages out. So for sure D comes before E. And we for sure need to send that tiny little control message here first, so that while processing that tiny little control message I can grab the messages from the database and push them out. So C, D, E, that's pretty obvious. Now the question is, what should go first here? Should I first send the message to drive the Outbox, or should I first store the data? And there is one good answer to it, and it's not 'first store the data', which is kind of unintuitive. If I first store the data and the messages in the database and then decide to send the message that drives the Outbox to push these messages out, I risk a failure mode where I stored the data, I stored the messages that need to be sent out, but I failed to send this control message because my web processing thread was terminated.
46:38 Szymon Pobiega
And in that situation, I have orphaned messages here and nothing is going to push them out. So the correct order is: I first send that tiny little message to drive the Outbox, and then I store the data and the messages. That's the idea of the Outbox pattern with the correct labels here. Sorry, not Outbox, but transactional session. The problem with that approach, though, as you probably realize, is: what happens if that transaction is just very, very slow? So I sent the message to drive the Outbox; it has arrived here and wants to push these messages out, it really wants to. So it asks the database: hey, database, do you have the messages for that transaction? Nope. Hey, database, do you have these messages? Really, I need them, I want to push them out. Nope. Okay, so at this point we need to give up, because we can't keep reprocessing that message.
47:32 Szymon Pobiega
It would just be spinning here and consuming resources. So we need to drop it on the floor, and then this transaction wakes up and says: oh, I'm here, I have the data, I store that data in the database. Hey, is there someone to push these messages out to the third parties? And there's nobody to push these messages, right? We just dropped that message on the floor. So we can't do that. The solution, and it sounds a bit gross, is to put a thing called a tombstone in that database. So we want to have mutual exclusion here: either I managed to store the data in the database and there exists this driver message that is pushing my Outbox messages out, or there is a tombstone here that will prevent that transaction from succeeding. So if that message is going to be dropped on the floor, then before it's dropped on the floor, it has to block that transaction so it never succeeds, because otherwise it would create those orphaned messages.
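A minimal sketch of that ordering and the tombstone idea; in a real implementation the data, the stored messages and the tombstone live in the same database, so the checks below happen inside one transaction, while here everything is in-memory and the names are invented:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public class TransactionalSessionSketch
{
    private readonly ConcurrentDictionary<string, List<string>> storedMessages = new();
    private readonly ConcurrentDictionary<string, bool> tombstones = new();

    // Runs in the web request: C) send the control message first, D) then commit data + messages.
    public void WebRequest(string transactionId, Action<string> sendControlMessage)
    {
        sendControlMessage(transactionId);   // tiny control message that will drive the push-out

        if (tombstones.ContainsKey(transactionId))
            throw new InvalidOperationException("The control message gave up; fail the request and retry.");

        storedMessages[transactionId] = new List<string> { "OrderPlaced" }; // data + outgoing messages
    }

    // Runs when the control message is processed: E) push the stored messages out.
    public void HandleControlMessage(string transactionId)
    {
        if (storedMessages.TryRemove(transactionId, out var messages))
        {
            foreach (var message in messages)
                Console.WriteLine($"send -> {message}");
            return;
        }
        // Nothing stored yet. In reality we would retry a few times with delays; once we give up,
        // the tombstone makes sure the slow web transaction can no longer commit and leave
        // orphaned messages behind.
        tombstones.TryAdd(transactionId, true);
    }
}
```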
48:38 Szymon Pobiega
The result of that pattern is that this transaction, the user interface transaction, can fail, but we can deal with that. We can show the user a nice error message. We can retry through Polly or a similar technology. We can deal with these problems because they can always happen anyway with a database that can fail. But at least we get the guarantee that is in the name of the pattern, atomic send-and-update or atomic publish-and-update, which means that either I atomically modify the database and my messages go out, or neither of these happens. And of course, for these messages here we can use the token-based deduplication pattern that I described previously to ensure that the messages that go out here will be processed exactly once. That deals with that aspect of the system. But what about third parties, the APIs that we expose for other software to access us? The network is not reliable here either.
49:48 Szymon Pobiega
So we need to make sure that something is de-duplicating these messages. A third-party client of our software will retry calling our API until it succeeds, because they really want to communicate with us. So we need to deduplicate their requests. We cannot do it based on any state, because most often there is no state to base our deduplication on; they just send us a request and we need to deal with it. We need to figure out if it's a duplicate or not. So what we can do, the only thing we can do, is try to deduplicate based on some ID, a request ID. And in order to do that, we need to force these clients of our API to provide these IDs to us, so we can deduplicate based on them. So it sounds pretty simple; it's about the contract between us and these third parties.
50:40 Szymon Pobiega
But you can flip that around and say: oh, we are the third party to someone else. If we are forcing our third parties to provide request IDs when they communicate with us, then when we are communicating with external web APIs, we need to provide such IDs as well. So let's look at how that happens. How can we call a third-party web service, a REST service or an HTTP service, from the context of handling a message, a command or an event? And this diagram, this alphabet soup, shows something that I see very often while dealing with support cases. These are all the things that can happen when you process a message in a distributed system. So you receive a message, and then you can modify a database, and while modifying the database you can also query some data that you'll use to build a web request. You are sending a web request over to someone, let's say a POST request.
51:43 Szymon Pobiega
That third party may decide to modify their database and, based on that, generate some response and send it back here. And here you can say: okay, I want to modify the database again because I got some new information from that HTTP response. And then after that you can formulate, you can create, another message and send it down here to another component. The problem with that thing is that there are multiple places where it can fail. You can fail here while saving the data to the database. The third party can fail. You can fail to receive the response. You can fail to modify the database again. There are so many failure points that it makes it very hard, or impossible, to reason about that code. If any of these failures happens, that message here goes back to the queue and you receive it again. And then your code needs to figure out where the failure happened.
52:39 Szymon Pobiega
Where did I fail previously? Which of these many interactions that are persistent have been done, and how do I correctly execute the business code? That's really impossible to do. What we can do is strip down that interaction and say: oh, these two interactions with the database I can do before even reaching that queue, and that interaction with the database I can do afterwards; I can have another piece of code that takes a message from a queue and modifies the database. So I can remove them from the picture. And now I have a much simpler interaction, where I get a message that already contains the web request I want to send. So I'm not building it from the data here, I just send it there. The third party can modify the data in their database, send me a response, and I might fail here.
53:36 Szymon Pobiega
So I have only three failure points. But the good thing, well, it's not a drastically smaller number, but the good thing is that I can solve all these failures by just retrying the message. So all I need is some sort of deduplication happening here, so that I can ensure that the third party that is processing my request can deduplicate their interactions with their database, and the database is not corrupted. How do I do it? Of course, I can use the Outbox pattern here, and all I need is to establish some sort of identity: identity between the message that is received here, the request that is going out here, and that interaction with the database. That's the thing that I mentioned previously: just as we force our third parties to provide client-generated message IDs, or client-generated IDs, to us, we need to provide these IDs to our third parties too.
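As a hedged illustration, calling a third-party HTTP API with a client-generated request ID might look roughly like this; the header name, URL and payload are invented, and the important bit is that the ID is taken from the incoming message, so every retry carries the same value:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public class ShippingApiClient
{
    private readonly HttpClient http = new() { BaseAddress = new Uri("https://shipping.example.com/") };

    // Called from the message handler; incomingMessageId is stable across retries,
    // so the third party can deduplicate on it without relying on any business state.
    public async Task ScheduleShipmentAsync(string incomingMessageId, string orderId)
    {
        var request = new HttpRequestMessage(HttpMethod.Post, "shipments")
        {
            Content = new StringContent($"{{\"orderId\":\"{orderId}\"}}")
        };
        request.Headers.Add("Request-Id", incomingMessageId); // hypothetical header name

        var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode(); // on failure, the incoming message is retried with the same id
    }
}
```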
54:38 Szymon Pobiega
And we need to bind these IDs that we send over HTTP to the messages that are coming in here. The other way of... Well, that's the Outbox pattern, right? You can ask the question: okay, why do I need the Outbox here if I don't send any messages? Well, first of all, you might want to send messages here, because, well, as you keep developing that system, it might end up sending a message somewhere here. But the other reason to use the Outbox pattern here is that the side effect here is that there is a response. And I really want that response stored in the Outbox here, because it's a side effect of my processing. A side effect of my changes to the database is the response I generated, and I don't want to regenerate it every single time; I want it to be stored. So with the Outbox pattern here, it works pretty simply.
55:35 Szymon Pobiega
I just call this thing here, I get the response, I pack it into the message and send it down. Now, can we do better? Can we make it token-based here, so that we remove the Outbox and its consequence of having to store the deduplication data forever? Yes, we can. What we can have is a token store here in the middle that is shared between these two. So again, the consequence is that both sides of that HTTP API, the client of the API and the API itself, have to agree on the protocol. That token store can be hidden behind the HTTP interface itself, but the protocol, the idea that there is a token store here, is a shared idea that needs to be agreed on by this party and that party, by the client and the server of the API. And now the only downside, or the only change that we need to make here, is that we have to realize that with the token-based approach, we have to send this message here at more or less the same time as we're sending the web request.
56:44 Szymon Pobiega
So by the time we need to send this message, we don't have the response yet. That's the consequence of our restructuring of the pattern. But because we're dealing with messaging, we can always introduce another message queue and another message handler. So the change here is that the response of that API is just delivered later. We are sending a message here and fetching the response a bit later. So, we are running out of time, so I need to wrap up by looking into my crystal ball of distributed systems. And what I can see in that crystal ball is basically that we are approaching a level of complexity in our distributed systems that requires us to think about exactly-once delivery and processing semantics end-to-end.
57:37 Szymon Pobiega
We have had HTTP distributed systems, and we have had messaging systems, for a while. Now it's more and more common to have systems that consist of multiple components that interact through messaging, through event logs like Kafka or Kinesis, and through HTTP or gRPC. And across all these different technologies, the business process may span a message queue, a topic, a topic in a different messaging solution, a Kafka topic, and something else. And we need to start thinking end-to-end, from a user clicking a mouse to buy something, to a third party actually scheduling the delivery of a Porsche to someone's driveway. We need to ensure that that signal is not duplicated and not lost. That's end-to-end exactly-once. And I think that the token-based approach is going to serve us really well in that exercise, because it's agnostic of the communication technology. You can use it with HTTP, with messaging, and with any other technology. And the other benefit is that it doesn't require you to store the deduplication data forever.
58:49 Szymon Pobiega
So with that, I would like to thank you. We have two minutes left for questions, but I will be around here during the break and during the lunch break. So if you don't get to ask your question, or you would like to talk more privately, just grab me somewhere where we can talk. Thank you.