Code the future, now

We know how to write code to do something now. What about code to do something tomorrow, or next year? Sometimes we use batch jobs. But as business grows, our “overnight” batch job starts finishing around lunch time. As we expand to new regions, we realise there is no “night”. And it’s only a matter of time before we make a change and break a batch job we forgot existed. What if tomorrow’s code could live in the same place as today’s code? What if it all looked the same? What if we could scale it all the same way?

Join Adam and discover how to embrace the future in the code we write now. Learn how to capture requirements as first class business logic, and avoid relegating them as second class citizens of the batch job world. Take a deep dive into techniques which combine off the shelf products with fundamental computer science.

Where we’re going… we don’t need batch jobs.

Transcription

00:10 Adam Ralph
Hi, everyone. You hear me right? Yeah, sounds like it. So thank you very much for coming to my talk. Thank you for spending the next hour with me. My name is Adam, and I come from the UK originally. I've lived in Switzerland for the last 20 years. And I work for a company called Particular Software. We're the makers of a product called NServiceBus. Has anyone here heard of NServiceBus? Oh, that's most of you. And that's good to hear.
00:37 Adam Ralph
It's great to be back in Oslo. I always love coming back here to this lovely expensive city. I've been saving up for this trip all year. And when I get home, I'll start saving for next year. And I'm going to talk about this, this is not a talk about NServiceBus. But I am going to be showing some examples in NServiceBus. I'm going to be talking about an approach and a methodology to a certain class of problems. NServiceBus happens to make it really easy to implement some of these methodologies so I'll be using it for examples.
01:06 Adam Ralph
Now I'm going to be talking about a class of problems, which not many people talk about explicitly, and actually don't often explicitly think about when they're writing their code. It has to do with the runtime behavior of that code. Now, you might think that's a strange thing to say, because you do think about the runtime behavior of your code when you're writing it. I hope you do. But there's a certain class of problems which really only manifest themselves when you're in production. And specifically, when we think about the arrow of time, when we think about the dimension of time in our code, historically we haven't had a really good set of tools and approaches to this. And especially when we get into things around concurrency and time as well.
01:50 Adam Ralph
Now, a little bit of background. I'm going to talk about multi-user systems, so you can imagine that in this picture, there are people using a system, and they're touching the same piece of data at the same time. This is actually the UBS trading floor in Stamford, Connecticut, in the U.S. I actually worked on this trading floor for a couple of weeks. It's not a very good place to write software. It's very, very noisy, but it was a lot of fun.
02:15 Adam Ralph
But let's look at a more sort of familiar domain. Think about online retail. So this is Amazon. I don't think you actually have Amazon in Norway, right? There's no Amazon.no? I don't think there is. We don't have it in Switzerland either. But I still use the German site, sometimes the UK site, because they ship to Switzerland anyway.
02:34 Adam Ralph
Now, there's a lot of competition in this space. If you're trying to become an online retailer, first of all, you're competing with the giant, with Amazon, and you're competing with a whole load of others. And you can imagine that the business come up with kind of requirements, where it would try and differentiate you a bit. So think about something really simple, like when a customer orders something, I want you to look back through the history of their orders. And if they've spent $100 or more this week, I want you to give them a discount of 10%. So you think, "Okay, right, I'm going to do this, we can do this, I'm going to use DDD, I'm going to use TDD, I'm going to use all the various things I've learned about. I'm going to do this properly, I'm going to tick all the boxes. I'm going to use all the methods that I know about, just like I tried to use all the different animations in PowerPoint for this slide."
03:20 Adam Ralph
And eventually, you might come up with some code, which looks a little bit like this. And you can see that this is basically doing exactly what the customer, what the business user asked for. We're looking back, when this order is submitted, we're looking back through the history, we're matching the customer ID, we're looking at all the orders, which were delivered and which was sent to them in the last week. And if it's greater than 100, we give a discount of 10. If it's less, we give a discount of zero. So what's wrong with this code? Why is this code so bad? You roll it out to production and things are fine. And you've TDD-ed and DDD-ed and you've ticked all the Ds and all the rest of it. The thing is that, first of all, you might get some problems with database load, right? So we're doing a lookup across the history of orders here and we're going to sum things up. And even if you get all that sorted, so you might have good DBAs and you put indexes on your tables and you optimize it and do some other work. You get it all working fast enough.
04:15 Adam Ralph
There's another fundamental problem here. What does it mean to be a customer? We've got this thing up here called customer ID. Does that mean a physical person? So is a customer ID a physical person at a machine sending orders into your website? Or is it something more than that? Could it be something more than that? Think about a corporate account. So some companies do that. They say, "Well, if you're going to order stuff from this website, please use a corporate account because we've negotiated this deal where we get discounts."
04:45 Adam Ralph
So a single customer ID could be two physical people at two computers, and they could be clicking this button more or less at the same time. So what you've got then effectively is two threads. You've got two threads of execution processing orders. Let's assume for a moment that our order total is 90, right? So it's below our threshold of 100, and each of these two orders is for 10. Now one of these ought to get discounted, right? The first one that gets processed should not get discounted, and that should take the total to 100. The second one should get discounted. The problem is they're executing in parallel, and they both look at the database, and they both see a sum of 90. So what are they going to do? They're both going to send out the order with zero discount, because they're both saying it's not above 100.
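The race described here can be reproduced without any real threads. The following is a minimal sketch in plain Python rather than the C# shown in the talk; `order_history`, `week_total` and `discount_for` are hypothetical names, and the in-memory dictionary stands in for the order history table. The two handlers are interleaved by hand so that both read before either writes.

```python
# Hypothetical in-memory stand-in for the order history table.
order_history = {"corp-1": [40.0, 50.0]}  # week total so far: 90


def week_total(customer_id):
    """The 'look back through history' query: sum this week's orders."""
    return sum(order_history[customer_id])


def discount_for(total_seen):
    """Discount rule from the talk: 10% once the week total has reached 100."""
    return 10 if total_seen >= 100 else 0


# Two orders of 10 arrive at almost the same moment.
# Step 1: both handlers read the history before either has written.
seen_by_a = week_total("corp-1")
seen_by_b = week_total("corp-1")

# Step 2: both decide on the discount using the value they read.
discount_a = discount_for(seen_by_a)
discount_b = discount_for(seen_by_b)

# Step 3: both record their order.
order_history["corp-1"].append(10.0)
order_history["corp-1"].append(10.0)

print(seen_by_a, seen_by_b)    # both handlers saw 90.0
print(discount_a, discount_b)  # neither order is discounted
print(week_total("corp-1"))    # yet the history now sums to 110.0
```

Processed one after the other, the second order would have seen a total of 100 and been discounted; the interleaving alone changes the outcome.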
05:35 Adam Ralph
Now, that's problematic, because the person that comes down to paying the bill at the end of the month is going to have a look and say, "Hang on a minute. We sent in 11 orders that week, and they all got charged the full price. You told us, you're going to give us a discount. One of those should have been discounted at 10%. That's actually a breach of contract." So you say, "Oh, well we're terribly sorry, we'll give you a discount or a free refund or something."
06:01 Adam Ralph
And then you think, "Well, we must have a bug in the system, there must be something wrong." Then you look at it, and you realize that, well, they pretty much pressed the button at the same time or within a couple of seconds of each other. So you go back to the customer and you say, "I'm terribly sorry. But you ordered too quickly." The customer says, "What do you mean, ordered too quickly?" "Well, if you'd just waited a few seconds before the second order, we would have given you a discount." They'll kind of look at you funny and say, "What? And how do you expect me to do that? We have 10 people using this website. Now how do I coordinate that?" And then you say, "Why don't you shout around the office or something?" And then at that point they'll say, "Thank you for your business. I'm going to Amazon instead."
06:37 Adam Ralph
Have you ever heard the term death by Amazon? That's what Amazon does. If you don't run your business efficiently enough, Amazon will eat you. So that's not good enough. Now you think, "Well, we've got to solve this problem on our side somehow." And you think, "Well, what about transactions? Why don't we just wrap each of these things in a transaction? That will sort it out, right?" The problem is that most major database platforms, SQL Server, Oracle, et cetera, use an isolation level for transactions by default, which is called, at least in SQL Server, read committed. So what that means is, the transaction isolation level will allow both of these threads to read 90. If they try and write to the same row or something afterwards, it will sense that, and it will actually not allow one of the transactions to commit, but it will allow them both to read this figure of 90.
07:30 Adam Ralph
So that's not really going to fix it out of the box either. But then you read about serializable transactions in SQL Server. And what that means is that each transaction takes out read locks on all the stuff it's read. And when one tries to write back, it will actually not allow the transaction to commit if the other one hasn't released its read lock. And that will solve the problem. The problem is you switch that on, and all of a sudden, your error logs just fill up, so you quickly panic and go and switch that thing off again. But the thing is, what you've got to realize is that those errors are feedback. Those error logs are telling you you've got concurrency problems in your system. They're telling you that people are clicking the button at the same time. So you're kind of scratching your head, you're kind of running out of ideas, but then you think, "Well, I've solved this before. I've solved this another way before. We had concurrency problems in another system. And to do that I wrote a batch job."
08:23 Adam Ralph
Now, you might end up with code, which does something like this in some kind of batch job instead. So instead of doing this on the fly when people are submitting orders, we're going to look through all the customers one by one, we're going to look back through the history, and we're going to set their discount level to 10 or zero, according to how much stuff they've ordered.
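The batch job described here might look roughly like the following. This is a plain-Python sketch, not the code from the slide; `run_overnight_batch`, `orders_this_week` and `discount_levels` are hypothetical names, and dictionaries stand in for the database tables.

```python
def run_overnight_batch(orders_this_week, discount_levels):
    """Recompute every customer's discount level from their order history."""
    for customer_id, orders in orders_this_week.items():
        discount_levels[customer_id] = 10 if sum(orders) >= 100 else 0


orders_this_week = {"alice": [30.0, 80.0], "bob": [20.0]}
discount_levels = {}
run_overnight_batch(orders_this_week, discount_levels)
print(discount_levels)  # {'alice': 10, 'bob': 0}

# Orders placed after the batch has run keep using the stored level
# until the next run, however much the customer has spent since.
orders_this_week["bob"].append(95.0)   # bob's week total is now 115
stale_level = discount_levels["bob"]   # still 0 until the next batch run
```

The stored level is consistent because a single job writes it, but it is only as fresh as the last run.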
08:45 Adam Ralph
Now, there is a kind of behavioral problem with that as well. But before I go into that, think about the technical side of this as well. Now, this is in C#, or whatever your favorite programming language is, whatever you've written your system in. Now, that's a good situation. If you're lucky, you get to write it in C#. And if you're really lucky, if you're really disciplined, you might even put it inside this nice encapsulation you've got here, a nice DDD and microservice and SOA-developed solution, which nicely encapsulates the database. But what often happens is this stuff tends to live outside, it tends to kind of get bolted on the side and goes directly to the database.
09:28 Adam Ralph
Now, if you are lucky enough to be writing this in C# or your favorite language, whatever you do, don't let your DBA see it. Because when your DBA sees it, they'll say, "That should be written in SQL." And to be fair, they might be right, and they're good at this. DBAs are good at this. They know how to make SQL Server perform well and how to tune it and optimize it and put the right indexes on tables. And indeed, it turns out it is running faster. So they give you the stored procedure. And very often, what I've seen before is they write the initial version of the stored procedure and then kind of give it to you for maintenance.
10:06 Adam Ralph
And because this thing is working well and efficiently, sooner or later the business come along and they say, "Well, you know that batch job thing that we did a couple of weeks ago to do that discounting thing? Yeah, we need another one of those to do this other thing." And then another one appears. And then another one appears and another one appears. Now, what starts, and this is just random SQL, by the way, that I just copied and pasted from somewhere. I don't even know what it is.
10:32 Adam Ralph
But the point is, it kind of all looks the same anyway, right? But the problem here is that the interesting stuff is actually all going on on the outside here. This stuff in the middle is kind of shrinking in importance. All the new funky business requirements, all the differentiators, all those things that are giving you a competitive advantage are starting to be coded on the outside here. This is becoming the system. Is anyone here familiar with the strangler pattern? Anyone heard of that? So the strangler pattern, there's a few of you. The strangler pattern, very briefly, is when you have a nasty piece of code, you gradually carve out bits of that code and replace them with nice pieces of code until you strangle the bad code out of existence. This is almost like some kind of reverse strangler pattern. The nice code is being strangled out of existence. And this stuff kind of starts to take over your life.
11:25 Adam Ralph
And I've been in a situation before where I've lain awake in bed at night thinking about how to debug stored procs. All of a sudden, you realize you've become like a stored proc specialist or something. And it's not a nice situation to be in. Because ultimately, you've got this kind of big ball of mud written in stored procedures. And I've been in this situation, and it's really quite unpleasant. But even taking all that aside, even if you manage to avoid that situation, even if you manage to keep it in C#, you manage to keep it in your domain model and you keep it nice, there's another fundamental problem here.
12:02 Adam Ralph
Previously, we were basically penalizing users for clicking the button at the same time or within a few seconds of each other. We are now penalizing them for clicking on the same day, because what you're effectively saying is, if you're going to order a load of stuff, don't order more than $100 worth on Monday. Order $100 worth on Monday. Wait until Tuesday so the overnight batch can run and then send in the rest of the orders. Customer's going to say, "Right, bye bye, I'm off to Amazon." It's just not going to work. That might have been good enough in the early days of e-commerce, I don't know. But these days people expect more.
12:38 Adam Ralph
So we can't just take this two or three second race condition and expand it to a 24 hour race condition. This is one of the reasons that businesses are trying to get away from things like batch jobs. So we need another way to think about this arrow of time. We need another approach, another model to think about how we're going to program the arrow of time. So here it is, here's our arrow of time. And if business is going smoothly, we're going to get orders flowing through the website. Now the requirement we've been given again, is when an order comes in, I want you to look back through the history of the orders, sum them up. And if the week total is more than 100, give a discount.
13:21 Adam Ralph
Now, that is not the requirement. One of the things I've come to realize over the past few years, users and business people don't often give you requirements. What they tend to do is give you workarounds within the constraints of the current system. They know that we're recording orders. They know we've got this order history of customer IDs and totals and things. And they know that if we go back through that history, and sum all those orders up, we can work out whether it's a discount or not. So they've given you that workaround.
13:56 Adam Ralph
The actual requirement is when a customer has spent over $100 in a week, give them a discount. Now that might seem like just a subtle change in wording, but the point is that we're getting away from that "go back through history and look it up" idea. We're no longer tied to that as a solution. We can start to think about other ways of doing things. We don't necessarily have to look back in time. We can actually kind of look forward in time. Now I'm not saying we can predict the future. But there are methods where we can kind of fold the history and the future into our current code.
14:31 Adam Ralph
So let's have another look at the arrow of time. Now, when the first order comes in, what's the customer's current week total? It's zero. Exactly, right? So what are we going to give when the first order comes in? We're going to give no discount. That's simple. Now we've introduced this concept of a week total or weekly total. So we can say, "Well, the weekly total right now is zero. But importantly, we add the current order into the weekly total," because this is now going to count in our weekly total. But even more importantly, in one week from now, we need to remove that order total from our week total.
15:17 Adam Ralph
When the second order comes in, we do the same thing. Again, we add the order total to the week total. And importantly, in one week from that moment, we need to remove that order from the week total. And you can see, as we do this for every order that comes in, we're going to have this kind of sliding window of the week total: orders are going to be added to it, and at one point they're going to be taken away, exactly a week later. So you can see that's going to kind of slide through time and give us that weekly total at any one point in time.
15:47 Adam Ralph
And we do it for every other order that comes into the website. And when a specific order comes in, what do we do? Do we look back through the history? We don't have to do that anymore. We just look at the week total. So what we're doing here is we're basically updating a single number. So it's consistent in the same way that the batch job was consistent, where we're updating a single discount level for a customer. We're now updating a single row in the database, this week total. And databases are really good at guaranteeing transactionality around a single row in a database. I'll come to that a bit more in a minute.
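The sliding week total can be sketched in a few lines. This is a plain-Python illustration, not the talk's code: the `WeekTotal` class is hypothetical, time is modelled as a simple day number, and a heap of pending removals stands in for whatever mechanism delivers the "remove this in a week" reminder.

```python
import heapq

WEEK = 7  # days; time is modelled as a plain day number for simplicity


class WeekTotal:
    """Running total of the orders placed within the last seven days."""

    def __init__(self):
        self.total = 0.0
        self._removals = []  # min-heap of (due_day, amount)

    def advance_to(self, day):
        """Apply every scheduled removal that has fallen due by `day`."""
        while self._removals and self._removals[0][0] <= day:
            _, amount = heapq.heappop(self._removals)
            self.total -= amount

    def submit_order(self, day, amount):
        """Return the discount for this order, then fold it into the total."""
        self.advance_to(day)
        discount = 10 if self.total >= 100 else 0
        self.total += amount                                  # count it now
        heapq.heappush(self._removals, (day + WEEK, amount))  # remove it in a week
        return discount


wt = WeekTotal()
d1 = wt.submit_order(day=0, amount=60.0)  # total before this order: 0
d2 = wt.submit_order(day=2, amount=50.0)  # total before this order: 60
d3 = wt.submit_order(day=3, amount=10.0)  # total before this order: 110
d4 = wt.submit_order(day=8, amount=10.0)  # the day-0 order has expired: 60
print(d1, d2, d3, d4)  # 0 0 10 0
```

No order ever triggers a scan of the history; each one reads and updates a single number.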
16:24 Adam Ralph
But what we've done here is, instead of fudging the number by performing calculations over the existing domain objects, we've introduced this concept into our domain itself. We've got this week total, which has become part of our ubiquitous language. And it's a kind of inverse anti-batch job, if you like. But the thing is, we need a good way to program this. So how are we going to program this element of time, how are we going to program this taking away the order in a week's time, how are we going to make sure that happens?
16:53 Adam Ralph
You can't really do it with HTTP. I don't know any way of sending an HTTP request and having it arrive at the consumer a week later. HTTP doesn't really cater for that. You can kind of fudge it with System.Threading.Timer or something like that. But the problem is, that's fine until the process crashes. Okay, the process starts up again, and it's forgotten, because that's all in memory.
17:14 Adam Ralph
So we need another approach. Now what I'm going to do now is introduce the notion of a saga. Has anyone heard of the saga pattern at all? That's probably at least half of you. Okay, so very briefly, saga, it was a term coined in the late 80s, I think it was 1987. It was effectively about rolling back transactions and breaking transactions into smaller transactions. So take a big transaction, you have a series of smaller transactions. If one succeeds, and another succeeds and another succeeds, another fails, you kind of roll back the previous steps. And that's kind of what the original saga pattern described.
17:50 Adam Ralph
We've abstracted that a little more generally, because the thing is, the real world doesn't really roll itself back. It doesn't really undo itself. And our saga abstraction, effectively all it abstracts is a message coming in, and another message coming in later. And then what you do when that second message arrives depends on what happened in the first message. And in order to do that, when the first message arrives, we store state. When the second message arrives, we retrieve that state and we make decisions based on the second message and the state from the first message. That's basically what a saga is.
18:29 Adam Ralph
And this kind of undoing, I often see this example. So when people talk about sagas, they're talking about booking your holiday. So you need to book a flight, you need to book a hotel room, you need to book a hire car. I've tried to use sort of typical images from a Norwegian family holiday here, I hope I've got this right. And when you book your flight, that succeeds, and you manage to book your hotel, then you try and book your hire car. And they're all sold out. So what do you do? Holiday's off, we're not going. We're going to roll the whole thing back. We cancel the hotel and cancel the flight.
19:00 Adam Ralph
It's a really bad example. So if you see people using this example when they talk about sagas, call them out on it, because it's a really bad example. The real world is richer than that. So when we fail to rent the hire car, we go to another company. If they're all sold out, we look at other forms of transport, a stretch limo or something. So we need a more general pattern, which is not just tied to this kind of rolling back, which is a bit unrealistic. So that's a lot of talking. Let me show you a little bit of code.
19:31 Adam Ralph
Now I'm just going to show you generally what a saga looks like. I'm not going to do anything exciting here. I just want to kind of show you the API just before we go back to our use case. So what we've got here is we've got a flight booked class, which we're going to use as a message. That just has a holiday ID on it. We've got a hotel booked message with a holiday ID and a car booked message with a holiday ID.
19:59 Adam Ralph
Now our requirement for this saga is, when the flight's been booked, when the hotel's been booked, when the car's been booked, I want to publish a holiday booked message. And then something else in the system can go and do something else. Now, these only contain IDs, because I'm assuming the other parts of the system know about the details, right? So all we actually need for this use case is just the ID of the holiday. So we publish this holiday booked message, and then something else in the system can then say, "Okay, I'm going to email the customer, tell them their holiday's been booked, or I'm going to bill them for the holiday or do whatever."
20:37 Adam Ralph
So a saga is as simple as a class. That's all a saga is. It's just a class. So I've called it holiday booking. And the first thing we need to do is we need to define some kind of class to hold the state in between these different messages, in between receiving these different messages. So I'm going to use a nested class for that. And I know some people hate nested classes. But I think they're quite nice in this specific use case, because they demonstrate that this data is for this saga and nothing else.
21:07 Adam Ralph
Now I'm inheriting from a base class called ContainSagaData. That just has some infrastructure stuff on it, like a kind of internal ID for the saga, who sent the original message, what was the original message. We don't care about that in this use case. That's useful for other things, but just to show you what that class is. So in our class, we're defining the holiday ID because ultimately, that's the thing we have to raise the event for at the end. We're recording the fact that the flight is booked, that the hotel is booked and that the car is booked.
21:40 Adam Ralph
Now what we can do is we can inherit from a saga base class, and we tell it that the state is going to be stored in this nested data class that we just defined. Now we need to handle the three messages, so we need to handle flight booked, hotel booked and car booked. And what that does, it gives us a bunch of methods to implement called handle. And in each of those, we get the message itself and we get a context object, which is just a thing that we can use to send other messages within the method. So when we receive flight booked, we say flight booked is true. For hotel booked, hotel booked is true. For car booked, car booked is true.
22:25 Adam Ralph
And in each case, we call this check booked method. And that's down here. And all that's doing is it's saying when the flight's booked, when the hotel's booked and the car's booked, we publish using this context object that's been passed in, a holiday booked event with the holiday ID. Now at that point, we're done. Saga has served its purpose, because that's all its purpose is, is when the three things have been booked, raise an event, say that's been done. So the rest of the system can do something.
22:57 Adam Ralph
So what we can do now is we can mark as complete, and that's a method on the base class, which just says this saga is done. It doesn't need to exist anymore. There's just one thing left to do, and that is to configure how to find the saga in the database. So all this does, it says for each of these message types, so flight, hotel and car, take the holiday ID from the message and map it to the holiday ID on the saga. That effectively tells the infrastructure how to form a select statement in the database. So when a message comes in, try and select an existing saga. If it doesn't exist, create one.
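The shape of the holiday booking saga walked through above can be approximated in plain Python. This is a sketch of the idea only and deliberately not the NServiceBus API: the class and method names are hypothetical, a list stands in for the context object used to publish messages, and a `completed` flag stands in for marking the saga as complete.

```python
from dataclasses import dataclass


@dataclass
class HolidayBookingData:
    """State kept between messages (the nested saga data class in the talk)."""
    holiday_id: str
    flight_booked: bool = False
    hotel_booked: bool = False
    car_booked: bool = False


class HolidayBooking:
    """Plain-Python stand-in for the saga class."""

    def __init__(self, data, published):
        self.data = data
        self.published = published  # list standing in for the context object
        self.completed = False

    def handle_flight_booked(self):
        self.data.flight_booked = True
        self._check_booked()

    def handle_hotel_booked(self):
        self.data.hotel_booked = True
        self._check_booked()

    def handle_car_booked(self):
        self.data.car_booked = True
        self._check_booked()

    def _check_booked(self):
        d = self.data
        if d.flight_booked and d.hotel_booked and d.car_booked:
            self.published.append(("HolidayBooked", d.holiday_id))
            self.completed = True  # the saga has served its purpose


published = []
saga = HolidayBooking(HolidayBookingData("holiday-42"), published)
saga.handle_hotel_booked()   # the messages may arrive in any order
saga.handle_flight_booked()
print(published)             # [] because the car is not booked yet
saga.handle_car_booked()
print(published)             # [('HolidayBooked', 'holiday-42')]
```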
23:31 Adam Ralph
Now I've gone through that pretty quickly. But I'll show you a sort of diagrammatic explanation of what's actually happening there. So we have kind of two important abstractions here. We have the bus and the persister. The bus is kind of an abstraction over a message queue. So that might be Azure Service Bus or RabbitMQ or MSMQ or any other queueing system. The persister is essentially an abstraction over a database. And that can be SQL Server or Oracle or RavenDB or MongoDB. It doesn't really matter. Now, when we receive the first message, the bus tries to retrieve the saga from the database. And because it's the first message with that holiday ID, it doesn't find it. So that goes all the way back to the bus. The bus knows that it has to create a new saga object. It dispatches the message into it, which is basically calling the handle method.
24:24 Adam Ralph
That handle method will alter the internal state. It may or may not send messages out. And at that point, we persist the saga in the database, which is basically an insert statement, if you're using SQL Server or something like that. When the second message comes in, we do the same thing again, we try and retrieve the saga from the database. This time it does find it because we saved it after the first message. It then creates a new saga object, fills it with the data from last time and gives it back to the bus. Then we do the same thing again. We dispatch the new message into it, that then alters the internal state somehow. It might send more messages out and we persist the saga again, which is basically then an update statement.
25:05 Adam Ralph
Or we might decide we're done. In those cases, we mark ourselves as complete. And the persister basically deletes it from the database. The Saga is basically done. So I hope most of that makes sense. So now that we know kind of more or less how a saga works, if we go back to our use case, how can we use the saga for our business use case? How can we actually achieve this? And importantly, how can we do this magic do something in a week from now?
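The retrieve, dispatch and persist cycle described above can be sketched as follows. This is a simplified plain-Python model under stated assumptions, not how NServiceBus is implemented: `InMemoryPersister`, `handle` and `dispatch` are hypothetical names, a dictionary stands in for the database, and the saga state is reduced to the set of message types seen so far.

```python
class InMemoryPersister:
    """Stand-in for the saga database, keyed by the correlation id."""

    def __init__(self):
        self.rows = {}

    def get(self, key):
        row = self.rows.get(key)
        return dict(row) if row is not None else None

    def save(self, key, state):
        self.rows[key] = dict(state)  # insert on first save, update afterwards

    def delete(self, key):
        self.rows.pop(key, None)


def handle(state, message):
    """Toy saga handler: record the step and report whether the saga is done."""
    state["seen"] = sorted(set(state["seen"]) | {message["type"]})
    return set(state["seen"]) == {"FlightBooked", "HotelBooked", "CarBooked"}


def dispatch(persister, message):
    key = message["holiday_id"]
    state = persister.get(key)       # 1. try to retrieve the saga
    if state is None:                # 2. first message: create new state
        state = {"seen": []}
    done = handle(state, message)    # 3. call the handler
    if done:
        persister.delete(key)        # 4a. completed: remove the row
    else:
        persister.save(key, state)   # 4b. otherwise insert or update
    return done


db = InMemoryPersister()
dispatch(db, {"type": "FlightBooked", "holiday_id": "h1"})
dispatch(db, {"type": "HotelBooked", "holiday_id": "h1"})
rows_before = dict(db.rows)  # one row for "h1" with two steps recorded
finished = dispatch(db, {"type": "CarBooked", "holiday_id": "h1"})
print(finished, db.rows)     # True {}
```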
25:33 Adam Ralph
So again, I think the best thing is for me to show you the code. So what I've got here is a couple of other messages. So we have a Submit Order message with a customer ID and an order total, so this is just how much the order is for, and it will have some other stuff, which I've just left out for brevity. Then we have a process order message with a discount level, and whatever other properties it needs for something else to process the order.
26:03 Adam Ralph
So the requirement here is when we receive a Submit Order message, work out the discount and send the message to process the order with that discount. That's the requirement for the saga. That's its single responsibility. So again, we've got a class. I've called it discounting this time. This is going to be our saga. For the saga data, I'm going to store the customer ID because we do discounting at the customer level, for each customer. And I'm storing the week total. I'm using a double here, you probably wouldn't do that in real life. It would be some kind of money type or something with appropriate accuracy, but I'm just using a double for simplicity.
26:46 Adam Ralph
Now what we can do is we can say that our saga is going to inherit from the base class. Discounting data is going to be our state storage. And we can handle the Submit Order message. So we can say I am started by messages of type Submit Order. That gives us, again, a handle method to implement. And then here, we're doing what we're asked to do. So we're saying if the week total is greater than or equal to 100, the discount level is 10. Otherwise, it's zero. We then send our process order message with that discount level. And here, we need to add the current order to the week total for when the next message comes in, when the next order's submitted.
27:33 Adam Ralph
And here's a really important bit, here's the bit of magic. We're going to request a timeout seven days from now with the same message. Again, this request timeout is a method on the saga base class. And what we do then is we implement this interface called I handle timeouts of Submit Order. So not only am I now started by Submit Order, I handle timeouts of Submit Order. And we implement a timeout method.
28:07 Adam Ralph
And in that timeout method, we know that when this method is called, seven days have passed, so all we do here is we take away the order total from the week total. And effectively, we're done. There's just one last thing to do, we need to configure how to find the saga. So we need to say when a Submit Order message comes in, map the customer ID of the message to the customer ID of the saga, so we can correlate. And then we're basically done.
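The discounting saga described above has two entry points: one for the incoming order and one for the timeout a week later. The following plain-Python sketch mirrors that shape without using the NServiceBus API; `FakeContext`, `Discounting`, `handle` and `timeout` are hypothetical names, and the context simply records what the saga asks the infrastructure to do.

```python
from dataclasses import dataclass


@dataclass
class SubmitOrder:
    customer_id: str
    order_total: float


class FakeContext:
    """Records what the saga asks the infrastructure to do."""

    def __init__(self):
        self.sent = []      # (message name, payload) tuples
        self.timeouts = []  # (delay in days, message) tuples


class Discounting:
    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.week_total = 0.0  # float for brevity; use Decimal for real money

    def handle(self, message, context):
        discount = 10 if self.week_total >= 100 else 0
        context.sent.append(("ProcessOrder", {"discount": discount}))
        self.week_total += message.order_total  # counts for later orders
        context.timeouts.append((7, message))   # "call me back in seven days"

    def timeout(self, message, context):
        # Seven days have passed, so this order drops out of the window.
        self.week_total -= message.order_total


ctx = FakeContext()
saga = Discounting("corp-1")
first = SubmitOrder("corp-1", 100.0)
saga.handle(first, ctx)                         # week total was 0: no discount
saga.handle(SubmitOrder("corp-1", 10.0), ctx)   # week total was 100: discount
saga.timeout(first, ctx)                        # a week later the first order expires
print([payload["discount"] for _, payload in ctx.sent])  # [0, 10]
print(saga.week_total)                                   # 10.0
```

In the real system the infrastructure, not the caller, would deliver the timeout message; it is invoked by hand here only to show the effect on the total.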
28:35 Adam Ralph
Now this code is pretty easy to read. All we're doing here is saying, "Well, what's the discount level? Send the process order message, add it to the total. A week later, take it away from the total." It's exactly what I showed you in the diagram. So how does this actually work? What does the infrastructure actually do when this happens? So again, when we get the first message in, we retrieve a saga. I've left that bit out because you've already seen it. But once we've got the saga object, we dispatch the message into it by calling the handle method. The saga does some internal state manipulation and then it requests the timeout. That's that method you've just seen.
29:15 Adam Ralph
At this point, the bus does one of two things. Either it puts the message directly in the message queue and says, "Deliver me this message in a week." And some queueing systems support that. So I think the only one that does it fully out of the box is Azure Service Bus, which despite its name is just a queueing system. But it does support that out of the box. RabbitMQ and Amazon SQS kind of do it. I'll come to that in a minute. And other queueing systems don't do it at all. So MSMQ has no facility for this. So in those cases, we actually persist the timeout in the database at the same time as we persist the saga data. And then we poll the database to see when the timeout has become due.
29:54 Adam Ralph
That's not ideal. We don't really want to be polling the database if we can avoid it. So we always prefer allowing the queue itself to do the work over here. Now I'm going to take a quick dive under the covers and sort of peek behind the curtain and show you what NServiceBus does behind the scenes a little bit here, because it's quite interesting in the context of Amazon SQS. I've said that it kind of supports this. And it does, but it only allows you to delay a message by 900 seconds. So 15 minutes is the maximum. So if you want to delay a message by 600 seconds, that's fine. You just tell Amazon SQS to deliver this message in 600 seconds. If you want to do something like 2,800 seconds, it's a bit more complicated. We can only put this message in the queue for 900 seconds. That's the limit. But what we can also do is we can write a custom header on the message, like a message for ourselves to say, when we receive this back, there's still a delay of 1,900 remaining.
30:53 Adam Ralph
So then we know we can put it back in the queue for another 900. And when we get that back, we know there's 1,000 remaining by looking at that header. Put that in the queue for another 900 seconds, and then there's only 100 remaining, and we just put it in this queue for 100 seconds. So by flowing the message through these queues four times, we can actually simulate a delay of 2,800 seconds. And we can do that arbitrarily for any kind of delay. That works pretty well. RabbitMQ is a bit more interesting. RabbitMQ has this thing called Time To Live. And what that means is for a specific queue in RabbitMQ, you can say messages live in this queue for this amount of time. So what we do is we say we send a message into that queue, let's say 600 seconds, we set the 600 second time to live, we put a message in that queue. We don't subscribe to that queue at all. But what we do is we tell RabbitMQ that when a message falls out of this queue, when it's exceeded its time to live, put it in this other queue, right, and we subscribe to that queue instead.
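The Amazon SQS re-queueing trick from a moment ago can be sketched as a small planning function. The 900 second limit is the one quoted in the talk; the header name `remaining-delay` and the function names are made up for illustration and are not what NServiceBus actually uses.

```python
MAX_NATIVE_DELAY = 900  # seconds; the per-message limit quoted in the talk


def next_hop(remaining_seconds):
    """Return (delay for this hop, delay still remaining afterwards)."""
    delay = min(remaining_seconds, MAX_NATIVE_DELAY)
    return delay, remaining_seconds - delay


def plan_hops(total_delay_seconds):
    """List every trip through the queue needed to reach the total delay."""
    hops = []
    remaining = total_delay_seconds
    while remaining > 0:
        delay, remaining = next_hop(remaining)
        # The header is the "note to ourselves" read when the message returns.
        hops.append({"delay_seconds": delay,
                     "headers": {"remaining-delay": remaining}})
    return hops
```

For 2,800 seconds this plans four hops of 900, 900, 900 and 100 seconds, with the header counting down 1,900, 1,000, 100 and 0.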
31:53 Adam Ralph
So using that, we can actually simulate delayed delivery in RabbitMQ. And that works pretty well, because then you can create a queue with 900 second Time To Live, one with 1,500 seconds, one with 2,800 seconds. But you might be starting to spot the problem here. You'd need to have a separate queue for every delay level you want in your system. Now, the API looks like this, and I can write any number I like in there. If I want to be really evil, I can write random.next. And I don't think any, there might be some business where there might be a use case for that. I might want to randomize the delays for some reason. But the problem is that the number of queues could kind of grow and grow infinitely. And we don't want customers asking us why NServiceBus is creating 50,000 queues in their RabbitMQ broker. So we need another approach.
32:43 Adam Ralph
Now, let's take a simpler example. 42 seconds, right? We all love that number. But it's only written with the digits four and two in base 10. So this is 42 in base 10 or decimal. But as clever computer programmers, we know there are other counting systems like binary. And we can write this as 101010 seconds in base two or binary. Now, what if we create a queue with a 32 second time to live, one with 16 second time to live, one with eight, one with four, two and one? We can take our number 42 and write it in binary alongside these queues. We can send this message through the 32 second queue, skip the 16, through the eight, skip the four, through the two, skip the one. And by sending the message through the 32 plus the eight plus the two, which is 42, we've simulated a 42 second delay in RabbitMQ.
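The binary routing idea is easy to check with a few lines of code. This is only a sketch of the arithmetic, not of the actual broker topology; `delay_route` is a hypothetical helper name.

```python
def delay_route(delay_seconds, levels=6):
    """Return the TTLs (in seconds) of the queues a message passes through.

    Level n is a queue with a time to live of 2**n seconds. The message
    visits a level only when the matching bit of the delay is set.
    """
    if not 0 <= delay_seconds < 2 ** levels:
        raise ValueError("delay does not fit in the available levels")
    route = []
    for bit in reversed(range(levels)):  # highest level first: 32, 16, 8, ...
        ttl = 1 << bit
        if delay_seconds & ttl:
            route.append(ttl)
    return route
```

With six levels, `delay_route(42)` visits the 32, 8 and 2 second queues and skips the 16, 4 and 1 second ones, which adds up to 42.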
33:42 Adam Ralph
And I haven't seen anyone else do that in the community, I think it's quite an original invention. So we can simulate a delay of up to 63 seconds with these six queues, which isn't a lot. So we actually create 28 queues, which means you can have a maximum delay of two to the 28 seconds or this big number here or about eight and a half years. Now you might think, "Well, why eight and a half years? Why 28 queues? It's a bit arbitrary." Well, we considered 25, which would give you just over a year. But then we didn't like that because it was odd so we thought 26. And then we thought, "Well, 26, we think that's safe, but let's just go one order of magnitude higher just to be safe." So 27 was a candidate. But of course that was odd. So it ended up being 28. We were really tempted to go for 32. But that would have given us a maximum delay of 136 years. And yeah, it's pretty nice to think that someone's using NServiceBus 136 years from now, but it's probably a bit over the top, so 28 queues. If someone really needs more than eight and a half years, we'll add a 29th or whatever. I would probably add a 30th at the same time.
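The figures quoted for 25, 28 and 32 queues can be reproduced with a short calculation. Strictly, n queues with TTLs of 1, 2, 4, ... seconds give a maximum of 2^n - 1 seconds, which is what the sketch computes; the year length of 365.25 days is an assumption.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed average year length


def max_delay_seconds(levels):
    """Largest delay when every one of the `levels` queues is visited once."""
    return 2 ** levels - 1


def max_delay_years(levels):
    return max_delay_seconds(levels) / SECONDS_PER_YEAR


if __name__ == "__main__":
    for levels in (6, 25, 28, 32):
        print(levels, max_delay_seconds(levels), round(max_delay_years(levels), 2))
```

This gives 63 seconds for six queues, just over one year for 25, about eight and a half years for 28, and roughly 136 years for 32.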
34:55 Adam Ralph
So have we solved our problem here? We've got this nice saga now it's only touching one row. But the thing is the users don't care. They don't care we're using a saga. They're still going to start pressing the button at the same time so we're still going to get two of these saga things executing concurrently, maybe on different processes, or maybe you're using two threads in your process. And they're still going to see the saga state of 90. They're both going to see 90 in this weekly total. So we haven't actually solved the problem at all.
35:25 Adam Ralph
But the thing is, remember that we're writing the saga state to a single row in the database now. Now we can actually take advantage of regular optimistic concurrency. So we've seen how these things get invoked. We've seen that this same process is going to happen in two different threads. Now, what's going to happen here is that the sagas are going to see 90 in the database. They're going to set a discount of zero. They're going to send the process order message with a discount of zero, which is wrong. Both of them are going to do that.
35:56 Adam Ralph
But because they're trying to write the saga state to the same row in the database, one of them is going to succeed. And importantly, its messages will be sent because NServiceBus guarantees that the messages are only sent from the handle method if the saga state is persisted. So that line of code, which says context.publish or context.send, we don't actually send the message at that point. We send the message as we save the saga state. So those messages get sent.
36:29 Adam Ralph
This transaction gets refused, because it's trying to write to the same row. So these messages are not sent. At that point, that message gets rolled back to the queue, it gets invoked into the saga instance again. Now the saga is going to read the updated state from that other thread. It will now be 100. So it will give a discount of 10%. And importantly, it will be able to save its saga state, the messages will go out and we have achieved our objective.
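The race and its resolution can be simulated in memory. This is a sketch of generic optimistic concurrency with a version number, not of how any particular persister works; `SagaStore`, `commit`, the order amount of 10 and the 100 threshold are assumptions chosen to match the 90 and 100 figures in the talk.

```python
class ConcurrencyConflict(Exception):
    """Raised when the row changed since it was read."""


class SagaStore:
    """In-memory stand-in for the single database row holding saga state."""

    def __init__(self, weekly_total):
        self.weekly_total = weekly_total
        self.version = 0

    def load(self):
        return self.weekly_total, self.version

    def save(self, weekly_total, expected_version):
        if expected_version != self.version:
            raise ConcurrencyConflict()
        self.weekly_total = weekly_total
        self.version += 1


def decide(weekly_total, order_total):
    """Business logic: returns (new total, outgoing messages)."""
    discount = 10 if weekly_total >= 100 else 0
    return weekly_total + order_total, [("ProcessOrder", order_total, discount)]


def commit(store, snapshot, order_total, outbox):
    """Persist first; release outgoing messages only if the save succeeded."""
    weekly_total, version = snapshot
    new_total, pending = decide(weekly_total, order_total)
    store.save(new_total, expected_version=version)  # may raise
    outbox.extend(pending)


def run_race():
    store = SagaStore(weekly_total=90)
    outbox = []
    first = store.load()
    second = store.load()  # both handlers read 90 before either writes
    commit(store, first, 10, outbox)
    try:
        commit(store, second, 10, outbox)
    except ConcurrencyConflict:
        # The message goes back to the queue and is handled again,
        # this time against the updated state of 100.
        commit(store, store.load(), 10, outbox)
    return store.weekly_total, outbox
```

The losing handler's zero-discount message never reaches the outbox, so only one wrong-looking decision is ever computed and none is ever sent twice.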
36:59 Adam Ralph
Okay, so we're getting on the way to killing the batch job. And I'll tell you what, at Particular Software, this is one of our missions. We want to kill the batch job. We want to rid the world of batch jobs. We even published a blog post a couple of years ago called "Death to the batch job", which is on our website. And one of the reasons for that is because when you start to use this alternate programming model, this becomes part of your domain. This saga lives in the rest of your code. It lives, it uses your domain objects, it's all nicely DDD-ed and TDD-ed. And it's not this kind of separate thing on the outside.
37:37 Adam Ralph
Now, the problem is that we speak to people about this. And no matter how hard we try, we always get to the point where the customers still talk about, they still say, "Well, okay, that's all really well and good. But in this specific case, I still really need batch jobs. I still really need them. That something else, this isn't going to work for me."
37:57 Adam Ralph
One of the really common things is reports. So has anyone here worked with a system which produces reports? It's quite a common thing, and that's about half of you. So I worked with a system in my previous job, and the overnight batch jobs started to finish around lunchtime. We actually became victims of our own success. That business was going well. And this thing, because we were getting more and more things in the database, more customers ordering things, the batch job was growing and growing and growing to the point where, I mean, I don't know where it is now, and I left a couple of years ago, but it was threatening to start finishing towards the end of the working day, which really isn't very practical at all. That's getting close to when you actually have to kick off the next one that night.
38:41 Adam Ralph
And then of course, if you open an office in Singapore or in New York or in San Francisco, you actually don't have any night anymore. So the whole concept of an overnight batch job kind of falls away. But the thing about reports is that if you look at what the users are doing with reports, if you start to ask why, and this is a really important thing, you need to start asking why. There's this thing called five whys where you kind of go through this process of asking why around five times until you get to the actual requirement. And they tell you that they run this report at some time in the morning every day. And let's say this is some kind of banking domain or something, and they look at the report and say, "Well, I look for large transactions in high risk countries. And if there's three of those within a week, then I kick off some kind of process. I can talk to my manager, or I call the customer or I temporarily lock the account or do something." Now you're already starting to learn things by asking this question, because you've learned that there is a concept of a large transaction and there's a concept of a high risk country.
39:48 Adam Ralph
And they might say a large transaction is over $10,000 or a high risk country is any one of these six, something like that. So again, your ubiquitous language in DDD terms is starting to get enriched. Now the thing is, again, the report was never a requirement. It was stated as a requirement, but it is actually a workaround within the constraints of the current system. They knew that these transactions are being held. They knew that if they put them into a report each day, they could look through them, find the high risk transactions, the possible fraudulent transactions, and take some action.
40:26 Adam Ralph
The actual requirement was when there are three possible fraudulent transactions within a week, I want to know about it. So why not use something like a saga and do some kind of automated pattern matching? So we can subscribe to events, we can say, let's say there's a transaction occurred event or something like that. We can look at those events, we can examine them, we can say, "Is this a large transaction? Is it over $10,000? Is it a high risk country?" Yes, it's in one of these six. If so, we add that transaction to our saga state to say possible fraudulent transaction. You put it into a collection in that saga state. Then we send the message to ourselves one week from now to remove that transaction from our possible fraudulent transactions. And again, what we've built for ourselves is the same sliding window. We've got a sliding window of one week, which is recording all the possible fraudulent transactions within that week.
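The fraud-watch saga described here follows the same sliding-window shape as the discount example. In this sketch the event and context classes, the country codes, and the choice to correlate on an account ID are all assumptions; only the $10,000 figure, the six countries, the one week window and the threshold of three come from the talk.

```python
from dataclasses import dataclass, field
from datetime import timedelta

LARGE_AMOUNT = 10_000
HIGH_RISK_COUNTRIES = {"AA", "BB", "CC", "DD", "EE", "FF"}  # placeholder codes
ALERT_THRESHOLD = 3


@dataclass
class TransactionOccurred:
    transaction_id: str
    account_id: str
    amount: float
    country: str


@dataclass
class RecordingContext:
    """Hypothetical context that records what the saga asked for."""
    sent: list = field(default_factory=list)
    timeouts: list = field(default_factory=list)

    def send(self, message):
        self.sent.append(message)

    def request_timeout(self, delay, message):
        self.timeouts.append((delay, message))


class FraudWatchSaga:
    def __init__(self, account_id):
        self.account_id = account_id
        self.suspicious = set()  # transaction ids inside the one week window

    def handle(self, event, context):
        if event.amount <= LARGE_AMOUNT or event.country not in HIGH_RISK_COUNTRIES:
            return  # not a possible fraudulent transaction
        self.suspicious.add(event.transaction_id)
        # A message to ourselves: drop this transaction in one week.
        context.request_timeout(timedelta(weeks=1), event)
        if len(self.suspicious) == ALERT_THRESHOLD:
            context.send(("PossibleFraudDetected", self.account_id,
                          sorted(self.suspicious)))

    def timeout(self, event, context):
        self.suspicious.discard(event.transaction_id)
```

The alert fires at the moment the third qualifying transaction arrives, rather than the next morning when somebody reads a report.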
41:30 Adam Ralph
And we know that any time when that collection count hits three, we need to inform the user. So we might send them an email or something, and they might then go and tell their manager. And the thing is that this might happen at say, half past four in the afternoon. And you usually check that report at 9:30 in the morning. So the manager says to you, "Well, hang on, it's half past four in the afternoon. You normally check that report at 9:30. Why is it taking you seven hours to tell me? You should be telling me this straight away. This is urgent, we need to do something."
42:02 Adam Ralph
Then you say, "Oh no, the system told me." "What do you mean the system told you?" "Oh, yeah. You know that saga thing that developers were talking about a couple of weeks ago? Yeah, they implemented that. They rolled it all out. And then the system just tells me." You say, "Well, you mean you don't look at that report anymore?" You say, "I haven't looked at the report for two weeks. The system now tells us and we know exactly when it happens. We can take action straight away. We don't have to wait to the next day."
42:28 Adam Ralph
And this is kind of mind blowing when managers hear this kind of thing. This is like real time business intelligence. This is where they say, "Well, I want more of that stuff. Yeah, that saga stuff, I want more of that. How do I get more of that? Where do I sign?" Because ultimately, this translates to actual business value. This is actual money. These are Norwegian kroner, right? Is that the right money? I actually don't really see this very often nowadays, it's kind of all plastic. But this does translate into actual business value. So this is actual money. This is when a software development team can start to transition from being a cost center to being a profit center.
43:12 Adam Ralph
So before I wrap up, I've shown you a little bit about NServiceBus here. If you want to know a bit more about that, we've got this game going on. It's this kind of space exploration game where it kind of takes you through the API and shows you that so you can follow that URL, and see that there are details about that down in our booth on the exhibition hall as well. And I hope that this has given you a kind of different perspective on this arrow of time. There's a couple of key takeaways I want you to take with you. So the first thing is that you want to be using a programming model, which embraces the aspect of time. You don't want to be fudging this by doing historical lookups or batch jobs and that kind of thing. You want a programming model which embraces the aspects of time and actually allows you to put that into your code.
44:00 Adam Ralph
Using something like durable messaging allows you to do that in a reliable way and to guarantee that it's going to happen, rather than trying to fudge that with some kind of database polling of your own, because then you end up just building your own queue anyway. So you want some kind of programming model which actually embraces this aspect, and you can then fold it into your DDD aggregate root. So sagas, has anyone read Eric Evans' DDD book? There's a few of you. So in there, he talks about aggregate roots. Sagas are actually very good aggregate roots.
44:35 Adam Ralph
So if you think about it, what they do is they encapsulate private data. And moreover, they're actually just very good objects. So if you've ever read anything by Alan Kay, who wrote about object orientation a long time ago, he never really talked about polymorphism, inheritance and that kind of thing. He talked about objects receiving messages, encapsulating data and business logic and sending other messages out. It sounds very much like a saga.
45:02 Adam Ralph
So I often say that sagas are really just object orientation done right, with the difference that they are persisted. They survive process restarts. So it's like a persistent state object that survives process restarts. And it wraps up this kind of messages in and messages out thing for you. They're very, very good test targets. So any fans of TDD here? So half, I bet the people who didn't put their hands up absolutely hate it. It's a kind of love hate thing with TDD. I think part of the problem there is that we've never had really good units to TDD. So because a saga completely encapsulates its data, and it's just messages coming in and messages going out, it's a very, very good TDD target. You can kind of get around the whiteboard with your domain experts, and say, "Well, when this happens, and this happens, and this happens, when we get these messages in, we want these messages out."
45:58 Adam Ralph
And we have a testing package for NServiceBus, which allows you to put a testable context into those methods, that context object, and you can call a few handle methods, and then look at the context object to see which messages were sent out. So they're very, very good TDD targets. And I do recommend that you do that. I do recommend that you test sagas, especially if you're looking at the aspect of time, because you don't want to be kind of sending something into the system, waiting a week and then waiting to see if it happens correctly. You can actually just call a request timeout method and do that in your code.
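The testing style described here, messages in and messages out with the timeout invoked directly, might look like the following. It uses Python's `unittest` and a hand-rolled `TestableContext` as stand-ins; it does not show the NServiceBus testing package, and the saga is a cut-down, assumed version of the weekly discount example.

```python
import unittest
from datetime import timedelta


class TestableContext:
    """Hypothetical context that captures sends and timeout requests."""

    def __init__(self):
        self.sent = []
        self.timeouts = []

    def send(self, message):
        self.sent.append(message)

    def request_timeout(self, delay, message):
        self.timeouts.append((delay, message))


class WeeklyTotalSaga:
    def __init__(self):
        self.weekly_total = 0

    def handle(self, order_total, context):
        discount = 10 if self.weekly_total >= 100 else 0  # assumed rule
        context.send({"type": "ProcessOrder", "total": order_total,
                      "discount": discount})
        self.weekly_total += order_total
        context.request_timeout(timedelta(days=7), order_total)

    def timeout(self, order_total, context):
        self.weekly_total -= order_total


class WeeklyTotalSagaTests(unittest.TestCase):
    def test_second_order_in_same_week_gets_discount(self):
        saga, context = WeeklyTotalSaga(), TestableContext()
        saga.handle(100, context)
        saga.handle(50, context)
        self.assertEqual([m["discount"] for m in context.sent], [0, 10])

    def test_order_leaves_window_after_a_week(self):
        saga, context = WeeklyTotalSaga(), TestableContext()
        saga.handle(100, context)
        delay, state = context.timeouts[0]
        self.assertEqual(delay, timedelta(days=7))
        saga.timeout(state, context)  # no waiting: call the timeout directly
        saga.handle(50, context)
        self.assertEqual(context.sent[-1]["discount"], 0)


if __name__ == "__main__":
    unittest.main()
```

Because the week is represented only by the recorded timeout request, the test can assert on the requested delay and then jump straight to the moment it fires.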
46:36 Adam Ralph
The other thing that I really want you to take away again, is that thing about workarounds versus requirements. And this involves talking to your business, which is often a tough thing to do. But I actually do think one of the biggest challenges that we have as developers is to actually get to the real requirements, because I think more often than not in my career, I've been given workarounds within the constraints of the current system. And it's part of that asking why and actually getting back to the actual requirement itself. That's a bit of an art form. And ultimately, it can be an organizational problem. Sometimes it can be difficult just to talk to the people that have that information. Sometimes you might have a business analyst sitting between you and the actual business users or kind of proxy domain expert kind of person. So actually getting in the same room with the actual people that can give you the real requirements is sometimes a challenge as well.
47:28 Adam Ralph
And that's just an example of an organizational problem, which is ultimately a lot of the problems that we face as developers. We don't face as many technical problems as we think. A lot of where things come from is actually organizational problems.
47:41 Adam Ralph
So I hope that's given you a good sort of different way, a different method, a different methodology of thinking about this arrow of time and a different kind of toolkit, and a different thing in your toolkit to actually think about how to actually get that into your systems and how to model it. So I've actually raced through that content. I think I've got over 10 minutes left. So if anyone's got any questions right now, I'll be more than happy to take them. Yeah.
48:08 Speaker 2
Do you have experience with overusing sagas?
48:12 Adam Ralph
Do I have experience with overusing sagas? The answer to that is an unqualified yes, because what I've seen sometimes is once you learn about sagas, you start seeing sagas everywhere. And so like anything, they're not that golden hammer, they're not that silver bullet, which will solve all your problems. They're to be chosen to be used. And yes, I've seen people overuse them. Definitely. Yeah. I mean, one of the things is that one of the specific examples is that people sometimes think that a saga is an orchestrator for a much wider business process. So they say, "Well, we've got a service over here and a service over here and a service over here." Let's say it's flights and hotels and booking and the saga needs to orchestrate all of that.
48:56 Adam Ralph
The problem is that when you, so I give another talk called finding your service boundaries, and that's also online. I gave it here last year. In there, I talk about how to actually find those service boundaries and how you must not cross service boundaries within that saga or anything else. So it's important to say that a saga is within a service boundary or a bounded context, if you like, in DDD terms. Use a saga within a bounded context. If something else needs to happen in some other bounded context, raise an event from that saga, start a new saga in the new bounded context. So don't have this kind of God saga encompassing everything. Still respect your bounded context and your service boundaries by splitting a larger workflow up into smaller sagas. Does that more or less answer the question? Any more questions? Yeah.
49:50 Speaker 3
[inaudible]
50:02 Adam Ralph
Sorry, it's a bit difficult to hear you but you're saying when you have a service layer?
50:09 Speaker 3
[inaudible]
50:09 Adam Ralph
Yeah, right.
50:19 Speaker 3
[inaudible]
50:21 Adam Ralph
Right. So when you have a service layer, we've got this real, this delayed messages thing. And I'm trying to use it for part of the business logic. And you're saying it's dangerous.
50:38 Speaker 3
[inaudible]
50:38 Adam Ralph
Okay, so is it dangerous is your question? Is it dangerous? Okay, so you're saying I'm scattering the logic. Well, yes I am. I'm spreading the logic over time. But what I'm doing there is I'm actually embracing the business requirement. So that is actually the business requirement, that we want to keep a window of the current week total for a customer, and we want to take action based on that. So what I'm actually doing is I'm actually embracing that business logic in my code. And I can see, I can read that code and say, "Well, I see what happens now. I see what happens in a week's time." So that actually expresses that requirement. So I can see, maybe you're nervous about the fact that this stuff is in timeouts. And it's kind of hidden from view. So what you can do then, is you can use a CQRS kind of approach to say, well, you can raise an event and write down to a read model. So that at any one time you can say, "Well, I've got these things which are pending. I've got these uncompleted sagas effectively."
51:49 Adam Ralph
And actually, that is one of the other pitfalls of using sagas. We often have a lot of customers who ask, "How do I see what's inside the saga? How do I see inside the saga data?" And the problem with that is you don't want to be doing that, because then you're punching through the abstraction. And all of a sudden, you can't actually change your saga in isolation, because you've got all these other things looking into the database. So that's why it's really good to raise events and actually capture that as a first class read model to say this is the state of this process right now. And I'm going to write down to an explicit read model. Does that allay some of your fears around the-
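An explicit read model of the kind suggested here can be very small. In this sketch the event names `OrderEnteredWindow` and `OrderLeftWindow` and the tuple shape of the events are invented for illustration; the idea is only that the saga publishes events and something else projects them, so nothing queries the saga's own data.

```python
class PendingWindowReadModel:
    """Projection fed by events; queries never touch the saga's storage."""

    def __init__(self):
        # customer id -> order totals currently inside the one week window
        self.pending = {}

    def apply(self, event):
        kind, customer_id, amount = event
        entries = self.pending.setdefault(customer_id, [])
        if kind == "OrderEnteredWindow":
            entries.append(amount)
        elif kind == "OrderLeftWindow":
            entries.remove(amount)

    def pending_orders(self, customer_id):
        return list(self.pending.get(customer_id, []))

    def weekly_total(self, customer_id):
        return sum(self.pending.get(customer_id, []))
```

The saga can then change its internal state layout freely, as long as it keeps raising the same events.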
52:28 Speaker 3
[inaudible]
52:28 Adam Ralph
Kind of, yeah. So Azure Storage queues is another transport, is another queue that we do support. That does not have any kind of delayed message mechanism built in. So we do use a database method. For Azure Service Bus, Azure Service Bus has a different queueing technology that has the native delays built in.
53:00 Speaker 4
[inaudible]
53:03 Adam Ralph
I'm finding a little bit difficult to hear you now. So come and grab me afterwards. And we'll continue the conversation. Okay, any other questions before we wrap up? Yeah.
53:14 Speaker 5
[inaudible]
53:35 Adam Ralph
Okay, so your question is, you do this transactional thing at the beginning, you do a similar transactional thing at the end when the message becomes due in a week's time. And do you have the same transactional guarantees around it?
53:53 Adam Ralph
So the answer is yes, because when you receive that timeout in a week's time, it's exactly like a regular message handling. So you handle the message. The saga's internal state will get mutated. Somehow, you'll persist that state and actually, you'll persist it regardless, even if it's not mutated. And again, if you're sending messages out from that timeout method, if you can't persist the state, because of another concurrent operation, that transaction will get rolled back and the message won't go out. So handling the timeout has exactly the same transactional guarantees around it as handling the initial message.
54:34 Adam Ralph
Okay, so I'll let you go a few minutes early. I hope you've enjoyed that. I hope you enjoy the rest of the conference. I am going to be at the Particular Software booth for most of the rest of today and probably tomorrow. So if you've got more questions, please do come and grab me. I'm more than interested to answer them for you. I hope you enjoy the rest of the conference. I've enjoyed talking to you and thank you very much.