Webinar recording

Got the time?

“Have all my overdue invoices been paid?” Seems a simple enough question. But once you factor in the effects of time, even the simplest question can turn into a mess of edge cases and complicated batch jobs that never quite complete on time.

Why attend?

Real business systems tend to be messy, and the effects of time make them even messier. A command like “check if overdue invoices are paid” has to deal with questions like “are all invoices due on the same day?” and “are invoices due in 30 days or in one month?” In this webinar, we’ll analyze what appears to be a straightforward billing system that needs to deal with invoices and discounts. How hard can it be? Then we’ll add a few more use cases that make it more complicated and see how that can negatively impact the overall design. Finally, we’ll focus our attention on the impact of time on the design, and see how it sheds light on the correct approach to designing features. And best of all: no batch jobs!

In this webinar you’ll learn about:

  • The impact of time on systems design
  • How to model domain problems with time in mind
  • How NServiceBus sagas and timeouts can help you deal with time


00:00 William
Hello again, everyone. Thanks for joining us for another Particular live webinar. This is William, and today I'm joined by my colleague and solution architect Mauro Servienti, who's going to be talking about "Got the Time?" Just a quick note before we begin: please use the Q and A feature to ask any questions you may have during today's live webinar, and we'll be sure to address them at the end of the presentation. We'll follow up offline to answer all the questions we won't be able to answer during the live webinar. We're also recording the webinar and everyone will receive a link to the recording via email. Okay, let's talk about time. Mauro, welcome.
00:39 Mauro Servienti
Thank you very much, William, and welcome everybody. Let me start by saying that I'm always amazed by the way my colleagues pronounce my last name. As you may guess from my accent, I'm Italian, and my last name is quite complex to pronounce, but they are always wonderful at doing that. So, where can we begin? Let's talk about time and how it affects the design of our systems. More or less 25 years ago, in an era without Facebook, with no smart devices and very little technology, there were mobile phones, and if you were there and you are as old as I am now, you might remember them, but they were those strange things with buttons and very small screens. At that time my girlfriend and I bought tickets for a theater show and we put those tickets on a board like this one. This is an actual picture of the board in my current kitchen. And then we forgot about the show.
01:46 Mauro Servienti
We never went to the show. Why? Because there was no reminder at all. We bought the tickets probably a couple of months in advance, and then it got out of our minds and that was it. We remembered it a few days after we were expected to show up for the show and we regretted it, but there was nothing we could have done about it. Roughly at the same time, I was working for an accounting. Well, I was working in an accounting department for a manufacturing company. It was the very beginning of my career. And in that accounting department, one of the things we were regularly doing was checking for overdue invoices. So essentially, on the 10th of every month, we were going through the invoices issued the month before, checking if they were paid or not.
02:44 Mauro Servienti
At the time, the company was essentially issuing invoices just at the end of the month. Well, it was issuing invoices every day but the payment was a month after the end of the month of the invoice. So essentially, if the invoice was issued on the 20th of the month, the invoice was due the end of the next month. So it was easy for the accounting department to simply sit down and do what we can call now a manual batch job. So essentially, we were sitting down, checking invoices, a lot of sheets of paper and going through one by one and checking the amount of the invoice with the account statement from the bank. And that was the only option back then, there was very little technology to do that. All of a sudden the... Well, not really all of a sudden, but at a certain point, the business evolved and they made the decision to change the invoicing processes.
03:40 Mauro Servienti
And instead of having invoices all expiring at the same time, at the end of the month, each invoice was set to expire 30, 60, 90, whatever number of days after the date the invoice was issued. That caused a constant flow of invoices coming due. So what happened is that, from the accounting department's perspective, we found ourselves all of a sudden too busy. We were spending days checking for overdue invoices. The batch job concept, which was a human batch job at the time, was essentially killing the business from the accounting department's perspective. Because every day we were sitting there checking for overdue invoices and trying to match them with bank account statements, with all sorts of issues. One of the problems we had was, for example, is the bank account statement up-to-date? If the payment is coming from a different bank, there are a few days that the bank takes to deliver the money.
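The two invoicing policies described here can be compared with a little date arithmetic. The following is a minimal sketch, not code from the webinar; the function names and sample dates are made up for illustration. It shows why the old policy produced a single due date per month while the new one spreads due dates across the calendar.

```python
import calendar
from datetime import date, timedelta


def due_end_of_next_month(issued: date) -> date:
    """Old policy: due on the last day of the month after the issue month."""
    year = issued.year + (1 if issued.month == 12 else 0)
    month = 1 if issued.month == 12 else issued.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def due_after_days(issued: date, days: int) -> date:
    """New policy: due a fixed number of days after the issue date."""
    return issued + timedelta(days=days)


# Three invoices issued on different days of the same month (sample data).
issue_dates = [date(2024, 3, 5), date(2024, 3, 20), date(2024, 3, 28)]

old_policy = {due_end_of_next_month(d) for d in issue_dates}
new_policy = {due_after_days(d, 30) for d in issue_dates}

# One shared due date under the old policy, three distinct ones under the new.
print(len(old_policy), len(new_policy))
```

With one due date per month, a monthly manual check is enough; with a distinct due date per invoice, somebody has to look every single day.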
04:51 Mauro Servienti
And so you might see that, well, the invoice is due today, they have paid, but we are not yet seeing the amount of the payment on the bank account statement. So we have to schedule this invoice to be checked again in a few days. So it was put to the side and added to, sort of, the next batch job. And even the simple question of "are there any invoices due today?" was complex to answer, because the only option for us was really to go through all the issued invoices, check the date and say: is it due today or not? Is it overdue? And stuff like that. So essentially, we were busy doing just that the entire day. And if we look at that from the architectural or system design perspective, the question we should really start asking ourselves is, what are we looking for?
05:46 Mauro Servienti
If we put ourselves in the shoes of that accounting department, what was the accounting department looking for? Were we looking for an overdue invoices report, or something more straightforward, like an alert when an invoice is overdue? And, by the way, the report comes as a by-product. If we have alerts, then we can list all the alerts, store them and say, okay, now we know which invoices are overdue and unpaid. And obviously, if you ask this kind of question, then the only answer you can get is: we want alerts. It's clear that we don't care about reports from that specific job's perspective; we just want alerts. We want to be notified that, well, the theater show is happening tonight, so you better show up, or this invoice is overdue, you better check why it's not paid yet. That was the primary goal, not spending hours going through the list of issued invoices. The next question we should be asking then is, what's an invoice?
06:56 Mauro Servienti
And it's very easy to say that an invoice is a sort of set-in-stone thing that doesn't really change, but that's untrue in the end. If you look deeper, an invoice is essentially an aggregate with state and behaviors. So we can look at the invoice from the behavior perspective, and we can model an invoice as if it were a state machine. We can say an invoice is in a status which we can call draft or created, an invoice is issued, an invoice is paid, an invoice is overdue, an invoice has had a credit note issued for it, or there might be things like partial payments and so on, or an invoice being re-issued with the payment date changed, and stuff like that. So there are many, many different behaviors that might change the state of the invoice and even the content of the invoice itself. But from the accounting perspective, the invoice doesn't change, because we know that if we, for example, re-issue an invoice, the invoice number would change, but conceptually it's the same payment that we are changing.
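To make the "invoice as a state machine" idea concrete, here is a minimal sketch. It is an illustration only: the state names follow the ones listed above, while the trigger names and the transition table are assumptions, and partial payments and re-issuing are left out.

```python
from enum import Enum


class InvoiceState(Enum):
    DRAFT = "draft"
    ISSUED = "issued"
    PAID = "paid"
    OVERDUE = "overdue"
    CREDITED = "credited"


# Hypothetical transition table: trigger -> {current state: next state}.
TRANSITIONS = {
    "issue": {InvoiceState.DRAFT: InvoiceState.ISSUED},
    "due_date_passed": {InvoiceState.ISSUED: InvoiceState.OVERDUE},
    "pay": {
        InvoiceState.ISSUED: InvoiceState.PAID,
        InvoiceState.OVERDUE: InvoiceState.PAID,
    },
    "issue_credit_note": {
        InvoiceState.ISSUED: InvoiceState.CREDITED,
        InvoiceState.OVERDUE: InvoiceState.CREDITED,
    },
}


class Invoice:
    def __init__(self, number: str):
        self.number = number
        self.state = InvoiceState.DRAFT

    def apply(self, trigger: str) -> InvoiceState:
        """Move to the next state, or fail if the trigger is not allowed."""
        allowed = TRANSITIONS.get(trigger, {})
        if self.state not in allowed:
            raise ValueError(
                f"cannot apply {trigger!r} to an invoice that is {self.state.value}"
            )
        self.state = allowed[self.state]
        return self.state


invoice = Invoice("INV-1")
invoice.apply("issue")
invoice.apply("due_date_passed")
invoice.apply("pay")  # a late payment still ends in the PAID state
```

Each trigger is a behavior, and the only thing a behavior may do is move the invoice from one allowed state to another; anything else is rejected.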
08:07 Mauro Servienti
So if we have state and behaviors, then we can start looking at invoices from the perspective of behavior modeling, instead of as if they were just documents. And say, well, if we're modeling behaviors, that means that we have some sort of a trigger that comes in, then we have a state manipulation that happens, and then, let's say, events out or messages out, whatever "messages" means in this context. We're not talking about messages on a queue yet. The trigger could even be an HTTP request; it doesn't need to be a message at this point. So let's view this trigger, state manipulation, events out from a visual perspective. We have an issue invoice sort of command that asks the system to issue an invoice. The system reacts by saying an invoice has been issued.
09:07 Mauro Servienti
And then what we need to do from the accounting department's perspective is that we want to know when the invoice is due, or we want to be alerted in some way about the fact that an invoice was supposed to be due today. Is it paid or not? And then when that happens, we want to check for the payment. So the question now is, how do we do that? Because if we try to implement it the same way we implemented the manual batch job process, we'll end up overloading the system, no matter what. So let's imagine implementing that using a batch job. You'll end up having a sort of recurring batch job that runs at least every 24 hours, checks all issued invoices and tries to match them with the data that says whether each one is paid or not, and that's the only option you have.
10:02 Mauro Servienti
And if there are no issued invoices that are expected to be due today, then you'll end up loading the system for no good reason. So when it comes to modeling these kinds of behaviors, where you have a trigger that comes in, a state manipulation and an event out, one very good fit is messages and sagas. They really help in modeling this. So, how does it work? I assume everyone knows what a message is. In this case, a message is something on a queue. So it's a message sitting on a queue, for example the issue invoice command that we saw before. We have this message in a queue, and then we have a bus as a concept and the persister as a concept.
10:55 Mauro Servienti
So the bus is the thing that takes care of handling messages. And the persister is the thing that takes care of saving, storing, and loading the state that we just manipulated. And then we have a saga, which is the thing that coordinates these messages and allows the messages to manipulate a specific state. So we have a message coming in that goes through the saga. In this case, for example, it creates a saga, the message is then dispatched and the state gets manipulated. The saga might send messages, for example publishing events. So we have a trigger that comes in, a state manipulation that happens, and one or more messages that go out. And finally, we have state persistence, so that the state is persisted in a storage and a follow-up message can get to the same state and proceed with the manipulation.
11:57 Mauro Servienti
So for example, there's a new message coming in. At this point, the message is for the same saga that we touched before. So the saga state is retrieved from the storage, the message is dispatched, and the state manipulation happens. The saga state gets stored again or, if the saga is completed, it gets deleted, or maybe there are more messages going out. So the saga publishes another message and so on, and it can go on essentially forever. Now that we have a better understanding, from a high-level perspective, of what a saga is, let's try to implement what we just talked about: an alerting system that allows us to understand that an invoice is overdue, using messages and sagas. And let's do that by looking at the code immediately, and then we'll dive into the code details.
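The interplay of bus, persister and saga described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not NServiceBus code: `InMemoryPersister`, `Bus`, `SagaState` and `counting_handler` are hypothetical names, and there is no real queue or transaction involved.

```python
from dataclasses import dataclass, field


@dataclass
class SagaState:
    correlation_id: str
    data: dict = field(default_factory=dict)
    completed: bool = False


class InMemoryPersister:
    """Stands in for the saga storage: loads, saves and deletes state."""

    def __init__(self):
        self._store = {}

    def load(self, correlation_id):
        return self._store.get(correlation_id)

    def save(self, state):
        self._store[state.correlation_id] = state

    def delete(self, correlation_id):
        self._store.pop(correlation_id, None)


class Bus:
    """Loads the saga state, dispatches the message, then persists or deletes."""

    def __init__(self, persister, handler):
        self.persister = persister
        self.handler = handler
        self.published = []  # messages going out

    def dispatch(self, correlation_id, message):
        # First message for this id creates the saga; later ones reuse its state.
        state = self.persister.load(correlation_id) or SagaState(correlation_id)
        self.published.extend(self.handler(state, message))
        if state.completed:
            self.persister.delete(correlation_id)
        else:
            self.persister.save(state)


def counting_handler(state, message):
    """Toy saga logic: count messages, complete when told to finish."""
    state.data["seen"] = state.data.get("seen", 0) + 1
    if message == "finish":
        state.completed = True
        return [f"finished after {state.data['seen']} messages"]
    return []


persister = InMemoryPersister()
bus = Bus(persister, counting_handler)
for incoming in ["start", "more", "finish"]:
    bus.dispatch("saga-1", incoming)
```

After the three messages, `bus.published` holds a single outgoing message and the state for `saga-1` is gone from the persister, which mirrors the "completed sagas get deleted" behavior described in the talk.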
12:57 Mauro Servienti
So I noticed from the poll that not everyone is used to .NET development. The code is very, very simple. There might be things that are strange if you're not used to .NET and C#, but I guess that everything is understandable. And if you have any questions, just drop them in the Q and A feature of the Zoom webinar. So what do we have here? We have some code that handles the message, and that's the trigger message. So we have an incoming message, which in this case is, for example, the invoice issued event. When the invoice is issued, we want to know when the invoice is expected to be due. So what we do is we store the invoice number into the state; that's the state manipulation thing. Then we retrieve the due date.
13:48 Mauro Servienti
And if it's an Italian customer, we just add a bunch of days just to be safe. You know Italians? "I know my chickens", we'd say. And then we schedule a delayed delivery. What's a delayed delivery? It's a message for ourselves in the future, and we're using the due date as the date for the message to expire. What "expires" means in this case is that the message will be delivered to the saga when the due date comes. So essentially, if it's an Italian customer, the message will be delivered 20 days after the due date of the invoice, and the message that will be delivered is the check payment message. The important stuff here is that there will be a shared transaction, and "transaction" here is in quotes because it's not necessarily an ACID transaction, depending on the storage type or the queuing system you're using. Technical details.
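The handler walked through above can be sketched as follows. This is a Python illustration rather than the C# shown in the webinar; the class and field names are assumptions, the 20 extra days come from the talk, and the `scheduled` list merely stands in for handing a delayed message to the queuing system.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ITALIAN_GRACE_DAYS = 20  # the "bunch of days" added for Italian customers


@dataclass
class InvoiceIssued:
    invoice_number: str
    due_date: date
    customer_country: str


@dataclass
class CheckPayment:
    invoice_number: str


class OverdueInvoiceSaga:
    def __init__(self):
        self.invoice_number = None
        # (deliver_at, message) pairs; a stand-in for delayed delivery requests.
        self.scheduled = []

    def handle_invoice_issued(self, message: InvoiceIssued) -> None:
        # State manipulation: remember which invoice this saga is about.
        self.invoice_number = message.invoice_number

        # Work out when the reminder should come back to us.
        deliver_at = message.due_date
        if message.customer_country == "IT":
            deliver_at += timedelta(days=ITALIAN_GRACE_DAYS)

        # Message to our future self, delivered when the date arrives.
        self.scheduled.append((deliver_at, CheckPayment(message.invoice_number)))


saga = OverdueInvoiceSaga()
saga.handle_invoice_issued(InvoiceIssued("INV-42", date(2024, 4, 30), "IT"))
```

In a real system the state change and the scheduling of the delayed message would need to succeed or fail together, which is the "shared transaction" point made in the talk; this sketch does not model that.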
14:46 Mauro Servienti
And there's a bunch of documentation about that if you're really curious. The shared transaction sits in between the state manipulation and the delivery of the message. Why do you want a shared transaction? Because you don't want to end up in a situation where you have manipulated the state and never published any message, or the other way around. You don't want to have published messages with no state manipulation happening. It means that you'll end up in a sort of corrupted system scenario, and we want to avoid that as much as we can. What happens when the message expires? So when that requested timeout comes to the date represented by the due date, here's the code for that. We want to handle the timeout. It's a message like any other on a queue, and that message is delivered to the saga as in the diagram that we showed before.
15:38 Mauro Servienti
And when that happens, what we do is we tell the world, by publishing an event: invoice overdue, with the invoice number. So what we're doing is retrieving the invoice number from the state we manipulated before, and we're using that invoice number as the value for the event we are going to publish. And finally, we're deleting ourselves. So we're marking ourselves as completed. And again, we want to have a shared transaction here. We want to make sure that if the event is published, we're going to delete ourselves, and if we're going to delete ourselves, the event is published. Otherwise, again, corrupted state scenario, and we don't want that. But hold on, so you're telling me that whenever the check payment message comes in, we're checking nothing? If you look at the code here, the code is straightforward.
16:39 Mauro Servienti
It does no check. It doesn't really check if the invoice was paid or not. It assumes that if this check payment message comes in, the invoice is overdue. How is that? That's interesting. We have a saga that represents the state machine, and I'm saying "state machine" in quotes because it's not really a state machine, but it can be seen as a state machine. So given that we have that, we might have multiple triggers hitting the same saga, causing different behaviors. One other trigger could be this one: we have an invoice paid message that comes in. And the only thing that we do when the invoice paid message comes in is that we mark ourselves as complete. And what happens when we do that? The saga is deleted. So essentially what we're doing is taking the simplest path to solve the problem. We're saying, we issue an invoice.
17:43 Mauro Servienti
Let's say that the invoice is due in 30 days, and after 29 days the customer pays. The invoice paid message comes in. The saga is marked as complete, which means that the saga is deleted. But we have a timeout in the queuing system that will expire. The timeout expires the day after, and NServiceBus handles the timeout and looks for a saga. It doesn't exist. So it considers the message handled, because if the saga was marked as completed, there's no need to dispatch any timeout; expired timeouts for a completed saga are essentially useless. So when the message that represents a timeout comes in, we don't need to check whether the invoice is overdue and not paid. That's true by definition, because if it was paid, the saga would have been deleted. So let's go back there for a second.
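The whole flow, including the reason the timeout handler needs no payment check, can be simulated in a few lines. This is a sketch under stated assumptions: plain dictionaries and lists stand in for the saga storage and the queuing system, the function names are made up, and nothing here is NServiceBus API.

```python
from datetime import date

sagas = {}      # invoice number -> saga state (the persister)
timeouts = []   # (deliver_at, invoice number) pairs (the queuing system)
published = []  # events published to the rest of the system


def invoice_issued(invoice_number: str, due_date: date) -> None:
    sagas[invoice_number] = {"invoice_number": invoice_number}
    timeouts.append((due_date, invoice_number))


def invoice_paid(invoice_number: str) -> None:
    # Marking the saga as complete deletes its state.
    sagas.pop(invoice_number, None)


def deliver_due_timeouts(today: date) -> None:
    for entry in [t for t in timeouts if t[0] <= today]:
        timeouts.remove(entry)
        saga = sagas.pop(entry[1], None)
        if saga is None:
            continue  # saga already completed: the timeout is simply discarded
        # No payment check needed: the saga still exists, so nobody paid.
        published.append(("InvoiceOverdue", saga["invoice_number"]))


invoice_issued("INV-1", date(2024, 4, 30))
invoice_issued("INV-2", date(2024, 4, 30))
invoice_paid("INV-1")  # paid before the due date
deliver_due_timeouts(date(2024, 5, 1))
print(published)
```

Only `INV-2` produces an `InvoiceOverdue` event. The timeout for `INV-1` still fires, but it finds no saga and is dropped.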
18:41 Mauro Servienti
So what we can now do is essentially shortcut and remove the when due. So because the check payment message is a timeout. That is what we call a delayed delivery. So essentially, as we said, it's a message that is sent back to the queuing system. And we kindly ask the queuing system to, can you deliver this back to us in a few minutes, a few days, a few months? It doesn't really matter, we can set that. And depending on the queuing system you're using, it might be a built-in feature. So for example, Azure Service Bus natively supports delayed deliveries or, in RabbitMQ we built that on top of RabbitMQ, in SQS, on AWS, we built that on top of SQS. But anyway, the feature is transparently supported on any transport that we support.
19:36 Mauro Servienti
So the interesting bit I want to spend a few minutes talking about here is: what did we do? We built the alerting system, if you think about it. If you look at this from the perspective of the accounting department, we can now stop looking at issued invoices. The system will notify us when one of the issued invoices is not paid. That's the same thing as buying theater tickets and forgetting about them, because today my phone will probably remind me a few hours before the event that I have to go to that event. That's the same behavior, essentially, as buying tickets and setting a timeout or a reminder for ourselves in the calendar, which at a certain date and time will trigger and tell us something is going to happen. Or in this case, something has happened: the invoice is overdue and not paid. Let's have a look at a more complex scenario. Let's say that you have an e-commerce system, whatever, you're selling something to someone, and then you want to introduce the concept of a premium customer.
20:54 Mauro Servienti
And what premium customers get is that, if their monthly running total is above a certain threshold, in this sample $300, they get a 10% discount. So let's dissect the requirements a little bit. It's not that your last order has to be more than $300. It's that, over the last month, the total of the orders you placed, when you add them up, is more than $300. Then you'll get a 10% discount. It's a sort of sliding window. As soon as that amount goes below $300, you're not a premium customer anymore. If you go above $300, you're a premium customer again, and so on. A very naïve approach to implement this would be something like this. Let's say that we're placing an order, then we want to calculate a discount, and we have the customer ID.
22:02 Mauro Servienti
So what's last month? It's the time now, add months, minus one. So we go back in time by 30 days or 31 days, depending on the month. And it doesn't compile because something is missing, anyway. And then we do a sort of a query. In this case, the query is expressed using LINQ, but that doesn't really matter. We're querying orders on the database. That db.orders is the database table, for people not used to LINQ queries. And then we're saying: where the customer ID is the customer ID that comes into the method, and where the order timestamp is greater than last month. And then we're summing, we're adding up all the amounts. If the last month's total is greater than $300, then you get a 10% discount. Otherwise, no, you don't. Sounds easy. And despite the fact that we are hitting the database for every single order that gets placed, it works in some way. However, the business comes in and it evolves again. Those stakeholders should really stop doing what they do.
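The naïve query can be sketched like this. It is a Python stand-in for the LINQ query described above, not the code from the slide: a plain list plays the role of the orders table, the window is fixed at 30 days for simplicity (the talk goes back one calendar month), and all names are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal

THRESHOLD = Decimal("300")
DISCOUNT = Decimal("0.10")


@dataclass
class Order:
    customer_id: str
    placed_at: datetime
    amount: Decimal


def calculate_discount(orders, customer_id: str, now: datetime) -> Decimal:
    """Re-sum the customer's orders from the last 30 days on every call."""
    last_month = now - timedelta(days=30)
    total = sum(
        (
            o.amount
            for o in orders
            if o.customer_id == customer_id and o.placed_at > last_month
        ),
        Decimal("0"),
    )
    return DISCOUNT if total > THRESHOLD else Decimal("0")


now = datetime(2024, 5, 1)
orders = [
    Order("alice", datetime(2024, 4, 10), Decimal("200")),
    Order("alice", datetime(2024, 4, 25), Decimal("150")),
    Order("bob", datetime(2024, 3, 1), Decimal("500")),  # outside the window
]
```

Here `calculate_discount(orders, "alice", now)` yields the 10% discount while `bob` gets none, since his only order is older than the window. Note that every call scans the orders again, which is the "hitting the database for every single order" cost mentioned in the talk.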
23:20 Mauro Servienti
But anyway, the business suggests that we want to introduce corporate accounts. So instead of having your own user account in the system, now you have a sort of business account and multiple sub-accounts that can buy. And the premium customer policy applies to the business account. So if I'm a private account, not a business one, then I'm a single sub-account, which means that the orders the policy applies to are the orders I placed. If I'm a business account, all the sub-accounts can place orders. And whenever one of them places an order, all the orders placed by all the sub-accounts in the business account should be evaluated for the premium customer policy. What happens then is that, all of a sudden, concurrency becomes a problem.
24:26 Mauro Servienti
So let's say that an order is placed for business customer 123 by sub-account A, and the total amount is $100. The discount is calculated; the running total is $250. This is less than $300, so no discount is applied. Concurrently, what happens is that for the same business account, someone places an order for 60 bucks. The discount is calculated, and the running total is still $250. So it's still less than $300 and, again, no discount is applied. And that's wrong, because the second one should have gotten the discount, because the first one placed an order for $100. What's happening here? Concurrency kicks in and causes stale reads or ghost reads. So what's happening is that this query is reading data that is not up to date. What comes back from this query is the wrong amount.
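The stale read can be reproduced deterministically by forcing the problematic interleaving. This is a sketch using the numbers from the example above; the list standing in for the orders table and the function names are assumptions, and real concurrency is replaced by an explicit ordering of reads and writes.

```python
THRESHOLD = 300

# Amounts already stored for business account 123 (running total: $250).
orders_table = [250]


def read_running_total() -> int:
    return sum(orders_table)


def discount_for(running_total: int) -> float:
    return 0.10 if running_total > THRESHOLD else 0.0


# Both requests read the total before either one has written its order.
total_seen_by_a = read_running_total()
total_seen_by_b = read_running_total()

orders_table.append(100)  # sub-account A stores its order
orders_table.append(60)   # sub-account B stores its order

discount_a = discount_for(total_seen_by_a)  # based on 250: no discount
discount_b = discount_for(total_seen_by_b)  # based on a stale 250: no discount

# What B should have seen if A's order had been visible: 250 + 100 = 350.
expected_discount_b = discount_for(250 + 100)
```

`discount_b` comes out as no discount while `expected_discount_b` is 10%, which is exactly the policy violation described in the talk.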
25:34 Mauro Servienti
There's a very naïve approach to concurrency these days, and a terrible one, sorry about that. We can create a transaction, and the transaction must have a serializable isolation level. Then we can open a connection to the database, set the database to use the transaction, run the same query, commit the transaction and boom, deadlock. As soon as the system gets some load, essentially the pessimistic locking approach will cause deadlocks, because we'll end up opening serializable transactions and queuing up requests for the same business account, waiting for the concurrent order that came in a microsecond before the second one to be completed. And then again and again and again, and the problem obviously is exacerbated by things like Black Friday. As soon as you overload the system, there's no way you can survive using the pessimistic locking approach to concurrency.
26:44 Mauro Servienti
One option, again, is batch jobs. That sounds like something viable. So what can we do? We can run a batch job on a schedule. Run over all customers, sum all the order amounts for the last 30 days for all customers, business customers and non-business customers in this case. If the policy is matched, create a discount coupon. So the batch job is responsible for creating a discount coupon, which, first of all, is not really what the requirement was, because the discount coupon will be applied to the next order, not the one that is currently being placed. So if we're placing an order now and we're already above the threshold of $300, but the batch job isn't done yet, we don't have the coupon and we'll end up paying the total amount and not getting the discount, even if that's not correct.
27:46 Mauro Servienti
And again, good luck running the batch job during a Black Friday, because the system is already overloaded and the batch job will overload it even more, which is not really what we're looking for. And then discount coupons are generated only when and if the batch job runs, and that's an important problem. If something goes badly with the batch job, then we'll end up violating the policy even more, simply because we are unable to run the batch job. In the case of the overdue invoices that might not really be a big deal, because we can say, okay, we'll check for unpaid invoices tomorrow, who cares? It's not really a big problem. But in this case, customers might complain, because they say, well, my total amount was above $300, why didn't I get a discount? Oh, sorry, because our batch job approach failed. So let's have a look again at the timeline, because it seems that, again, there's a time component here.
28:57 Mauro Servienti
And it's clear from the fact that we have a sliding window. We have a total amount of orders that is moving as time moves, so let's try to kill the batch job. Again, we have the same behavior as before. We have a trigger coming in, placing an order; some state manipulation, the creation of the order and probably something else; and then again, events out. Let's view it; it's a bit more complex than the one before. So we're placing an order, and maybe the system raises an event that can be called Order Placed. And when the order has been placed, we want to calculate the discount. At the same time, we want to add the total amount of the order to the running total for that specific user or business user. And only once those two things are done do we want to process the order. And then what we want to do, again applying the logic that we applied for overdue invoices, is schedule a message for ourselves to deduct from the running total in 30 days.
30:06 Mauro Servienti
That's interesting. So instead of trying to calculate, every single time, what's the total amount for a specific customer, we are simply adding and removing values from a running total. Let's view it from the arrow of time perspective, because it's probably simpler to understand what's going on. At T0, we have the user interacting with the system and placing the order, fine. At T1, we have the things that happen: calculate the discount, process the order and add the amount to the running total. At T1 plus 30 days, we deduct from the running total using a timeout. So essentially what's happening is that, if you think about it, for every user that exists in the system, there will be a never-ending saga that keeps track of the running total for that user. When it starts it's at zero, and it might stay at zero until the first order is placed. When the first order is placed, the saga keeps track of the total amount.
31:21 Mauro Servienti
After 30 days, it deducts the order's total amount from the running total. So it might go back to zero. Or if another order was placed in the meantime, it takes that into account. Let's have a look at the code again. Whenever we place an order, we have the trigger message, which is order placed, and then we manipulate the state. So we say, okay, let's add the total order amount to the monthly running total for this user, because the saga is now bound to the user. And then we can say, okay, process order, where the order ID is the order ID that came in with the order placed message, and the discount is the discount that we calculated at the first line, based on the current monthly running total. And then finally, we request a timeout, and in the timeout message that we're sending out, we're setting the order total amount we want to be deducted in a month.
32:28 Mauro Servienti
And again, as before, we want these three things to happen in a transaction, because otherwise we'll end up with a corrupted system. And when the message expires, so in 30 days, what we'll simply do is deduct. So 30 days later, we deduct from the running total the total amount of the order for which the deduct from running total message was scheduled. We don't really care what the order was, except maybe for logging purposes. From the saga's perspective, the only thing that we need is, how much money should I deduct from the monthly running total? What happens in this case, then, if there's concurrency? Let's imagine now that it's Black Friday, or we have concurrent orders for the same business account, and let's view it from the graphical perspective, because it's easier to understand what's going on.
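The running-total saga described over the last few minutes can be sketched as follows. This is an illustrative Python model, not the webinar's C# code. It assumes that the discount is decided from the running total as it stands before the current order is added, that the window is exactly 30 days, and that a list on the saga stands in for the delayed messages held by the queuing system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from decimal import Decimal

THRESHOLD = Decimal("300")
WINDOW = timedelta(days=30)


@dataclass
class DiscountPolicySaga:
    customer_id: str
    running_total: Decimal = Decimal("0")
    timeouts: list = field(default_factory=list)   # (deliver_at, amount)
    processed: list = field(default_factory=list)  # (order_id, discount)

    def handle_order_placed(self, order_id: str, amount: Decimal, now: datetime):
        # 1. Decide the discount from the current running total (assumption).
        discount = Decimal("0.10") if self.running_total > THRESHOLD else Decimal("0")
        # 2. State manipulation: add this order to the running total.
        self.running_total += amount
        # 3. Message out: ask for the order to be processed with that discount.
        self.processed.append((order_id, discount))
        # 4. Message to our future self: deduct this amount in 30 days.
        self.timeouts.append((now + WINDOW, amount))

    def deliver_due_timeouts(self, now: datetime):
        # Stand-in for the queue delivering "deduct from running total" messages.
        for entry in [t for t in self.timeouts if t[0] <= now]:
            self.timeouts.remove(entry)
            self.running_total -= entry[1]


start = datetime(2024, 1, 1)
saga = DiscountPolicySaga("business-123")
saga.handle_order_placed("order-1", Decimal("200"), start)
saga.handle_order_placed("order-2", Decimal("150"), start + timedelta(days=10))
saga.handle_order_placed("order-3", Decimal("50"), start + timedelta(days=20))
saga.deliver_due_timeouts(start + timedelta(days=31))  # order-1 leaves the window
saga.handle_order_placed("order-4", Decimal("20"), start + timedelta(days=32))
```

No order history is ever queried: each order is added once and deducted once, so the saga only needs to hold a single number per customer plus the pending timeouts.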
33:24 Mauro Servienti
So now we have two sagas. What's happening is that a message comes in for the saga on the left, and then another message is handled by the saga on the right. Why can this be? For example, one reason might be that the saga on the left is triggered by user A for business account 123, and the saga on the right is triggered by user B for business account 123, but it's the same saga instance, because it's the saga for business account 123. Given that there are two concurrent orders, there are two messages handled concurrently. And so two copies of the same saga instance are hit at the same time. They are loaded from the storage. The message is delivered to the saga, and then both will try to persist using the business ID, 137 in this case, or 123 in the sample that we had. One of them will fail.
34:27 Mauro Servienti
And in the simplest case, it will fail because of concurrency control. An optimistic concurrency exception will be raised by the storage, because someone else already touched the same record before. So the second one, the one on the right, is trying to save some stale data, and the message for the second one is retried. And what's happening is that now it gets the right amount. So we are no longer in the case of having a wrong total amount calculated by the system and not issuing a discount for that specific order, because when the message is retried, it was simply delayed a little bit, but the data was not stale anymore. And we're not using any transaction or any pessimistic locking to lock the data and overload the system even more. So what are the takeaways from what we learned so far? The first thing is that we want to avoid jumping to technical solutions as much as we can.
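Optimistic concurrency with a retry can be sketched with a version number on the stored record. This is a hypothetical in-memory model: `VersionedStore`, `ConcurrencyError`, `handle_order` and the `before_save` hook (used only to force a competing write at the worst moment) are assumptions for illustration. In the setup described in the talk, the retry comes from the message being redelivered rather than from a loop inside the handler.

```python
THRESHOLD = 300


class ConcurrencyError(Exception):
    pass


class VersionedStore:
    """Each record carries a version; a save with a stale version is rejected."""

    def __init__(self):
        self._rows = {}  # key -> (version, running_total)

    def load(self, key):
        return self._rows.get(key, (0, 0))

    def save(self, key, expected_version, running_total):
        current_version, _ = self._rows.get(key, (0, 0))
        if current_version != expected_version:
            raise ConcurrencyError(key)
        self._rows[key] = (expected_version + 1, running_total)


def handle_order(store, key, amount, before_save=None, max_attempts=3):
    """Handle an order; on a concurrency conflict, reload and try again."""
    for _ in range(max_attempts):
        version, total = store.load(key)
        discount = 0.10 if total > THRESHOLD else 0.0
        if before_save is not None:
            hook, before_save = before_save, None  # run the hook only once
            hook()
        try:
            store.save(key, version, total + amount)
            return discount
        except ConcurrencyError:
            continue  # someone else saved first: retry with fresh data
    raise ConcurrencyError(key)


store = VersionedStore()
store.save("123", 0, 250)  # business account 123 starts with a $250 total

# Sub-account B's $60 order loads the state, then A's $100 order sneaks in
# and saves first. B's save is rejected, B retries and sees the fresh total.
discount_b = handle_order(
    store, "123", 60, before_save=lambda: handle_order(store, "123", 100)
)
```

On the retry, B reads a running total of $350 instead of the stale $250, so `discount_b` is the 10% discount, and no lock was held while either order was being handled.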
35:41 Mauro Servienti
So it's very easy to say, we have a list of things we need to check, let's apply a batch job, or let's apply a batch job kind of mindset to the problem, go through that and essentially end up with a batch job. The other thing we want to do is really understand the real business needs, because one of the problems we have with domain users or business users is that they come with solutions. They will come and say, we want to have a list of unpaid invoices. That's not really what they want. They want alerts for unpaid invoices, or they want to be able to apply discounts to specific orders. And the fact that they are coming with a sort of prebuilt solution, because maybe that's the way they've always done it, or because they have been doing it that way from the beginning of time, doesn't really mean that it's the way we should implement it.
36:41 Mauro Servienti
So be mindful about that and do not fall into the trap of listening too much to them; try to better understand what's the real business problem behind that. And then the third one is that we want to be able to model time by sending messages to our future selves. When we're dealing with time, we can model it by using delayed deliveries and make sure that something that should happen in the future, or something that we expect to happen in the future, will happen, or that we will be able to react if something doesn't happen. Let me give an example of that. Let's imagine that your system needs to ship goods. You're selling stuff online and you need to ship it. And shipping is done using a third-party provider, FedEx.
37:35 Mauro Servienti
So your system tries to use the FedEx HTTP API to schedule a pickup. So I buy something in the system, and then your system picks the things and, through the HTTP API, FedEx schedules a pickup. Now, how does the system react to the fact that no one is showing up? That's the timeout. So what the system can do is hit the FedEx API and then schedule a timeout for itself. Let's say five hours later, the expected time for the delivery courier to show up. And if the delivery courier shows up, a message will come in saying shipment picked up, which will complete the delivery saga, and then the timeout will be ignored. If the delivery courier doesn't show up, then the timeout will be handled. And it means that we have a problem, and now we need to escalate the problem.
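The pickup timeout can be sketched as a small simulation. This is Python with a toy in-memory queue and simulated hours, not NServiceBus; `DeliverySaga`, `DelayedDelivery` and `schedule_pickup` are hypothetical names, and the call to the carrier's HTTP API is left out.

```python
import heapq
from dataclasses import dataclass


@dataclass
class DeliverySaga:
    """Hypothetical stand-in for a saga tracking one shipment."""
    order_id: str
    completed: bool = False
    escalated: bool = False

    def on_pickup_confirmed(self):
        # The courier showed up: the saga is done.
        self.completed = True

    def on_pickup_timeout(self):
        # In this toy model the timeout still arrives after completion,
        # and it is simply ignored.
        if not self.completed:
            self.escalated = True


class DelayedDelivery:
    """Toy in-memory queue delivering callbacks at a simulated hour."""

    def __init__(self):
        self.now = 0
        self._queue = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def send_later(self, delay_hours, callback):
        self._seq += 1
        heapq.heappush(self._queue, (self.now + delay_hours, self._seq, callback))

    def advance(self, hours):
        # Move the clock forward and deliver everything that became due.
        self.now += hours
        while self._queue and self._queue[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._queue)
            callback()


def schedule_pickup(saga, queue, timeout_hours=5):
    # The request to the carrier's HTTP API would happen here (omitted).
    # The saga then sends a message to its future self.
    queue.send_later(timeout_hours, saga.on_pickup_timeout)
```

The timeout is the only trigger needed for the "nobody showed up" case: nothing polls a list of pending shipments.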
38:36 Mauro Servienti
But we have a message coming in, that is again a trigger, and we can use that message to manipulate the state again, and maybe send a message to Mauro saying, your delivery will be late, and then raise an event internally to speed things up and try to understand how to overcome the problem. So we can model everything that is related to time using delayed deliveries. And these are the three main takeaways I'd like you to take away today. And thanks everyone for having joined me and my colleagues today. My name is Mauro Servienti, as William said, I work for Particular Software. There are a few contacts. You can ping me, and thank you very much. Demos are available at the first link, the slides I used for the presentation are available at the second one, and a tutorial about using timeouts with NServiceBus sagas is available at the third link.
39:37 Mauro Servienti
And I think that there are a few questions. There's a question from Philip: how is a saga different from an aggregate? It's not. Implementing aggregates using sagas is a very good way of implementing aggregates in a distributed system. One could argue that aggregates and distributed systems don't fit really well. But if we want to implement some domain-driven design patterns when building distributed systems, sagas are a perfect fit for that. One of the "side-effects" of using sagas is that they force us to go to some CQRS kind of approach, because there's no easy way, and we built it that way intentionally, to read the saga state from the saga storage. So there are ways to do that, but we try to avoid that because the saga state is private to the NServiceBus implementation, technically speaking.
40:49 Mauro Servienti
So we advise customers, we warn customers that it should not be done. So what the saga can do is essentially publish events that can be used by the system to build read models that present the current state of the saga for a UI, for example. Then there's a question from Michael: what are your suggestions or thoughts on how to integration test the way saga timeouts work in a testing environment? That's a very interesting one. So the NServiceBus framework comes with a testing API, and it allows you to build unit tests for sagas. Sagas are just classes like any other. So there are no special requirements for a saga, and you can unit test that saga. And given that the messages are just classes, you can mock messages, mock sagas if needed, or just create instances of those, and then exercise the saga itself.
41:53 Mauro Servienti
It's tricky though because the unit test doesn't take into account the real system. So the unit test runs in memory. So there will be no queuing system. And what happens if the endpoint that is hosting the saga is, for example, misconfigured, or what happens if a timeout gets delivered somewhere else? I'm just inventing something. So there are integration testing options available as open source projects. One of them I wrote, another one is written by Jimmy Bogard, and they allow you to host the real production saga code and run it in a unit test against the real queuing system. And so you can do proper integration testing of the real saga scheduling timeouts. One of the tricky things at that point is that you can't really wait for your timeout to expire in 30 days.
42:59 Mauro Servienti
So one of the things that, for example, the open source toolkit allows you to do is inject rules to reschedule timeouts. So your code doesn't change, your saga code doesn't change, and it will try to do, for example, using the samples we showed before, it will try to schedule the check payments timeout for 30 days, but then the testing code can intercept that and say, no, instead of 30 days, do 30 seconds. So that the test will complete and doesn't time out. I hope I answered that. So there's a third question: what if we need to reschedule a timeout, for example, if the invoice payment terms change after it's been issued? That's one case. So rescheduling a timeout is not directly supported by NServiceBus sagas.
44:00 Mauro Servienti
It's not that complex though to build that feature by ourselves. So what you can do is essentially something like this. You schedule a timeout, and given that the timeout is just a message, what you can do is, like we did for the running total timeout message, add properties to that message. So one of the properties you can add is an identifier. So you can create a globally unique identifier, set that identifier on a property of the message and store it in the saga data. So the state of the saga says that you scheduled the message with ID XYZ, the one identified by the globally unique identifier, for a specific time, 30 days. Then if the message has to be rescheduled, what you can do is essentially change that value in the saga data only, saying, okay, now this message is supposed to be in 45 days.
45:06 Mauro Servienti
So when the message comes in as expired, your code, instead of just doing the business thing it should be doing, checks in the saga data if the expiration time still matches the one expected. And if it doesn't, instead of running the business code, it will reschedule itself. So in the sample of the 30 days, it will come in, check, and, oh, it says it's not 30 days anymore. It's now 45. Let me reschedule myself for 45 more, 15 more days, sorry. It's a little bit trickier to reschedule something for a shorter time. So if from 30 days we want to move to 20 days. So what we can do is essentially create another message with a different ID, link the two IDs, and send the new message out for 20 days. When that one comes in, because it will be earlier than the first one, we mark that as checked.
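The postponing case (30 days extended to 45) can be sketched as follows. This is a Python simulation with days counted as plain integers; `SagaData`, `request_timeout`, `postpone` and `on_timeout` are hypothetical names and not part of the NServiceBus API. The shorter-timeout case is not covered here.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SagaData:
    """Hypothetical saga state: timeout id -> day on which it should fire."""
    expected_day: dict = field(default_factory=dict)


def request_timeout(data, today, delay_days):
    """Record a new timeout; return (timeout_id, day the message arrives)."""
    timeout_id = str(uuid.uuid4())
    data.expected_day[timeout_id] = today + delay_days
    return timeout_id, today + delay_days


def postpone(data, timeout_id, new_day):
    # Only the saga data changes; the in-flight message is left alone.
    data.expected_day[timeout_id] = new_day


def on_timeout(data, timeout_id, today):
    """Decide what to do when the timeout message arrives.

    Returns ("reschedule", remaining_days) if the expected day moved
    further out, or ("run", None) if the business code should run now.
    """
    remaining = data.expected_day[timeout_id] - today
    if remaining > 0:
        return ("reschedule", remaining)
    del data.expected_day[timeout_id]
    return ("run", None)
```

A timeout requested on day 0 for 30 days and later postponed to day 45 arrives on day 30, finds 15 days remaining, and asks to be rescheduled; when it arrives again on day 45 the business code runs.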
46:08 Mauro Servienti
And we mark the first one as handled too in the saga data. At that point, when that comes in, if the saga is not completed yet, we can check the saga data, the list of requested timeouts, and say, okay, we don't need to handle this anymore because it was rescheduled. And given that this has been already handled, we know that it was rescheduled at an earlier time. It sounds complex, but it's not. It's just a matter of keeping track of the timeouts you request, and then knowing what to do. And it is just a business kind of problem because the business dictates, okay, when this happens, we want this thing to happen and that's it. And just model that using sagas and, again, messages. And okay, there's a new question from Diego. If I currently for a specific process, do I need the aggregation because I can send the order that I need for decision with the right message?
47:08 Mauro Servienti
Should I build the saga anyway? Not really. So the question is related to the fact that NServiceBus has the concept of delayed messages, delayed deliveries, that are not necessarily bound to sagas. So in a regular message handler, when you receive a message and the receiver of the message, the handling code of the message, is not a saga, it's just a regular message handler, you can delay a message. So you can create a new message and delay it, or delay the incoming message, and that just works anyway. So you have an incoming message, let's say that you realize that you're not ready to handle that message. You can reschedule that message for later, let's say next week, and you don't need a saga for that. The only use case for which you really need the saga is if there is a state manipulation and there might be multiple messages, as multiple triggers, coming in that should manipulate the same state.
48:11 Mauro Servienti
At that point, the saga built into NServiceBus is designed exactly to do that, and provides you with facilities to manipulate the same state without worrying about all the correlation problems with multiple messages coming in. Yes, it's a good implementation anyway. So if you just have one single message and you want to delay that, and there are no other messages that need to be correlated, it's a perfectly valid implementation, I guess.
48:53 William
Looks like that's all the questions we have today. On behalf of Mauro, this is William saying goodbye for now. We'll see you at the next Particular live webinar. Thank you everyone.

About Mauro Servienti

Mauro is a solution architect and former Microsoft MVP (2006-2016). He spends his time helping developers build better .NET systems leveraging service-oriented architecture principles and message-based architectures. He's also passionate about skiing, classical dance, and music in general.

Additional resources