The Unsung Hero of Modern Software: Asynchronous Messaging
About this video
This session was presented at NDC London 2025.
In today’s interconnected digital world, the need for scalable, resilient, and efficient software systems is greater than ever. When simple HTTP request/response calls start to introduce brittleness and complexity, embracing messaging becomes crucial.
This talk will provide you with the knowledge and best practices to fully harness asynchronous messaging within the .NET ecosystem, highlighting how it can help decouple services, enhance fault tolerance, and transform the way you approach API design.
🔗Transcription
- 00:00:01 Irina Dominte
- So, we should start. A few more people are coming. I was thinking that I should do a joke, and I found exactly the perfect joke. Are you stuck here since yesterday because you couldn't find your way out? Actually, I was in here yesterday, and on the way out I was like, okay, did I take a left or a right? And it took me a while to find my way. So, I thought maybe no one would join my talk, or they would be stuck here because they couldn't find their way out, as I did. So, welcome.
- 00:00:37 Irina Dominte
- Today, we're here, or I'm here at least, to talk about asynchronous messaging. This has been the thing that I've done for the last 10 years, ever since I found out that there are easier ways of doing microservices intercommunication, pretty much. And back in the days when microservices were a thing, the new shiny thing that everyone wanted, the first go-to option for splitting the monolith into small chunks was to do HTTP requests. That's the usual step to evolve your architecture, right? Since then, I discovered there is another world out there that has different concepts, and it is kind of hard to unlearn what you know from HTTP when you start to embrace asynchronous messaging.
- 00:01:34 Irina Dominte
- So, who am I? My name is Irina. I'm a software architect at Particular, and I blog from time to time at Irina.codes. I'm also a Dometrain author; you can find a course on messaging, of course, on MassTransit over there. So, since we're talking about microservices and this journey around communication, we cannot not mention the monolith. This is where we all kind of started, and some of us are still there even today. To me, I must admit, as a developer, life was easy. You had everything in there, all the dependencies, everything in one place. You searched the code, and you didn't need to worry about what others were doing or what other teams were doing, or about merge conflicts, because you had one single huge source code base, and that was it.
- 00:02:36 Irina Dominte
- So, I'm going to admit, I was guilty. I am guilty: I love monoliths. I still love them. They still exist, by the way, in different systems, in different companies. A monolith is self-contained. We have that single code base, and there is a single deployable unit, and that makes life easy for developers. All the dependencies are in one single code base, there aren't too many moving parts, and usually there is a single technology stack. We all know that the goal is to have polyglot environments, so that, just in case we would want to switch a specific component to, I don't know, Rust or Go or Python, we would have that flexibility. In real life, actually, I have never seen it. If your tech stack is .NET, you most likely have every single component in .NET. I mean, I haven't seen this transition.
- 00:03:37 Irina Dominte
- So, with monoliths, we have all-or-nothing deployments. We have blue-green deployments, but you take the chunk, you move it to production, and you hope that it works. We kind of have downtimes, or, if we do not want to have downtime, we have to put different systems in between. Almost zero continuous delivery. Since the code base is so big, when you make many changes, you also have to test those changes. So, it's hard to test, right? But when it comes to scaling, it is possible: you can give it a bigger machine, you can give it RAM and CPU power, we can scale it up. But giving more resources to a machine doesn't necessarily increase capacity in proportion to the costs. Of course, we could add many instances of the same application, but we would have to put load balancers in between, and we would make things more complicated.
- 00:04:39 Irina Dominte
- Well, so we heard about microservices architectures. What is a microservice? How big is it? Nobody knows. Every company has its own definition of what a microservice is, and we kind of understood this term however we wanted to. So, a microservice has its own database, right? It is easy to deploy, it is a standalone thing, and it is easy to maintain. At least, it is promoted as being easy to maintain, right? Because it's isolated. But is it that easy to maintain? When you chunk your monolith into smaller pieces that start to move around, is it so easy to maintain? As easy as it was with the monolith? I don't think so. Right? Because we have so many moving parts now. We get more complexity than we would want, because we have more moving parts. And unfortunately, these moving parts cause cascading effects in case of failures, because we depend on different other subsystems in our app.
- 00:05:50 Irina Dominte
- We need to monitor them closely. And monitoring is a thing that we do only from the moment we actually need it. Nobody thinks, oh, we should add monitoring to our system. When things get hard, we think, okay, from now on we're going to add monitoring, so we can prevent failures in the future. So, as soon as you start chunking the monolith and separating the components, you start to realize that the independent units are not so independent anymore, and we end up with a big ball of mud, a big ball of HTTP calls, at least at first. And we end up with things like this that do different things. Independent, right? With their own database, easy to deploy. Teams own their own microservices, and so on.
- 00:06:51 Irina Dominte
- Because we thought that if others do it, we can too. For example, Amazon sometimes calls 150 APIs to build a single page. They could do it, right? Usually the tech giants dictate what we want to do with our software. Or Netflix, which serves five billion requests a day, and of all these, 97.7% are internal, so downstream APIs. They can do it. Why can't we? Right? So, we do HTTP calls. An HTTP call to another API is super simple. We do a request, we get the response, and we're happy, most of the time. Or this is, let's say, the default scenario that we have in mind when we code things. But how do we proceed if things are not so good anymore and our response doesn't come back in a timely manner? We're sad then, but we also cause cascading failures inside other components that need this. So, what do we do then?
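The happy-path call described here can be sketched in a few lines. This is an illustrative sketch, not code from the talk's samples; the endpoint URL and the `OrderApiClient` name are hypothetical:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public class OrderApiClient
{
    private static readonly HttpClient Http = new HttpClient();

    // The "default scenario": send a request, await the response, carry on.
    public static async Task<string> GetOrderAsync(int orderId)
    {
        // Hypothetical endpoint, for illustration only.
        var response = await Http.GetAsync($"https://orders.example.com/api/orders/{orderId}");
        response.EnsureSuccessStatusCode(); // throws if the other API is unhappy
        return await response.Content.ReadAsStringAsync();
    }
}
```

Everything after the `await` assumes the other service answered in time; that assumption is exactly what breaks in the scenarios she describes next.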
- 00:08:04 Irina Dominte
- Well, we manually intervene and we introduce retry policies, so that next time, maybe, things can resolve themselves, right? Well, sometimes it's not enough, because API 2 won't even know what the content of the request was, right? We retry, we retry. But from API 2's perspective, the API doesn't know what the content was. So, we lose data. And from HTTP synchronous calls, we move on and say, oh, I'm going to do it async, it's faster, we serve the request faster. But in fact it's still HTTP, it's still synchronous. It's async for us in C#, because we use that syntactic sugar and we have threading and stuff. But in the end, the protocol that we use is the same. HTTP by its nature is synchronous, no matter how much we try to pretend otherwise; it's synchronous because this is the way it was built.
- 00:09:11 Irina Dominte
- So, it might look like we're distributing the load, because we're doing requests and those responses may come back. But we actually have the same issues. This appearance of serving more requests is just that, an appearance. It doesn't happen. The server you're hosting the API on is the same; the problems you're facing are the same. With HTTP, there's a TCP connection for each request, and we have absolutely no retries out of the box. We have to use libraries, we have to think about different policies that would save our requests. What happens if, I don't know, an API doesn't respond in three seconds? Should we retry? Should we cancel? Is that a timeout? Right? Which status codes and which exceptions do we handle? So, we have to think about all these aspects.
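Those policies are usually handed to a library rather than hand-rolled. A minimal sketch with Polly, one such library; the three retries, the exponential backoff, and the three-second timeout are illustrative assumptions, not recommendations from the talk:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;        // NuGet: Polly
using Polly.Retry;

public class ResilientCaller
{
    private static readonly HttpClient Http = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(3)   // "doesn't respond in three seconds" -> timeout
    };

    // Retry three times with exponential backoff: 2s, 4s, 8s.
    private static readonly AsyncRetryPolicy<HttpResponseMessage> Retry =
        Policy<HttpResponseMessage>
            .Handle<HttpRequestException>()
            .Or<TaskCanceledException>()                 // HttpClient timeouts surface as this
            .OrResult(r => (int)r.StatusCode >= 500)     // retry server errors, not 4xx
            .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)));

    public static Task<HttpResponseMessage> CallAsync(string url) =>
        Retry.ExecuteAsync(() => Http.GetAsync(url));
}
```

Note that this only helps with transient failures: as she points out, if the other API never received the request body, no amount of client-side retrying recovers data the other side never saw.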
- 00:10:09 Irina Dominte
- Another thing is that HTTP doesn't have any delivery guarantees. You do the request, but there is nothing guaranteeing that the request will be received, or that you'll receive a response back. There is the happy case where you do the request, you get the response back, and that's it. But otherwise, there's nothing guaranteeing that the request ends up on the server side where it should be. Another thing is location transparency. You have to manage every single endpoint that you are calling. You have to discover those endpoints somehow: either you are coding them manually somewhere, in app settings, or, in production environments, in environment variables, right? You know exactly from where to where you're doing the requests, so you have some extra things to manage.
- 00:11:03 Irina Dominte
- But HTTP is super good for public-facing APIs. If you want to expose your API to the outside world, usually REST will be the de facto standard, or JSON over HTTP, pretty much. We're familiar with it. We have been using HTTP since the internet appeared, and it is easy to debug because we're so familiar with it. But do we ever think about this stuff? Non-functional requirements like availability, fault tolerance, latency, throughput, reliability, observability, resiliency, scalability, interoperability, recoverability, and you name it, pretty much everything that ends in -ility. It's something that we might want in our system, right? And I'm going to ask you, how many of these do you have in your systems?
- 00:12:07 Irina Dominte
- Like a checkbox: I have fault tolerance, I have 99.99% availability, my system is resilient, it is able to self-recover. Do we? No, we do not even think about all these things if they're not explicitly mentioned in the feature that we have to implement, right? There might be some cases when the business analyst or the team manager comes and says, hey, I need this to be up 99.99% of the time. And then you start to think, okay, how do I achieve this? What do I implement? What extra tools do I use? How do I ensure that my system is up and running, and that it recovers from errors or retries things? There are many of these, and unless they're explicitly stated, we do not care about them. We're kind of focused, due to time constraints or whatever we have in mind, on functionality, on features, because those are the things that usually bring money in return from our customers, from our stakeholders. They need those features, and these other things are treated as second-class concerns.
- 00:13:32 Irina Dominte
- For example, in the case of an e-commerce system that needs to handle a lot of orders during Black Friday, when you explicitly have a requirement like this, then you kind of think, well, we have to handle load, we have to scale. But other than that, we do not think about having systems that auto-scale, unless we have cloud and that's enabled by default. And we don't think about having systems that are elastic, able to handle load when we have peaks and to decrease the number of instances afterwards, and stuff like that. It's aspirational, so to say, right? So, another thing: how do we handle timeouts? We do a request, we wait for a response, but for how long do we wait? Who gives us that number? Hey, 30 seconds, one minute? And there will be 1,000 other requests waiting on us, and they will get timeouts. A glitch, something, happens in the system and we're not able to respond. How do we handle that? And what happens to all those requests? That's the thing that we should have in the back of our minds.
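In HTTP land, picking that number is our job. A sketch of the two usual knobs in .NET; the three-second budget is an arbitrary assumption, which is rather the point:

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public class TimeoutDemo
{
    // Option 1: a client-wide budget. Every request gives up after 3 seconds.
    private static readonly HttpClient Http = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(3)
    };

    // Option 2: a per-request budget via a CancellationToken.
    public static async Task<string?> GetWithBudgetAsync(string url, TimeSpan budget)
    {
        using var cts = new CancellationTokenSource(budget);
        try
        {
            var response = await Http.GetAsync(url, cts.Token);
            return await response.Content.ReadAsStringAsync();
        }
        catch (TaskCanceledException)
        {
            // Timed out. Now what? Retry? Give up? Drop the caller's data?
            // HTTP leaves that entirely to us.
            return null;
        }
    }
}
```

The `catch` block is where the unanswered questions live; the messaging half of the talk is largely about moving that problem out of the request path.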
- 00:14:43 Irina Dominte
- Some other things that are very closely related to non-functional requirements, or to the quality attributes of our system: coupling. Coupling can be of many types, but I'm going to mention two. Temporal: if service A needs to talk with service B, well, they both need to be up and running, able to chit-chat with each other. And this is the easiest kind to solve. The other is logical: as soon as we start to duplicate things in our code, like, hey, this API needs this order model, and this other API needs to be aware of that order model somehow, so we create another class, and we end up with 10 order models scattered across different components of our system. That's logical coupling.
- 00:15:39 Irina Dominte
- And sometimes we think, okay, how about we extract it, we do some refactoring, right? We build HTTP clients that we start to distribute as NuGet packages. The deal with the HTTP clients that we distribute, because we think it's okay to do so, is that they tend to gain weight, and they tend to gain business logic. It's one thing to put some information and some tests in a package that you distribute; it's another thing to create spaghetti code because you introduced business logic there. It usually happens, and I've seen it several times. So, duplicating business logic is not a good thing, and it's something that we should address. And the road to a good system is not easy; it is sometimes very curved. So, we kind of agreed that we do HTTP calls and we do REST. And I'm going to ask you, how many of you are doing REST APIs? Okay.
- 00:16:54 Irina Dominte
- And now, how many of you are really respecting all those six guidelines from REST that Roy Fielding... Going to hide? Right? We kind of choose from REST only those bits that interest us. Right? We do not even use all the good verbs. How many of you are using HEAD as a verb? Yeah, I've seen only one hand. Right? HEAD is like a GET, but it doesn't bring the body back. That's very useful when you rely on custom headers or status codes: hey, check if this item is there or not, right? You don't care about the body, but you do care about the existence of a specific item, and it has many applications across the industry. But REST is simple and widespread. We all know JSON, we do interoperability with JSON or XML. But at the same time, we know that REST has limitations. We do have tight coupling.
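The HEAD pattern she describes looks roughly like this; the URL and the `ItemExistsAsync` helper are hypothetical, for illustration:

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class ExistenceCheck
{
    private static readonly HttpClient Http = new HttpClient();

    // HEAD has the same semantics as GET, but the server returns only
    // headers and a status code - no body travels over the wire.
    public static async Task<bool> ItemExistsAsync(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Head, url);
        var response = await Http.SendAsync(request);
        return response.StatusCode == HttpStatusCode.OK;   // 404 -> it's not there
    }
}
```

Since no body is serialized or transferred, this is cheaper than a GET when existence is all you need.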
- 00:18:00 Irina Dominte
- As soon as we know the address of the thing that we're trying to call, we have coupling introduced in our system. We know the endpoint, we know the URL that we want to call, and we know where that API lives. That's a coupling issue. Another thing is latency sensitivity. High latency or network disruptions can cause timeouts and, of course, cascading failures. And more than once I've seen, with different teams, that service A calls B and B calls C, and we have this chain of requests because, I don't know, this is the way we inherited the system, or this is the way it is, and we have to deal with it. But if C has a problem, then B will have a problem, and A will have a problem. And that's the cascading failure. And another thing with REST is that we get limited communication patterns.
- 00:18:57 Irina Dominte
- Do you know any pattern besides request/response? With REST, we do a request, we get a response. The communication pattern involved is request/response, right? That's it. There is no other. If we do an HTTP request to an API, we have no way of, at the same time, sending a copy of the same request to a second API or to a third; it's impossible. So, we only get request/response.
- 00:19:32 Irina Dominte
- With messaging... Because this is the thing that I'm trying to preach about, right? What is messaging? Well, messaging tries to give us loosely coupled integration. Meaning that, as soon as we publish a message or we send a message, or, as we would do in HTTP, do a request, that request in messaging transforms itself into the concept of a message. That message ends up somewhere in a system, and from there it is picked up by, or pushed to, those components inside our system that care about it.
- Irina Dominte
- Also, as a result of saving an order, we can publish an event: an order has been created. And what you'll see here is that nothing gets sent or created in the message broker. That's because MassTransit only creates a queue as soon as you attach a consumer to it. Otherwise, it'll appear as if nothing happened, as if it didn't work, right? So, let's go ahead and start this notification worker, which is interested in an event that happened in the system. I'm going to do the same. So, now I have two consumers that don't care about each other; each is attached to its own queue inside the message broker. I'm going to finish this and I'm going to do another request. Okay, let's see if this is working. Great. And another thing, I think the API is not running. Just let me check how many I have.
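The publish side of this demo can be sketched roughly as follows. This is a minimal MassTransit sketch, not the talk's exact sample; the `OrderCreated` and `OrderService` names are stand-ins for whatever the linked repo uses:

```csharp
using System.Threading.Tasks;
using MassTransit;   // NuGet: MassTransit

// The event contract: a plain message type shared by publisher and consumers.
public record OrderCreated(int OrderId);

public class OrderService
{
    private readonly IPublishEndpoint _publish;

    public OrderService(IPublishEndpoint publish) => _publish = publish;

    public async Task SaveOrderAsync(int orderId)
    {
        // ... persist the order here ...

        // Publish the event: every consumer subscribed to OrderCreated
        // gets its own copy. The publisher doesn't know who they are.
        await _publish.Publish(new OrderCreated(orderId));
    }
}
```

This is the bit that request/response HTTP cannot express: one publish, any number of interested parties.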
- 00:52:47 Irina Dominte
- The create-order consumer, the notification order-created consumer, and the API. So, I can use Postman to create my orders. This way, I can attach as many consumers as I want; as many consumers can be interested in a specific event that happened, and I can react accordingly. So, let's see, we save it, we publish the order. And now, when we look at RabbitMQ, suddenly there is this order-created notification. The naming can be controlled by you, or you can leave it to chance with the default naming convention. So, if we look in here, we'll notice that the notification consumer received something about the order with an ID of 14. And in here, there's a second message that was processed. I'm just writing something in here, because I didn't auto-assign an ID, just to mimic things. Okay.
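A consumer in this demo would look something like this; again a hedged sketch with assumed type names, not the repo's exact code:

```csharp
using System;
using System.Threading.Tasks;
using MassTransit;   // NuGet: MassTransit

public record OrderCreated(int OrderId);

// Each consumer class gets its own queue (named by MassTransit's default
// convention unless you override it). Both this and any other consumer of
// OrderCreated receive every event, and neither knows the other exists.
public class NotificationConsumer : IConsumer<OrderCreated>
{
    public Task Consume(ConsumeContext<OrderCreated> context)
    {
        Console.WriteLine($"Notify: order {context.Message.OrderId} was created");
        return Task.CompletedTask;
    }
}
```

Adding a third reaction to the event is just adding a third class like this; the publisher is untouched.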
- 00:53:50 Irina Dominte
- And another thing that I think I'll have enough time to show you is request/reply. That can be done over queues too. So, we have an orders controller, and you'll see, when you download the code, that you have a get-order endpoint that mimics how an HTTP response would look. It calls a client that listens on the messaging queues, and it'll return an order result if the order is found, or an order-not-found result if the order is not found. So, I'm going to comment out the old code. This is traditional REST: NotFound, and OK with that order. When I run the API, I also have a consumer attached here. Hang on, where are you, services?
- 00:54:54 Irina Dominte
- I'm going to just verify an order. I don't think I have a consumer created yet. You'll notice here that you have a request client; just as you would with HTTP, you register it in the pipeline, and you'll see it there. A request client. Now, apparently, I didn't implement the consumer yet. I was about to, but I'm going to push the code. Anyway, it'll behave the same: the sender, the client, basically the consumer of the API, will get OK or NotFound with an error message. So, this can be mimicked. What happens in the broker is that another temporary queue, with a default expiration time, will be created every single time you do a request. The response comes back through that temporary queue.
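The request client she mentions follows MassTransit's documented request/reply pattern; a sketch with assumed message types (`GetOrder`, `OrderResult`, `OrderNotFound` are illustrative names):

```csharp
using System.Threading.Tasks;
using MassTransit;   // NuGet: MassTransit

public record GetOrder(int OrderId);
public record OrderResult(int OrderId, string Status);
public record OrderNotFound(int OrderId);

public class OrdersEndpoint
{
    // Registered in the pipeline with x.AddRequestClient<GetOrder>().
    private readonly IRequestClient<GetOrder> _client;

    public OrdersEndpoint(IRequestClient<GetOrder> client) => _client = client;

    // Request/reply over queues: the reply arrives on a temporary queue
    // that MassTransit creates behind the scenes for each request.
    public async Task<string> GetOrderAsync(int id)
    {
        var response = await _client.GetResponse<OrderResult, OrderNotFound>(new GetOrder(id));

        if (response.Is(out Response<OrderResult> found))
            return $"200 OK: {found.Message.Status}";

        return $"404 Not Found: order {id}";
    }
}
```

On the other side, the consumer answers with `context.RespondAsync(...)`, returning either an `OrderResult` or an `OrderNotFound`, which is how the two-outcome HTTP shape is mimicked.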
- 00:55:47 Irina Dominte
- Okay. So, what I'm going to do is show you the recoverability part. I'm going to start a Docker instance with a tool that I think is very important to have. I have a question: are you using messaging already? Awesome. With what? Azure Service Bus?
- 00:56:11 Speaker 2
- ActiveMQ.
- 00:56:12 Irina Dominte
- ActiveMQ? With a wrapper around it, like MassTransit or NServiceBus?
- 00:56:17 Speaker 2
- Brighter.
- 00:56:17 Irina Dominte
- Brighter? Okay. It takes a while, so... I tested it last night, and the images are pretty large, the internet is not that good, and it took a while. So, some elevator music would have been nice. Can you help me with that?
- 00:56:34 Irina Dominte
- But some of the things have started, and some of them have not. In any case, if you want to talk more, you'll find me at the Particular booth today. So, I'm there if you have questions or curiosities. These are all green. So, in theory... just in theory; in practice, we know it sometimes doesn't work. So, you'll notice a bunch of things. You'll see bill.order_error. MassTransit creates a pair for our initial queue in case of an error. That's the concept of a dead letter queue, pretty much: a pair queue where we store all the errors that happen in the system, so nothing is lost. But the idea is, as soon as your number of queues grows, it is kind of tedious to do this work by hand. So, what Particular did was to create a showcase where you can... This is public, so you'll find it linked in my samples.
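Before a failed message lands in that _error queue, MassTransit can be told to retry it first. A sketch of the bus configuration; the host, the consumer type, and the retry numbers are illustrative assumptions:

```csharp
using System;
using System.Threading.Tasks;
using MassTransit;   // NuGet: MassTransit.RabbitMQ
using Microsoft.Extensions.DependencyInjection;

public record CreateOrder(int OrderId);

public class CreateOrderConsumer : IConsumer<CreateOrder>
{
    public Task Consume(ConsumeContext<CreateOrder> ctx) => Task.CompletedTask;
}

public static class BusSetup
{
    public static void Configure(IServiceCollection services)
    {
        services.AddMassTransit(x =>
        {
            x.AddConsumer<CreateOrderConsumer>();

            x.UsingRabbitMq((context, cfg) =>
            {
                cfg.Host("localhost");

                // Retry a failing message 3 times, 5 seconds apart; only
                // after that does it move to the paired _error queue.
                cfg.UseMessageRetry(r => r.Interval(3, TimeSpan.FromSeconds(5)));

                cfg.ConfigureEndpoints(context);
            });
        });
    }
}
```

This is also the takeaway in miniature: the broker topology, the retries, and the error queues are a few lines of wrapper configuration rather than hand-written infrastructure code.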
- 00:57:45 Irina Dominte
- So, you run the scenario, and a bunch of orders for different funky fruits are sent. But some of those orders are not processed successfully. So, we see that an order with this order ID, with mangosteen and whatever fruits those are, failed. The sales process failed, and of course the billing process failed. You'll see here what the flow of messages is, pretty much. The client places an order, the sales component kicks in and publishes a message. That message is interesting for the billing component and for the shipping component: create an invoice or take the payment, and also arrange the shipping. But as soon as we see errors, we can click here, see the failures, and we'll see information about the errors happening in the system. And this is a thing that is not available out of the box. You either implement it in your company, spending a lot of time, or you use this tool.
- 00:58:50 Irina Dominte
- So, you get all the failed messages here, grouped by exception type or by endpoint address, instance, name, and so on. Everything is filtered in here, and you can inspect them individually. For example, the type of message that failed is "order placed", happening on a specific endpoint. We can see the stack trace, we can see the headers, and the message body, of course. And one of the most important features is that you can edit and retry. So, you figured out, hey, an error, an order ID, something is not correct, right? You can go in here, edit whatever you want to edit, and click retry, and that message will go back to the initial queue that was supposed to process it. This makes it very easy to deal with errors.
- 00:59:43 Irina Dominte
- Also, if you want, you can retry all: select them all and retry all. And why not delete them? It depends on how you want to treat your errors: gracefully, or not so gracefully, by deleting them. But this is a very important feature that we do not have around the ecosystem. So, give it a try. It's in the early access program, so tell us what you think about it. And as I told you, you can find me at the Particular booth if you have questions.
- 01:00:18 Irina Dominte
- The key takeaway for today: use a wrapper on top of the infrastructure code, because it'll save you hours. So, no matter the abstraction, find one and use it, right? Whether MassTransit, NServiceBus, Brighter, or another that suits your purpose, use the abstraction, because it's very useful.
- 01:00:39 Irina Dominte
- Okay. So, keep in mind: HTTP is not the only option. You can find here a bit.ly link that points to the repo with the code, and also the slides and the showcase with the recoverability thing that I showed you. And don't be shy, come by the Particular booth and let's talk. Thank you for attending. I hope you learned something.