The route of a message
About this video
This session was presented at Update Conference Krakow.
Regardless of industry trends like microservices or structured monoliths, asynchronous message-driven systems are on the rise. This can be attributed to the ever-growing need to break down complex business processes into a series of small steps connected by a reliable communication mechanism, an idea that can be traced back at least to the landmark Sagas paper by Hector Garcia-Molina and Kenneth Salem.
As these systems inevitably grow and evolve, the key challenge for engineers becomes the routing of all the messages that glue the business processes together. In this talk I'll attempt to outline some of the biggest and most common issues and suggest common patterns for dealing with them.
Finally, we’ll discuss a model for thinking about asynchronous message-driven systems that allows you to make sense of all the routing complexity stemming not only from the business itself but also from all the cruft that you, the architect, have to deal with, such as multiple hosting environments, legacy message brokers, rapidly changing requirements etc.
🔗 Transcription
- 00:41 Szymon Pobiega
- Okay. Hello. I guess more people will be coming soon, but let me start because the time is limited here. Welcome to my talk, the route of a message. My name is Szymon Pobiega. I'm based here in Krakow. These are my contact details, so I won't spend more time talking about myself. We'll be talking today about pierogi, to promote, I guess, local traditions. We'll be talking about a fictional company and a fictional engineering team whose task was to build software for what was to be the most successful pierogi delivery company in the world. The story is entirely fictional, but it is based on my 10-year career of helping people build distributed systems with message-driven middleware.
- 01:39 Szymon Pobiega
- So, all the problems that the team encountered are real; I'm just not mentioning the names and the technologies, for obvious reasons. But before we go into the fiction, I would like to spend some time setting the stage by going a bit into the past, the entirely real past of software engineering, and explaining how the team of the pierogi company happened to be where they were when they started their fictional journey. So, this is more or less how people used to implement business processes in the '70s or '80s, with COBOL and tools like that, and mainframes and terminals.
- 02:24 Szymon Pobiega
- The business processes back then were mostly driven by human beings who were executing procedures in their heads and then interacting with keyboards to type in some data. When they typed some data on one screen and hit a button, the input they typed in was transformed into SQL statements, or, even before that, statements in other database programming languages. But anyway, database statements were executed against some database, another screen would be displayed to that operator, and then yet another screen, and the whole thing could take multiple screens and many minutes or hours to execute. The interesting aspect of those processes was that the entire thing was wrapped in a database transaction.
- 03:15 Szymon Pobiega
- Begin transaction, then a few hours later, commit transaction. You can probably imagine how big the problems this caused were, even with throughput or traffic that was, relative to current standards, very small. So, already in the late '80s, people were starting to see the problems that the transactions caused, especially the isolation aspect of transactions. All the other properties are good: we want to keep our transactions atomic, we don't want partial data updates, we want consistency and durability. Isolation we are not so sure about, right? Even now, when we are starting a transaction, we have to think about which isolation level is necessary.
- 04:04 Szymon Pobiega
- So, people back then in the '80s were thinking, "Well, maybe we're paying the cost of transaction isolation without the need for it." And here comes the paper from 1987, a landmark paper that I totally recommend everybody to read: Sagas, by Hector Garcia-Molina and Kenneth Salem. In that paper they proposed: well, maybe we don't need transactions that last four hours. Maybe it will be good enough to have a series of short-lived transactions. So, each screen in that example is a single transaction; it's no longer one huge, hours-long transaction. The upside is that these transactions are short, they don't keep locks for a long time, and the throughput can go up.
- 04:52 Szymon Pobiega
- The downside of a naive implementation is, of course, that we lose the atomicity aspect of the transaction, right? It's no longer atomic. If we fail in step two, that means that step one has already been executed and committed, is visible in the database, and is going to stay there no matter what happens later. Not good. So, what Hector Garcia-Molina and his colleague proposed is to have a series of compensating actions that parallel the actual transactions. For each transaction Ti we have a compensating action Ci. So, when I'm in step two and can't proceed, because somehow step two can't execute, I need to revert my already committed actions.
- 05:41 Szymon Pobiega
- I execute step C1, the compensating action for the already committed transaction T1, and that brings my database state back to where it started. A very neat solution. People were running that in the wild and they were happy, so we can fast-forward 10 years and see what happens in the '90s. In the '90s, well, we replaced the big old mainframes with cheaper solutions like server racks with multiple servers. The business processes started to be less dependent on humans. We software engineers started to implement more and more of the business logic in code, rather than in the books that humans would read and translate into actions.
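The compensating-action scheme described above can be sketched in a few lines of Python. This is an illustrative, in-memory sketch, not any particular framework's API, and all the names are invented:

```python
# Illustrative saga runner: each step is a (transaction, compensation) pair.
# On failure, already committed steps are compensated in reverse order.

def run_saga(steps):
    completed = []  # compensations for the steps that committed so far
    for action, compensate in steps:
        try:
            action()
            completed.append(compensate)
        except Exception:
            for comp in reversed(completed):  # newest first, e.g. C2 then C1
                comp()
            return False
    return True

log = []
def t1(): log.append("T1")          # short transaction 1, commits fine
def c1(): log.append("C1")          # compensation for T1
def t2(): raise RuntimeError("T2")  # step two cannot execute
def c2(): log.append("C2")          # compensation for T2 (never needed here)

run_saga([(t1, c1), (t2, c2)])
# log ends up as ["T1", "C1"]: T1 committed, T2 failed, so C1 reverted T1
```

The key property is the same one from the paper: the database is never left with a half-finished process, only with either all steps done or all committed steps compensated.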
- 06:29 Szymon Pobiega
- And business processes started to be autonomous from humans. So, they would run while no human was looking at them, and they needed some sort of mechanism for keeping track of state and waking up after a few days, things like that. But generally nothing really changed, because we still had a series of short transactions. This time, though, because we needed more freedom in how to develop these processes, people introduced the concept of state. So, more or less, the software was a composition of a couple, or a couple hundred, or a couple thousand stored procedures, each of which would look up in a database the business processes in a certain state, state equals one, state equals two, and so forth, and then execute some actions on them.
- 07:25 Szymon Pobiega
- So, update things, insert that, do things like that, and then commit the transaction. And then the state of the process would be updated to, let's say, state four, and there would be a stored procedure that looks for business objects in state four and does the same thing. So, that's how it worked. And let's fast-forward another 10 years. We are in the early 2000s, let's say. Finally, around that time, we learned how to build proper distributed processes. I'm guessing there was a perfect storm of technologies that came together, because technologies for sending messages asynchronously, like IBM MQ, had been there for at least 30 or 40 years, but by that time everything was in place.
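The state-driven stored-procedure style just described can be sketched like this; the table layout, state numbers, and the "picked" flag are invented for illustration:

```python
# Illustrative '90s-style state machine: "stored procedures" pick up
# business objects in a given state, act on them, and advance the state.

orders = [
    {"id": 1, "state": 1},
    {"id": 2, "state": 2},
]

def step(db, from_state, to_state, work):
    """One short transaction per row: act on rows in from_state, advance them."""
    for row in db:
        if row["state"] == from_state:
            work(row)
            row["state"] = to_state  # committed together with the work

# The "procedure" that handles state 1: pick the order, move it to state 2.
step(orders, 1, 2, lambda row: row.update(picked=True))
```

Each `step` call corresponds to one short transaction, which is exactly what replaced the hours-long begin/commit pair.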
- 08:14 Szymon Pobiega
- Somehow, in those early 2000s, we learned how to combine these things to build proper distributed business processes. And by proper, I mean a code-based implementation of a business process that consists of multiple discrete pieces of code that are tied together, or chained together, by message exchange patterns like request-reply, publish-subscribe, or sending commands. In other words, you can call these distributed business processes sagas on steroids. Why on steroids? First of all, because you now have a wide choice of these message distribution patterns.
- 08:58 Szymon Pobiega
- You can have forks and joins in the processes; you can have all the complex control structures, not only what Garcia-Molina envisioned, a series of steps going either forward or backward. Then you have temporal modeling that you can build on top of message-driven middleware. Most of the queuing systems out there, like RabbitMQ or Azure Service Bus, allow you to send a message and say that you don't want this message to be delivered earlier than a specified time. So, you can turn your temporal logic into message-driven logic, which helps to structure that logic really neatly.
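The "don't deliver before" idea can be illustrated with a tiny in-memory queue. Real brokers expose this differently (Azure Service Bus as a scheduled-enqueue property on the message, RabbitMQ via a plugin), so this sketch only demonstrates the semantics, not any broker's API:

```python
import heapq

class DelayQueue:
    """In-memory illustration of 'do not deliver before' semantics."""

    def __init__(self):
        self._heap = []  # (not_before, message) pairs, earliest first

    def send(self, message, not_before=0.0):
        heapq.heappush(self._heap, (not_before, message))

    def receive(self, now):
        """Return the next message whose delivery time has passed, else None."""
        if self._heap and self._heap[0][0] <= now:
            return heapq.heappop(self._heap)[1]
        return None
```

Sending a message to yourself with a delay is how a process says "wake me up in three days" without any scheduler of its own.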
- 09:38 Szymon Pobiega
- And then another thing: now, with these message exchange patterns, you can send a message in one process and receive it in another process. That sounds fairly obvious, but it was a great invention because it allowed people to scale parts of their system, not the system as a whole. And finally, because these pieces of code now had explicit connections between them, defined as messages, events, commands, and replies, there was no longer a single place that had to hold the state of the business process. Without the need for a single place to hold that state, we could have multiple data stores, and one part of the process could store its data in one place.
- 10:22 Szymon Pobiega
- Another part could store it in a different place, maybe in a relational database, maybe in a NoSQL database. So, that allowed us great freedom of choice. You can sketch it like this, similar to the previous example, but now there is a central part that can be repeated over and over again. So, we have a process that starts by modifying some data and sending a message. Then there is a middle part, which starts by receiving a message, then modifies some data and sends another message, or sends a message to itself to delay some execution. And as you can see, because the input is a message and the output is a message, you can have as many of these middle parts as you wish and build a complex structure. And by accident, this thing looks like a potato.
- 11:11 Szymon Pobiega
- So, I'm going to refer to this in the rest of my talk as the potato pattern. And that makes a nice culinary segue to our pierogi team. So, we are in the early 2000s, 2005 maybe. The company started and decided to conquer the world of pierogi delivery. The development team started by reading a book, this one here, by Gregor Hohpe and Bobby Woolf, a landmark book from 2003 that defined how we talk about message-driven systems. And it's still applicable today, so I highly recommend reading it. Quite long and a bit dry by today's standards, but well worth it. So, the system initially consisted of three potatoes, already labeled: RabbitMQ.
- 12:08 Szymon Pobiega
- Has anybody here used RabbitMQ or heard about RabbitMQ? Okay, most of you. Yeah, that is a message broker that was released around the same time, so it was brand new when the team was using it. A very nice one if you're using it on premises, and back then we didn't have anything but on premises. Each of these potatoes uses a datatype channel, a pattern from the book, which means a queue that can contain only messages of a single type. That greatly simplifies the logic, because when the potato on the bottom wants to send a message to the potato on top, it knows that if the message name is X, it should put it in the queue named X. Super simple, and it worked like a charm.
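The datatype channel convention, "queue name equals message type name," can be sketched like this (in-memory queues and invented message names, not a real broker):

```python
# Illustrative datatype channel: one queue per message type, so the type
# name is the address. A plain dict of lists stands in for the broker.

queues = {}

def send(message_type, payload):
    # "If the message name is X, put it in the queue named X."
    queues.setdefault(message_type, []).append(payload)

send("PlaceOrder", {"pierogi": 12})
```

Note there is no routing table anywhere: the type is the destination, which is exactly why this stops working once one endpoint handles many types.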
- 12:56 Szymon Pobiega
- And the pierogi delivery company was able to grab its first customers with this software, and it outgrew the software's capabilities very quickly, especially the capabilities of the shipping part of the system. So, what the business had to do was actually acquire a shipping company in order to keep up with the growing demand for pierogi. And the task for the development team was to somehow integrate the shipping subsystem into their solution. They started by drawing architecture diagrams, and you can see it's an architecture diagram because you have no idea what it depicts. That's how you recognize proper architectural diagrams. Sorry to all of you here who are software architects.
- 13:44 Szymon Pobiega
- I'm making fun of my previous self, so no offense. Anyway, most architecture diagrams contain boxes and arrows. This one contains potatoes and arrows, just to have something different. But you probably all know that the most important part of an architecture diagram is the arrow, not the box. The arrow denotes the challenge here: how do you communicate between those potato-shaped things? I need to note here that the one on the left is the shipping part, and it's already a message-driven system. So, that gave the team some hints about how they could proceed: they needed to connect two message-driven systems.
- 14:34 Szymon Pobiega
- Their initial thought was to use HTTP, but they quickly abandoned this idea, because they started thinking about exposing HTTP endpoints, securing HTTP endpoints, doing all the things that are necessary. And on top of that, they realized that HTTP is inherently an at-most-once protocol. That means when you are sending a message over HTTP and it's lost, it's lost; there is nothing built into the protocol that will retry it. So, they would have to build their own mechanism for reliable HTTP-based communication. Instead, they tried to enlist an expert and get some advice. And the expert said, "Well, you haven't read the book well enough.
- 15:17 Szymon Pobiega
- There are more patterns in the book than you realized, not only potatoes, and one of them is called the messaging bridge." So, they slapped a messaging bridge onto the architecture diagram. Much better, right? Everything is readable here. Not really, but we can make it better by explicitly drawing the messages and marking them with the technology they use. You probably can't see it from the back, but it's important that there are envelopes flying here. The interesting part is in the middle: there are envelopes denoting messages that travel from one of these big potato-shaped blobs to the other through the bridge. And that allowed the pierogi delivery team to realize that this messaging bridge pattern is nothing else but a different shape of potato, a different type of potato.
- 16:11 Szymon Pobiega
- A very specific one that doesn't have a database, because it doesn't need one. It doesn't have business logic. It just takes a message from one queue and puts it into another queue, and vice versa in the other direction: it takes a message from the RabbitMQ queue and adds it to the SQS queue. A very, very simple pattern that did not require them to change anything, almost anything, in their system on either side of the messaging bridge. They deployed it to production: a very big success. They were able to get their bonuses and so forth. And it was an even bigger success than they expected, because the management team realized that now, having this shipping subsystem integrated, they could actually start selling it.
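The bridge logic really is that small. Here is a minimal sketch, with two Python lists standing in for the RabbitMQ and SQS queues (all names invented):

```python
# Illustrative messaging bridge: no database, no business logic, just a
# pump moving messages between two brokers (two lists stand in for them).

def pump(source, destination):
    """Move every message currently waiting in source to destination."""
    while source:
        destination.append(source.pop(0))

rabbit_side = [{"type": "ShipOrder", "body": "order-1"}]
sqs_side = []
pump(rabbit_side, sqs_side)  # RabbitMQ -> SQS; run the same pump in reverse too
```

A real bridge would receive, acknowledge, and retry against real broker clients, but the shape of the component is exactly this: a stateless pump per direction.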
- 17:01 Szymon Pobiega
- So, they approached the development team and said, "Hey, we want to allow other companies to ship through our shipping subsystem. It's so good. Can you make that system multi-tenant, please, and deploy it yesterday?" Well, the development team scratched their heads and thought, "Well, the easiest way to build a multi-tenant system out of a single-tenant system is to basically deploy 10 copies, or 100 copies, of the system. And there you go, you have a multi-tenant system." The problem, though, with that approach was scale: they currently had four potatoes in the shipping subsystem, and they expected that number to at least quadruple in the next year or so, because the shipping process was expected to get more and more complex as external companies started to use it.
- 17:56 Szymon Pobiega
- So, try the multiplication: a quadrupled count of 16 potatoes, and the expected number of tenants was around 100. The number is staggering, especially if you take into account that we are somewhere in the late 2000s, so it's at least five years until Docker and Kubernetes are invented, and deployment automation is not there yet. So, they were like, no, there's no way we can manage that in our production environment; we need to find something smarter. They went back to the book because now, heeding Yoda's advice, they knew they could mine that book for more knowledge. They did not find a solution, but what they found was something close enough.
- 18:42 Szymon Pobiega
- There was a pattern called service activator that seemed like a useful thing. The idea behind the service activator was that the class that handles messages is a separate class, and there is a separate component, called the service activator, whose only responsibility is to instantiate the class that can process messages. Seemed like a fairly simple solution. So, what they thought was: well, if we can teach that service activator thing to look at the message metadata and figure out what type the message is, then maybe it can activate different services for different message types.
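A rough sketch of such a service activator in Python, dispatching on a type field in the message metadata. The registration decorator and the handler name are invented for illustration, not taken from any real framework:

```python
# Illustrative service activator: look at message metadata, pick the
# registered handler class for that type, instantiate it, and invoke it.

handlers = {}

def handles(message_type):
    """Register a handler class for a message type (invented decorator)."""
    def register(cls):
        handlers[message_type] = cls
        return cls
    return register

@handles("CreateShipment")
class CreateShipmentHandler:
    def handle(self, body):
        return f"shipment created for {body}"

def activate(message):
    handler_cls = handlers[message["type"]]  # dispatch on metadata
    return handler_cls().handle(message["body"])
```

Messaging frameworks do essentially this under the hood: the endpoint owns one queue, and the activator fans incoming messages out to per-type handler classes.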
- 19:27 Szymon Pobiega
- So, it goes: oh, the message is of type A, let me instantiate a class that can handle messages of type A, and so forth. They called this whole thing the message handler pattern, a very popular pattern now in messaging frameworks; you can imagine what that looks like. If you think in terms of potatoes, if the endpoint is a whole potato, then message handlers are fries that are cut from that potato. So, that's what they drew in their architecture diagram. The shipping part now consists of four bunches of fries. That means they can have longer business processes there without having more deployments. That was very important back then. Maybe not as important today.
- 20:17 Szymon Pobiega
- These days people have 1,500 microservices in their environments; I would not recommend that, but I have heard such stories. Anyway, it allowed them to scale the logical complexity without scaling the deployment complexity. But it also had one slightly negative consequence, a very important one for the rest of the story. Introducing this message handler pattern meant that the routing was no longer defined by the types of messages. Previously, when you wanted to send message A, as I mentioned, you sent it to queue A and it was processed by endpoint A. No logic here, no conditional logic.
- 21:01 Szymon Pobiega
- Now, when endpoint A has multiple message handlers, endpoint B, which sends messages to it, needs to figure out to which endpoint a given message should be sent. So, the change required to build that type of architecture was to teach each endpoint how to route messages based on their type. Here, let's say I'm endpoint C with that table: I want to send a create shipment message, and that goes to A. The add parcel message goes to A. The remove parcel message goes to A, but confirm delivery goes to B. Very simple logic that can be captured well in a two-column table. And that logic became part of each endpoint.
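That two-column routing table can be written directly in code; the message and endpoint names are the ones from the example above:

```python
# The two-column routing table each sending endpoint now carries:
# message type -> destination endpoint.

routing_table = {
    "CreateShipment": "A",
    "AddParcel": "A",
    "RemoveParcel": "A",
    "ConfirmDelivery": "B",
}

def destination_for(message_type):
    """Each sending endpoint consults its own copy of this table."""
    return routing_table[message_type]
```

The important (and, later in the story, painful) property is that every endpoint holds its own copy of this knowledge.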
- 21:53 Szymon Pobiega
- Each endpoint had to know where to send things. That change went to production, everybody was happy, and the development team was given some time to experiment with technology. Management frequently thinks that engineers just want to do engineering things, from time to time at least. So, go forth, do some cool stuff. And everybody thought that the cloud was the best thing currently available, and everybody wanted to have cloud in their resumes, just in case the pierogi company didn't do so well and went bankrupt, and they had to look for another job.
- 22:28 Szymon Pobiega
- So, the development team decided that the new features they were supposed to implement, they would implement in Azure, using Azure Service Bus as the messaging infrastructure, just because they wanted to play with it. The initial idea was fairly simple: we know the messaging bridge pattern, so we can integrate any queue with any queue. Super simple, we have that already. If we build that new set of four potatoes in Azure, we can integrate them via messaging bridges with the existing on-premises site, and we'll be done in one sprint. Unfortunately, the reality was slightly different, because the communication paths between the new part and the old part were quite complex.
- 23:14 Szymon Pobiega
- While the shipping system was connected by just one communication path to the core, the new set of four potatoes, four features, was very tightly connected, tightly coupled, to the existing thing. I won't get into whether that's good or bad; it was simply the case. So, the number of messaging bridges they would have to build was quite big. I want to spend two slides here discussing what's going on. Why did this have to happen? Let me explain how that bridge works in detail. In the messaging bridge pattern, A wants to send a message to B, which runs on a different technology.
- 23:56 Szymon Pobiega
- Let's say A runs on SQS here. A sends a message to the B queue that runs on the same technology as A, so on SQS, and then the bridge takes that message and sends it to the real B queue on RabbitMQ. Similarly, when B wants to send a message, it sends it to the A queue in RabbitMQ, where the bridge takes it and sends it to the real A queue. We sometimes call these queues that are processed by the bridge shadow queues, because they are like shadows cast on the other side of the river. But when you want to add a new set of endpoints that need to talk to each other, you need another bridge and another set of two shadow queues.
- 24:38 Szymon Pobiega
- You can see that since they already had four communication paths in their system, they would need four bridges, and that was just for starters. So, they said, "No, that's not the way to go; let's try to be smarter again." They started by asking a question: well, we have this bridge thing, but do these queues really need to be named like the endpoints' queues? Can we give the bridge its own identity and some other name for these queues? So, they called them R. There is an R queue in RabbitMQ and an R queue in SQS in this example, and otherwise it looks exactly like the previous one.
- 25:20 Szymon Pobiega
- The difference, though, is that when you scale the complexity here and add another pair of interacting endpoints, you can now reuse these queues to send messages from C to D. But you're probably wondering: well, where did that complexity go, right? The deployment complexity didn't just go away and disappear. That's generally a fact of life. Our job as software engineers is not to make the complexity go away, but to shift it from one place, where it's hard to manage, to a different place, where it's easier to manage. In their example, the hard place to manage complexity was the deployment aspect; the easier one was the logical aspect. But that's not always the case.
- 26:04 Szymon Pobiega
- So, where did the complexity go? We can see where it went when we imagine what endpoint C has to do. When C wants to send a message to A, it sends it directly; both of these endpoints use SQS in Amazon to interact with one another. But when C wants to send a message to, let's say, B, it needs to go through that R queue, right? So, the routing table in the C endpoint needs another column, a via column, that either says nothing or names the intermediary to send the message through in order to reach the destination. But that's not all. If it were, then the thing that processes the R queue would receive a message and be left wondering: oh, I have a message, where do I send it now?
- 26:56 Szymon Pobiega
- So, what C also needs to do is include metadata in that message stating the ultimate destination. When it sends a message to R, it needs to set a header saying: when you receive that message, send it on to B. And the thing in the middle they called a router. Unfortunately, that pattern is not documented anywhere, at least to my knowledge, in that shape. What the router pattern allowed them to do is decouple the logical message exchange patterns, the arrows between the potatoes, from the physical message delivery paths through the router.
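The router mechanics, the via column plus the ultimate-destination header, can be sketched like this. The queue names, the header name, and the in-memory queues are all invented to illustrate the flow:

```python
# Illustrative router: the sender's routing table gains a "via" column.
# Cross-broker messages are parked in the router queue R with a header
# naming the ultimate destination; the router only reads that header.

routing = {
    "CreateShipment": {"to": "A", "via": None},  # same broker: deliver direct
    "ConfirmDelivery": {"to": "B", "via": "R"},  # other broker: go via router
}

queues = {"A": [], "B": [], "R": []}

def send(message_type, body):
    route = routing[message_type]
    headers = {"ultimate-destination": route["to"]}
    queues[route["via"] or route["to"]].append((headers, body))

def run_router():
    # No business logic here: forward each parked message to its header's target.
    while queues["R"]:
        headers, body = queues["R"].pop(0)
        queues[headers["ultimate-destination"]].append((headers, body))

send("CreateShipment", "parcel data")
send("ConfirmDelivery", "delivery data")
run_router()
```

The decoupling shows up in the tables: the logical destination ("to") never changes when the physical path ("via") does.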
- 27:43 Szymon Pobiega
- So, the potatoes talk logically to each other, but physically the messages go through that router thing in the center. And that was a very successful design pattern again. It went to production fairly quickly, and the management was happy. In the meantime, what the management team figured out was that the business was growing so fast that in certain locations they needed dedicated production and storage facilities, because the headquarters just could not cope with the load. What that meant for the development team was that they needed to somehow integrate these remote locations into their system.
- 28:32 Szymon Pobiega
- They called these sites: production sites, warehouse sites. And actually the name "site" quickly became part of the common language of the software. So, they started to refer to a group of potatoes that run in the same location, be it cloud or on premises, as a site. They redrew the architecture diagram, shifting everything to the side to make room for the new sites. These were potato-shaped things containing potatoes in them, running RabbitMQ. Why? Because these warehouse and production sites were expected to have very limited internet connectivity. Remember that we are back in the past, not the present. So, the connection was kind of sketchy.
- 29:21 Szymon Pobiega
- So, they were running RabbitMQ in order to be able to keep working even when the connection was down. Now the problem is how to connect them. There is this saying that I tried really hard to find an attribution for, but found at least three different versions. The saying goes that generals always learn to fight their last war. I think it also applies to software engineers and software architects, slightly modified: we software people tend to solve our last problem, which means we try to apply the patterns and practices we used recently to whatever challenge the world has thrown at us right now.
- 30:10 Szymon Pobiega
- So, previously, the pierogi delivery team, when faced with a situation that looked like something a messaging bridge could solve, had immediately jumped to: "Hey, let's use a messaging bridge to connect the new Azure part with the on-premises part," right? And it turned out to be a bad idea, because the bridge would not scale. So, here they immediately jumped to the conclusion: "Well, we have a queue on one side, we have a queue on the other side, let's use a message router." That would look more or less like this: put a router in the middle, connect it, and it works. Unfortunately, some early tests showed that it doesn't. Not all messaging middlewares are created equal.
- 30:56 Szymon Pobiega
- Some of them are better at one thing, some are better at another. I know that Azure Service Bus has evolved a lot in the last 15 years, but back then, when the action was happening, those were very early days for that service, and it wasn't able to cope with occasionally connected clients. It required a very high quality internet connection to operate. So, it was a no-no; they could not connect to Azure Service Bus from these storage facilities. They needed something better suited and more reliable. And at that point, Azure Storage queues, another Azure queuing technology, was chosen. It's based on HTTP, not long-running connections, so it was much better suited.
- 31:49 Szymon Pobiega
- But now, how do we use it? Well, we know we are solving our last problem, so we are just going to throw more routers into the picture and fill everything with these yellow, sorry, green things. That diagram started to be a bit clumsy to use, right? It's arrows and routers all over. So, the team discovered that they could actually redraw it without losing any information, without these arrows; they don't add any value. They figured out that if a site is a place, logical or physical, where these potatoes are hosted, running a single messaging infrastructure, then a router, by definition, is a thing that connects multiple sites.
- 32:40 Szymon Pobiega
- So, those of you who took networking classes at university, for example, could probably recognize pictures like that. There's nothing new in software; it looks more or less like IP routing diagrams. We also have routers that connect networks. So, in fact, what the pierogi delivery team did was kind of reinvent IP routing over message queues. But whatever, it worked for them. If it's stupid and it works, it's not stupid. Deployed to production; it worked great. Then another task: the management team realized that they needed to embrace the cloud more. The cloud had worked fairly well for them, and after a few years of using it, it proved to be much cheaper to operate than the on-premises data center.
- 33:34 Szymon Pobiega
- They encouraged the development team to try moving some more functionality from on premises to the cloud. The team started with this potato here, outlined in red, trying to figure out: okay, what happens if we move that to Azure? What will change? First of all, they started to identify the communication paths. That's the first thing you want to do when migrating stuff to the cloud: okay, what things does it talk to, so that I know what will be affected when I move it? They identified that A talks to B, C, D, and E in some way, and I show here on the left-hand side the routing tables involved in these endpoints. So, what happens if you move it? Well, all of these routing tables need to be changed.
- 34:24 Szymon Pobiega
- These tables are part of the logic of each endpoint. So, that means that when you are moving one endpoint from one site to another, you need to change a whole bunch of other endpoints that communicate with it. That's not a great idea. It basically means that when moving a single piece of functionality from one place to another, you need to change everything, or at least redeploy everything, and that has a huge potential for failures. So, what if we tried differently? What if we did not put our message routing logic inside each endpoint, but tried to make it a global thing? Try to describe our whole system globally, as in "this message goes to that place," and maybe that would help.
- 35:13 Szymon Pobiega
- So, the first attempt is this table on the left, but it has the downside that it doesn't actually capture all the information. It just says that when A wants to send a command to E, that command goes to Azure. It doesn't help, because it doesn't say where A is when it sends the command. So, let's try to do better. If we split that table, we can actually capture all the necessary information in two simple tables. Table one shows the logical destination of each message, the logical routing: A's command goes to E, D's event goes to A, and so forth.
- 36:00 Szymon Pobiega
- You map the message type to the destination where it should be sent, and then a separate table, called the deployment table, says endpoint A is in Azure, endpoint B is on premises, and so forth. So, when I move my endpoint A from on premises to Azure, I just update the second table, wait until everybody gets the updates and the caches propagate and whatnot, and then everything should work fine, right? Well, it should. What we actually traded here is the routing knowledge that was distributed among all endpoints. We traded it for a description of the system, a global description that is maintained manually.
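The two-table split can be sketched like this; moving an endpoint then means touching only the deployment table. The message, endpoint, and site names are invented for illustration:

```python
# Illustrative split of routing knowledge into two tables:
# logical routing (message type -> owning endpoint) and
# deployment (endpoint -> site).

logical_routing = {"MyCommand": "E", "MyEvent": "A"}
deployment = {"A": "azure", "B": "on-premises", "E": "azure"}

def physical_destination(message_type):
    endpoint = logical_routing[message_type]  # who should process it
    return endpoint, deployment[endpoint]     # and where that endpoint lives
```

Migrating endpoint E to another site is a one-line change to `deployment`; `logical_routing` and every sender stay untouched.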
- 36:51 Szymon Pobiega
- It's highly redundant because it has a lot of information that is already present somewhere else, with a huge potential of failure in production. The deployment table that mapped endpoints to sites can't be treated as something that applies to every environment in the system. So, you have to manually change that in production. And manual changes in production are not things that work really well. So, the team again decided to go for some advice. And what it learned, in a cryptic way, is that basically they did not think hard enough about the problem. So, what are the concerns in message routing, in message-based distributed systems?
- 37:40 Szymon Pobiega
- First, logical. I mentioned that one. Meaning, where do I send a message? Who should process a given message? Then there is the deployment concern, which is about, okay, I know who should receive that message, but where is that thing, where is that endpoint? How do I reach it? And then there is the temporal aspect, which is very important here, because we have multiple environments in every non-trivial system. We have our development environments on our laptops, we have staging, testing, production. Each of these environments evolves at a different pace.
- 38:17 Szymon Pobiega
- So, we need to take that into account, and we cannot say, "Okay, the routing is this," because it's different on staging, different on your local dev because you're developing a new feature, and it's different in production. So, that needs to be taken into account, and that's knowledge. The routing knowledge should be treated as a first-class citizen here, and that knowledge has to be properly generated. What do I mean by properly? First, it needs to be DRY, D-R-Y. So, we don't want to repeat ourselves when defining that routing knowledge. If a piece of information is already present somewhere in the environment, we should be reusing it, not duplicating it.
- 39:03 Szymon Pobiega
- As an example, when you have an automated deployment and you deploy a piece of software to, let's say, virtual machine X, then the deployment script is the source of truth for where that piece of software is, because it does the job of deploying it there. If you then create an XML document that states, oh, by the way, this piece of software is on that machine, that's duplication, because you are duplicating the knowledge that is in the deployment script. And from the moment you write it, there is a potential for these things to diverge. And of course the deployment script is right, because it does the job. The XML document that describes the routing is going to be wrong. So, we don't want to do that.
- 39:49 Szymon Pobiega
- So, instead of duplicating that, we want to derive the information. We want to identify where in our system we have sources of knowledge about message routing, and if these sources contain that knowledge in a form that is not really useful for us in its present form, we want to derive a form that would be useful in some way. And then we want to take discrete pieces of knowledge and aggregate them to create the global view of the system. And the delivery team invented a very clever way of dealing with that problem of generating that knowledge. So, suppose these folks here on the left are not only people, but generally agents, people or automated processes.
- 40:39 Szymon Pobiega
- Some agents that have some knowledge. Each of these folks has partial knowledge of the system, knows something about something. So, let these folks create documents somewhere where they describe their knowledge of a thing, their most current point of view of a given thing. Then let's aggregate these documents based on the identity of these agents. So, if they all represent the same logical endpoint of our system, let's put all these documents in a folder and call that folder the name of the logical endpoint, for example.
- 41:17 Szymon Pobiega
- And then let's have an automated process that grabs all the documents from a single folder, potentially merging them with documents from another folder, and generates a new document and puts it in a folder. A very simple transformation. We take many documents and we produce one document in a folder, and there might be multiple of these transformations, placing documents in different folders. Sounds simple, right? So, what would be the pieces of information in these documents, from the perspective of me as a messaging endpoint, someone who receives and sends messages? What I know and can share in that folder is: who am I? What's my endpoint name? What queue am I connected to?
- 42:03 Szymon Pobiega
- And if I'm scaled out, if there are multiple copies of me out there, then each copy of me has some sort of identity attached to it by the deployment process. So, I want to also share that information, everything about me. Then, what can I do? I'm running as a .NET process, so I can inspect myself and figure out what message handler classes I have loaded, so I know what messages I'm capable of handling. And finally, if I'm connecting to a messaging infrastructure like RabbitMQ or Azure Service Bus or SQS, I'm using some sort of connection string that contains the identity of the broker, so I know what I'm connected to.
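A rough sketch of the instance document such an endpoint might publish about itself. The field names and values are invented for illustration; this is not an NServiceBus or broker API, just a plain data shape assembled from the facts the endpoint already knows about itself:

```python
import json

def build_instance_document(endpoint_name, instance_id, queue,
                            handler_types, broker_id):
    """Assemble the self-description one endpoint instance can share."""
    return {
        "endpoint": endpoint_name,         # who am I, logically
        "instance": instance_id,           # identity assigned by deployment
        "queue": queue,                    # the queue I consume from
        "handles": sorted(handler_types),  # message types I can process
        "broker": broker_id,               # derived from my connection string
    }

doc = build_instance_document(
    "EndpointA", "instance-1", "endpoint-a",
    {"PlaceOrder", "CancelOrder", "OrderShipped"}, "rabbitmq-onprem")
print(json.dumps(doc, indent=2))
```

Every value here is derivable at runtime or deploy time, so nothing is manually authored, which is exactly the DRY property the talk asks for.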
- 42:50 Szymon Pobiega
- And these pieces of information are enough to build the whole routing mechanism with the process that I described, a document aggregation process. Here's how it could look in the pierogi example. So, we have three potatoes representing endpoints. One of them is scaled out, has two instances processing messages in parallel. Each of these creates an instance document saying, "Hey, I'm endpoint A, instance one, I have these three message handlers and I'm happily handling messages." Then there is a process, a knowledge aggregation process, that aggregates these two instance documents into one endpoint document.
- 43:32 Szymon Pobiega
- And that document describes: hey, there is endpoint A, it can process these message types. An example where that aggregation becomes a bit tricky might be, "Oh, one of the instances has three handlers and the other has two handlers." That can happen when we have a rolling deployment, where we are not deploying everything at once, but we're deploying one instance, checking that the new version works, then deploying another one, and another one, so that our users always have something that is working in our environment. So, generally, when we aggregate that, we take the least common denominator and include in the endpoint document only those handlers that all the instances can work with.
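The "least common denominator" aggregation described above amounts to a set intersection over the instance documents. A minimal sketch, with invented names, of what the aggregation step could look like:

```python
def aggregate_endpoint(instance_documents):
    """Merge instance documents into one endpoint document.

    During a rolling deployment the instances may disagree about which
    handlers exist, so the endpoint only advertises the message types
    that EVERY instance can handle.
    """
    handler_sets = [set(d["handles"]) for d in instance_documents]
    common = set.intersection(*handler_sets)
    return {
        "endpoint": instance_documents[0]["endpoint"],
        "handles": sorted(common),
    }

instances = [
    {"endpoint": "EndpointA",
     "handles": ["PlaceOrder", "CancelOrder", "RefundOrder"]},  # new version
    {"endpoint": "EndpointA",
     "handles": ["PlaceOrder", "CancelOrder"]},                 # old version
]
endpoint_doc = aggregate_endpoint(instances)
# endpoint_doc["handles"] == ["CancelOrder", "PlaceOrder"]
```

The intersection is what keeps routing safe mid-rollout: a message type only becomes routable once the last old instance that can't handle it is gone.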
- 44:21 Szymon Pobiega
- And then on the other side, we have routers. Each router is connected to multiple messaging infrastructures. So, each router instance writes that in a document, we aggregate that into a router document, and now, having all the routers knowing where they're connected, and having all the endpoints knowing where they're connected, we can create a site map that says, oh, from endpoint A to endpoint C, I need to send a message through routers R1, R2, R3, and whatnot. We have all that information here in the sites document. And by also aggregating all the information about who can handle what messages, we can create a long list of message handlers that contains information about who can process what.
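Deriving the site map from the router documents is a small graph problem: sites are nodes, routers are edges, and the chain of routers between two sites is a shortest path. A hedged sketch, with invented router and site names, using a breadth-first search:

```python
from collections import deque

# Aggregated router documents: which sites each router bridges.
router_docs = [
    {"router": "R1", "sites": {"azure", "onprem"}},
    {"router": "R2", "sites": {"onprem", "warehouse-1"}},
    {"router": "R3", "sites": {"onprem", "warehouse-2"}},
]

def route_between(source_site, target_site):
    """Return the list of routers a message must pass through, or None."""
    queue = deque([(source_site, [])])
    visited = {source_site}
    while queue:
        site, path = queue.popleft()
        if site == target_site:
            return path
        for doc in router_docs:
            if site in doc["sites"]:
                for nxt in doc["sites"] - {site}:
                    if nxt not in visited:
                        visited.add(nxt)
                        queue.append((nxt, path + [doc["router"]]))
    return None  # the sites are not connected by any chain of routers

# route_between("azure", "warehouse-1") == ["R1", "R2"]
```

Because the router documents are themselves derived from connection strings, the site map stays correct when a router is repointed at a different broker.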
- 45:10 Szymon Pobiega
- That information can be read by a human being, an intelligent human being, and that human being is going to make a decision: "Oh, I need to send a command to the purchasing system. I have in my system two endpoints that can handle the purchase command, one old and one new. Where should the command go?" These kinds of decisions are made by that human being over there. And the result of this is the logical routing document that describes what should happen in the system. It can be validated against the handlers document, so the system can raise problems like, "Hey, you human being, you want this purchase command handled in the shipping system. That doesn't compute, shipping can't handle that message."
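That validation step, checking the human-authored logical routing document against the aggregated handlers document, can be sketched as a simple cross-check. All names are invented for illustration:

```python
# Aggregated handlers document: who can actually process each message type.
handlers_doc = {
    "PurchaseCommand": ["PurchasingOld", "PurchasingNew"],
    "ShipOrder": ["Shipping"],
}

# Human-authored logical routing, containing one mistake on purpose.
routing_doc = {
    "PurchaseCommand": "Shipping",  # Shipping can't handle this
    "ShipOrder": "Shipping",
}

def validate(routing, handlers):
    """Report every route that points at an endpoint without a handler."""
    problems = []
    for message_type, endpoint in routing.items():
        capable = handlers.get(message_type, [])
        if endpoint not in capable:
            problems.append(
                f"{message_type} is routed to {endpoint}, "
                f"but only {capable} can handle it")
    return problems

for problem in validate(routing_doc, handlers_doc):
    print(problem)
```

Running the check surfaces the misrouted PurchaseCommand before the routing table ever reaches production.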
- 46:00 Szymon Pobiega
- And then finally, we can combine all these documents to form a routing table for the entire system. So, that's how knowledge aggregation and derivation works in practice in the pierogi company. It's a very lightweight process, I think, this pattern they designed that has no name, because it doesn't have any high requirements on the storage capability, and it's easy to integrate. You can imagine that in these warehouse sites they're using PostgreSQL to store the documents, and where the routing logic, the derivation logic, is deployed in Azure, they're using blob storage. There is nothing easier than just mapping a bunch of rows in PostgreSQL to a bunch of documents in the blob storage, right?
- 46:49 Szymon Pobiega
- So, it's a documents-to-document mapping. Super simple. The temporal aspect of routing can be handled by introducing some source control in the middle. So, here's an example where some operator of the system is designing the logical routing document in the staging environment. In staging, they have data about all the logical endpoints and what they can handle. He or she can click wherever in the UI, define, okay, these commands go to this endpoint, and that's stored in a blob in that environment. But we can have a process that takes that from the blobs in that environment and puts it into our Git repo so that it's properly versioned there.
- 47:35 Szymon Pobiega
- And when it is time to actually deploy to production, we are not copying environments, we are not making any manual changes. We take the information that was generated in the staging environment and persisted in our Git repo, and promote it to production safely and securely. So, to summarize, because I have two minutes left, we learned today about some new, well, new and old, high-level patterns for building distributed systems. So, the first one was the endpoint, the potato pattern, and the handler, the french fries pattern. Then we looked at the bridge pattern, the router pattern, and the concept of sites, and how routers and bridges connect sites.
- 48:19 Szymon Pobiega
- And finally, we looked at patterns for knowledge generation: the aggregation, the derivation, and the mapping of documents that allows us to generate a global view of the system from the discrete pieces of knowledge that each agent in the system contains. With that, I would like to wrap up. I would like to invite you to look at the blog that I run with my friend Tomek, and thank you for being here. I'll hang out here after the session for some questions, and if you have any questions now, that's Q&A time.