All About Transports
About this video
Knowing which NServiceBus transport is best for your application is not easy. There are many factors involved in selecting a message transport; distributed transactions, legacy integration, cross-platform capabilities, and cloud deployments are a few that might be considered.
At NSBCon 2015 Andreas Öhlund outlines the different transports that are available for NServiceBus. He covers the highlights and lowlights of each. Rather than telling you which transport is the right one, Andreas provides you with the tools to make that decision yourself, within the context of your project.
If you’re considering which transport to use on a new project or wondering if the transport you have chosen will continue to work for you, you’ll get a lot out of the presentation.
🔗 Transcript
- 00:01 Andreas Ohlund
- Anyway, welcome. My name is Andreas Ohlund. I am a Swede, not a Swiss, so I'm not going to throw... I did bring some meatballs with me, but the US customs kept them. So they're having a meatball party now at Chicago Airport. Just kidding.
- 00:18 Andreas Ohlund
- So I'm going to talk about the transports, i.e. the queuing systems that we run on top of. This could probably be the most boring talk in the world, with a lot of tables, message size limits, and yada-yada. So I tried to do it a little bit differently. I'm going to tell you a little bit of a story about a Swedish company. I am going to try to highlight some of the highlights of each transport and some of the lowlights. So that at least, when you walk out of this talk, you will have a general picture of, "Where would I use each given transport?"
- 00:53 Andreas Ohlund
- Anyway, let's talk about the Swedish company that I've been interviewing. It's called Swedish Meatballs, Inc. Essentially they have the license to produce the trademark Swedish meatballs, and they have kind of an interesting background. It all started back in 1926. A furniture store was founded in Sweden, and it got bigger and bigger, but they had some issues, because a lot of the people visiting the stores had some energy problems, and they were feeling depressed, especially the males. So halfway through the furniture store, they kind of lost their energy. So they kind of thought, "We need something to make these people endure a trip to the store, so they can actually pay at the end." So they figured, "Let's create a dish that'll energize them enough to enjoy the ride." I know probably, many of you have been to those stores. I can see there is one across the road here, even. Anyway, so the tagline for this company is, "Helping people survive at furniture stores since 1926."
- 02:04 Andreas Ohlund
- All was good of course, but as we all know, those stores were getting much more successful. And like everyone else, this meatball firm was sort of becoming the victim of its own success. They were starting to have issues with the system, availability issues, scalability issues, and they kind of realized that they needed to do something with their architecture, and they didn't really know what. So, since hygiene is very important in this business, they kind of thought that a lot of soap would work. So they put SOAP web services everywhere. But that SOAP didn't really lead to clean code or clean design, or clean architecture. It sort of led them to some kind of web services hell. So back to the drawing board.
- 02:50 Andreas Ohlund
- But one day the chief architect was in one of those stores and there was an outage, and he could see the queue was just increasing. And then it dawned on him, "Queuing. Could that be it?" It's a little bit ironic, because Swedes are probably the best people in the world at queuing. We know how to. You stand last, you wait for your turn. And if someone cuts the line, you get really pissed off, but you don't say anything. You go home and you're pissed off. Anyway, they thought, "Let's try queuing." And after debating what queuing system to use, they kind of figured, "Since we're a .NET shop, let's... MSMQ, it's there, it's out of the box. Just enable it."
- 03:32 Andreas Ohlund
- So as I've been interviewing them, I asked them to tell me, "What's the biggest benefit that you saw with MSMQ?" And the first thing that they talked about was the store and forward. You can achieve this on pretty much all the queuing systems, but MSMQ is different in the sense that store and forward is really the default mode of operation. It runs on all your machines.
- 03:54 Andreas Ohlund
- So a quick primer on what store and forward is. It's essentially, when you want to send out a message from your process, it gets stored locally. That's why it's called, store and forward. So if you want to talk to another node, if that node is up, the queuing system will, sometime later, move that message over to the store on the other side, that's the forwarding part. And if your process happened to be alive at that stage, the queuing system is going to deliver it for you. So it's very simple, that's essentially, store and forward.
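To make the store and forward idea concrete, here's a minimal, language-agnostic sketch in Python. This is not MSMQ or NServiceBus code; the `Node`, `send`, and `forward` names are made up for illustration, with in-memory lists standing in for the local message stores on each machine.

```python
# A minimal sketch of store-and-forward: sending stores locally,
# and a forwarding step moves messages when the receiver is reachable.
class Node:
    def __init__(self, name):
        self.name = name
        self.outgoing = []   # local store: messages waiting to be forwarded
        self.inbox = []      # messages delivered to this node
        self.online = True

def send(sender, receiver, message):
    # Step 1 (store): the message is persisted locally, so sending
    # succeeds even if the receiver is down right now.
    sender.outgoing.append((receiver, message))

def forward(sender):
    # Step 2 (forward): the queuing system moves stored messages to
    # receivers that are currently reachable; the rest stay queued.
    still_waiting = []
    for receiver, message in sender.outgoing:
        if receiver.online:
            receiver.inbox.append(message)
        else:
            still_waiting.append((receiver, message))
    sender.outgoing = still_waiting

orders = Node("orders")
backend = Node("legacy-backend")
backend.online = False                   # flaky back end is being rebooted

send(orders, backend, "new order #1")    # still succeeds: stored locally
forward(orders)                          # nothing moves yet
assert backend.inbox == []

backend.online = True
forward(orders)                          # now the message is forwarded
assert backend.inbox == ["new order #1"]
```

The point of the sketch is exactly what the talk describes: the `send` call never depends on the other machine being up, which is why the flaky back ends could be rebooted without losing orders.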
- 04:29 Andreas Ohlund
- And what's the benefit there? Well, what it gives you is that you don't have any single points of failure. You don't rely on a process on some other machine being up for you to be able to get your messages out. And for them, that was really important, because they had a lot of shaky old systems that were down all the time. By introducing queuing like this, they could keep on accepting new orders, even though their flaky back ends were being rebooted, or even redeployed.
- 04:59 Andreas Ohlund
- The next thing they brought up, it's surprising, they told me that distributed transactions were really good, which is weird because we all know they're super bad, right? If you Google it, the first six hits are going to be some horror story about how bad they are. So I guess lesson learned there, don't Google anything. If you have a rash on your hand, if you Google it, you're going to read all the horror stories about it. So, I sort of dug deeper and asked them, "Okay. So even though everyone thinks they're so bad, how come you think they're good?"
- 05:32 Andreas Ohlund
- So let's dig into what a distributed transaction really is. This is from Wikipedia. It's a wall of text, I'll make it easier for you. So essentially, a DTC transaction is about two or more network hosts being involved. They're doing some kind of local transaction with ACID semantics, essentially meaning, locking stuff in your database. And at the end, they have to agree on an all or nothing outcome. So all those hosts have to say, "I'm good to go, are you good? Okay, let's go. Let's commit."
- 06:05 Andreas Ohlund
- You can kind of sense that this is bad now, right? So what is the bad here? The bad thing is that, as you are enlisting more and more resource managers, it's going to get slower and slower and slower. Because, remember the all or nothing outcome, right? That means that all those guys have to agree on something, so they have to talk a lot. So there's a lot of overhead. The more resource managers you add, the slower it's going to get. And also, if one of them has some latency issue, remember, all or nothing, they all have to wait for this one to decide if they're going to commit or not.
- 06:39 Andreas Ohlund
- Adding ACID to all this means that you're going to hold locks in your database for long periods of time. And the duration of the entire transaction is the maximum of each individual one, plus whatever overhead. This creates a vicious circle. Holding locks for a long time in a database means that you're more open to deadlocks. If you get a deadlock you'll roll back, and all the other ones have to roll back. So holding locks for a long time, a lot of rollbacks, you can clearly say that this isn't good, right?
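The "all or nothing" coordination being described is two-phase commit. Here's a rough Python sketch of the voting, not real DTC code; the `prepare`/`commit`/`rollback` callables are stand-ins for whatever local work each resource manager does, and the point is that one "no" vote forces every participant to roll back.

```python
# A rough sketch of the two-phase commit "all or nothing" outcome.
def two_phase_commit(resource_managers):
    # Phase 1 (prepare): every participant must vote. Locks are held
    # from here until the final decision, which is why one slow or
    # unlucky participant makes everyone else wait.
    votes = [rm["prepare"]() for rm in resource_managers]
    if all(votes):
        for rm in resource_managers:
            rm["commit"]()
        return "committed"
    for rm in resource_managers:
        rm["rollback"]()
    return "rolled back"

log = []
db = {"prepare": lambda: True,
      "commit": lambda: log.append("db commit"),
      "rollback": lambda: log.append("db rollback")}
queue = {"prepare": lambda: False,   # e.g. a deadlock victim votes no
         "commit": lambda: log.append("queue commit"),
         "rollback": lambda: log.append("queue rollback")}

# One "no" vote forces everyone to roll back - the all-or-nothing outcome.
assert two_phase_commit([db, queue]) == "rolled back"
assert log == ["db rollback", "queue rollback"]
```

With many resource managers, the prepare round-trips and the lock-holding window both grow, which is the overhead and deadlock spiral described above.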
- 07:07 Andreas Ohlund
- So, I kind of dug deeper, said, "Okay, but why is this... Okay, this seems wrong. You say it's good, but it's bad." Well, it turns out that in reality, when you're using a service bus, it's still true, everything I've said so far, but it's a little bit different. So first of all, what nodes do you have around? Remember, two or more network hosts. Well, we have our app node running the DTC, MSMQ, the local queuing system (remember, store and forward), and we have our client here, your message channel. And the other side, that we're going to sync up with, is not some random dude's Raspberry Pi over the internet, it's actually your SQL Server. So in terms of latency, you can pretty much guarantee that those are going to be kind of low-latency operations.
- 07:53 Andreas Ohlund
- So first of all, we only have two involved resource managers. We don't have many. We have two that need to agree, so there's much less overhead. And what about the duration of the transaction here? It's not really random. It's essentially, you are executing your SQL, that you can craft... Yes, if you're, say, into Entity Framework or NHibernate, it could be argued how much you are in control, but it's SQL statements that you execute against your SQL Server, against tables that you optimize. If that takes a long time you should be able to handle that. So the duration is pretty short here.
- 08:35 Andreas Ohlund
- So you have two nodes that have to agree. So in this setting, yes, there is some overhead, you will lose some performance, but you'll be surprised at how little performance you will actually lose. This is what they meant when they said that, for them, in this setting, this was pretty acceptable, because it really helped them write very simple code focused on the business problem. So essentially, in this case, when they were creating the meatballs, they store the meatball in the database, then publish an event telling the world that the meatball is ready. This is really it. They don't have to worry about weird things like idempotency, I can't even spell that, and ghost messages. We'll talk later about what you have to think about when you're giving up the DTC. Yep, that's it.
- 09:25 Andreas Ohlund
- "Well," you might ask, "What about non-transactional things?" We saw this on a saga masterclass a few days ago. The answer to everything in life is 42, but NServiceBus, the answer to everything is, "Send the message." So if you want to do something that count and list, or if you want to have two resource managing the mold, instead of doing it on the same pipeline, you send a message. Instead of making a call to web services to charge a credit card, instead send the message. Because now we're down to two resource managers again, the database and the queuing system. And I'm going to create another end point that'll handle the credit card, charge credit card command, so that will be the queuing system and a non-transactional resource. That's the way to solve things with NServiceBus, send a message.
- 10:16 Andreas Ohlund
- Everyone that's set up the DTC knows that it's kind of finicky. You have to muck around with a lot of stuff. You have to talk a lot with your ops people. But to be honest, once you get it working, it works. It's rock solid. Most of our clients are on the MSMQ and SQL Server combo, and we rarely hear complaints in this area. So just learn to use DTCPing.exe, and you'll be good. Of course, there is something that you need to think about, and it's all about trade-offs. Because if you're leaning on the DTC to write less code, you're essentially making a trade-off of shorter time to market, quicker getting your code into production with fewer bugs, versus locking yourself into a specific setup.
- 11:08 Andreas Ohlund
- If you're fine with running on Windows using MSMQ, talking to SQL Server, or Oracle, that might be perfectly fine, but you have to be aware that if you want to switch to other queuing systems, or other storages, you can't do that without recoding. We'll see later, when I talk about RabbitMQ, what kind of code changes you will have to make if you want to stop using distributed transactions.
- 11:31 Andreas Ohlund
- And of course, it comes up quite often, that whole "MSMQ is dead" thing. Well, as Mark Twain would put it, the reports of its death are greatly exaggerated. So it's still there, I think it's going to be around for quite some time more. So you still have plenty of time to migrate if you want. Or was it the other way around with Mark Twain? I don't know.
- 11:54 Andreas Ohlund
- Anyway, I continued talking to them, and it turns out they have the same problem as everyone else, the big ball of mud; they call it the big ball of meat. I guess that's the nature of their business. They wanted to essentially break up their monolith, applying the Udi Dahan principles: autonomous components, high cohesion, loose coupling, and all that. And it was fairly hard. They had this huge SQL Server app, stored procedures left, right and center, that the consultants wrote, and they'd actually lost it. They almost didn't have the code.
- 12:31 Andreas Ohlund
- Of course, they'd been kind of burned on that SOAP web services architecture, and they were kind of looking at a meatball oriented architecture, but they were not sure. But now the latest hype, the micro meatball architectures, they were hoping that this is it. How could they start to break up this SQL Server monolith? They kind of figured, "If we can start getting events out of our database, that's a nice first step, to start publishing events out. Then we can cut up this big ball of meat and replace it one part at a time." So they were looking at different ways of getting those events from the database out into MSMQ, and it turns out it wasn't really that easy.
- 13:15 Andreas Ohlund
- So, they kind of thought, "Well, there is this SQL transport and well, no one has ever been fired for choosing SQL, we all know that." They kind of figured, "If we use the SQL transport, we could actually write logic in our stored procedures and, God forbid, the triggers." So achievement unlocked, having trigger code in a slide, this is a first for me.
- 13:40 Andreas Ohlund
- There's a place for everything, and I think a trigger is pretty nice here. You don't have to wade through all that stored procedure logic, or whatever code you have. In this case, if there is an order inserted into their legacy order table, we will publish an event called OrderAccepted, and the SQL transport will pick it up. So that's one of the key benefits of the SQL transport, it's very easy to integrate and get events, get messages out of your old SQL systems.
- 14:13 Andreas Ohlund
- Of course, now they had two transports, they had MSMQ and they had SQL. So of course, after a while they realized, "We need to get those two to talk to each other." And we don't have an officially supported bridge yet, but we are slowly sort of building up the building blocks to provide that for you. It's a really hard problem to have a generic bridge that just works. But in V5 we added the multi-hosting feature, so you can have two endpoints in the same process. As Udi mentioned in his keynote, we're adding multi-serialization in V6, so we're taking small steps towards a bridge solution. That means that you can use different serialization formats in different parts.
- 14:54 Andreas Ohlund
- We have a sample up on our docs site that shows you how to bridge SQL and MSMQ. So the point-to-point communication is fairly easy. It's the pub/sub that's the tricky part to get right, and the consistency. Bridging a non-transactional queuing system with a transactional one, there are some challenges with duplicates.
- 15:13 Andreas Ohlund
- I kind of asked them, "Okay, what's your experience with using SQL as a queuing system?" The word they were using is that it's really comfortable, like the sofas at the store. Well, some of them, I guess. But the key thing they brought up is that they can now start using queuing without introducing new infrastructure. They were looking back to the time when they went with MSMQ, and they said, "If we would've known this, we probably would have gone for the SQL transport right away."
- 15:50 Andreas Ohlund
- And I've seen that myself, that the bike-shedding will happen when a company is going to select a queuing system; everyone and their dog has an opinion, "My grandma used this queuing system. It's awesome." And then, "No, no, that's no good. We have to use X." So using the SQL transport can really avoid that type of derailing of your project. You can get queuing in without having the "one new piece of infrastructure" discussion.
- 16:16 Andreas Ohlund
- Of course, high availability is already in place. If you're depending on SQL Server, you probably already have active-passive clusters and whatnot set up. And of course, backups, that's probably already in place. And it's a fairly unique scenario, because now we can get a consistent snapshot backup that includes your queues and your business data. So this is probably the only transport that can give you that: point in time, we take a backup, and we have exactly the messages and the business data in a consistent state.
- 16:49 Andreas Ohlund
- Of course, there are downsides as well, and the one they brought up is, it's going to cost you. How many are using SQL? Is it expensive? It can be expensive, because when you scale SQL Server, you're most likely going to scale up, and that's what Microsoft wants. So, you need the Enterprise and Datacenter editions, whatever they call them, and prices just keep on going up. And if you want to be super highly available, you turn on SQL Server Always On, that's probably the most expensive checkbox in the world. Well, selecting SAP as a business system might be more expensive, but this one is up there, at least. One thing to know, when you turn on SQL Server Always On, you will lose the DTC, but Microsoft is actually reintroducing it in 2016. So the next version of SQL Server actually supports the DTC with SQL Server Always On.
- 17:50 Andreas Ohlund
- So they were fairly happy, as I said. So they would like to give the recommendation that if you're a .NET shop using SQL and you don't really know what queuing system you want to go with, you should probably start off with the SQL transport, to test out how messaging works for you, and go from there. Right about this time, there was a sort of defining moment for the meatball company. They got a new CTO in, he was really power crazy. His motto was, "Conquering the world one meatball at a time."
- 18:20 Andreas Ohlund
- And now things started to happen. They started to buy companies left, right and center. They bought the Swiss chocolate factory because they wanted to have the chocolate meatballs. They bought a lot of businesses running Java business systems, the BeanProviderFactory kind of systems. And they had to integrate all that. And given that the new CTO also was a Swede, why not use a queuing system built on Erlang? Because Erlang was built by Ericsson back in the... Was it the '80s? Can't remember. Erlang is an awesome programming language, and Rabbit is built on it.
- 19:06 Andreas Ohlund
- So they decided, "You know what? We're going to use RabbitMQ, because it's really awesome to integrate things with." That's really where it shines. It has client libraries for every programming language known to man. It runs on all the platforms, it's AMQP compliant, if you want to connect a lot of different platforms, Rabbit is really, really interesting. All that was fine. One thing that sort of was a wake up call for the development team is that, now they lost all their transactions. Rabbit doesn't have any kind of transactions at all.
- 19:44 Andreas Ohlund
- So there are a few problems with that, that you have to know about, and know how to mitigate. The first one is ghost messages and the other one is duplicates. So we're going to dig into what I mean by ghost messages first. So remember that simple code we wrote with MSMQ? That's all fine. The problem when you're losing transactions is that now the order of statements matters. If you happen to do the bus.Publish first, before your store, then no transactions means that it is not going to roll back. So if your database throws a deadlock or something, that message is still going to go out. And this might seem sort of, "Yeah, it's so obvious, right?" Of course, you need to put the publish last, but remember, NServiceBus supports multiple handlers for the same message. So in the real world, it's not that easy to spot those ordering problems. You have to do a very thorough code review to spot them.
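The publish-before-store bug can be shown in a few lines. This is a hedged Python sketch, not the C# from the slides; a plain list stands in for the broker, and `store` simulates the database deadlock.

```python
# Sketch of how statement order creates a ghost message without
# transactions: the publish cannot be rolled back when the store fails.
published = []
database = []

def store(meatball):
    # Simulated DB failure: a deadlock rolls back the (absent) insert.
    raise RuntimeError("deadlock")

def handle_wrong_order(meatball):
    # Publish first - the bug. Without a transaction this can't be undone.
    published.append(("MeatballReady", meatball))
    store(meatball)    # blows up after the event is already out

try:
    handle_wrong_order("meatball-42")
except RuntimeError:
    pass

# Nothing was stored, yet the world was told there is something:
# a ghost message that downstream processes will happily act on.
assert database == []
assert published == [("MeatballReady", "meatball-42")]
```

Swap the two statements in the handler and the failure becomes harmless: the store throws before anything is published.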
- 20:37 Andreas Ohlund
- So in this case, if the DB rolls back we have a ghost message, and they're really, really tricky to deal with. We haven't stored anything in our database, but we tell the world now that there is something. So as we saw in the saga workshop, a lot of downstream business processes will now kick off and do their thing, but we still haven't actually stored anything. That can create some nasty situations where, you're billing credit cards, but you're still not shipping orders, nasty stuff like that.
- 21:07 Andreas Ohlund
- The next one is to deal with those duplicates. Because in this case, we blow up in the store, we've already sent out the message, and the next time it actually works. In scenarios like this, you will generate duplicate messages. So the code that you write will have to be prepared to deal with those duplicates. So if we first look at inserts, essentially what you need to do here is that, first of all, you need to have client-side generated IDs. That's a good thing anyway, so you should probably be doing that anyway. You have to have something in the business data that uniquely identifies the thing you want to do. In this case, we have a meatball ID.
- 21:47 Andreas Ohlund
- So in this case, we're inserting into the database. And if that thing throws a key constraint exception, we sort of have to mute that, and then we have to call bus.Publish anyway, because we don't know, the last time, when we actually inserted it, if we were able to publish or not. And this will generate further downstream duplicates that your endpoints have to deal with. So to deal with inserts, you have to try-catch the key constraint violations.
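Here's a small Python sketch of that insert pattern. It's an illustration, not the real handler: a dict plays the role of the table with a unique key on the meatball ID, and "muting the key constraint" becomes a simple membership check.

```python
# Sketch of duplicate-tolerant inserts, assuming client-side generated
# IDs and a unique key constraint on the meatball ID.
database = {}    # meatball_id -> row; stands in for a unique-keyed table
published = []

def handle_create_meatball(meatball_id):
    if meatball_id in database:
        # This is where the key constraint would fire on a duplicate
        # delivery. Mute it: the first delivery already did the insert.
        pass
    else:
        database[meatball_id] = {"id": meatball_id}
    # Publish unconditionally: we can't know whether the first delivery
    # managed to publish before it crashed.
    published.append(("MeatballCreated", meatball_id))

handle_create_meatball("mb-7")
handle_create_meatball("mb-7")    # duplicate delivery of the same command

assert len(database) == 1         # the insert happened exactly once
assert len(published) == 2        # downstream endpoints must dedupe too
```

Note the trade-off the talk mentions: muting the constraint keeps the insert idempotent, but publishing every time pushes duplicates further downstream.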
- 22:14 Andreas Ohlund
- The other scenario is when you're doing updates, and this is the tricky one, so please forgive my pseudo code here. Essentially, in this case, we were adding meatballs to a plate and incrementing the count. In order to figure out if we're dealing with a duplicate or not, we have to remember what message IDs hit this entity. So in this case, we're only updating the meatball count if we haven't already processed this message. However you achieve that; in this case, we do it by inserting the message ID into a collection on that entity, as we are processing it. And of course, you always have to publish as well. So some non-trivial code that you will have to write, if you want to be 100% consistent. Most people don't do that, they just hope for the best. And sometimes that's okay, but you have to be aware of it.
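The update case can be sketched the same way. Again a hedged Python illustration rather than the slide's pseudo code; the entity is a dict carrying its own set of processed message IDs.

```python
# Sketch of duplicate-tolerant updates: remember which message IDs
# already hit this entity, so a redelivery doesn't increment twice.
plate = {"meatball_count": 0, "processed_message_ids": set()}
published = []

def handle_add_meatball(message_id):
    if message_id not in plate["processed_message_ids"]:
        # First time we see this message: apply the update and record
        # the message ID on the entity, in the same store.
        plate["meatball_count"] += 1
        plate["processed_message_ids"].add(message_id)
    # Always publish, for the same reason as with inserts: the first
    # delivery may have crashed before its publish went out.
    published.append(("MeatballAdded", message_id))

handle_add_meatball("msg-1")
handle_add_meatball("msg-1")   # duplicate: the count must not change
handle_add_meatball("msg-2")

assert plate["meatball_count"] == 2
assert len(published) == 3
```

In a real system the processed-IDs collection lives in the same database row or document as the entity, so the check and the update commit together.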
- 23:10 Andreas Ohlund
- Funnily enough, this is essentially what we are doing for you. So if you're happy with some infrastructure lock-in again, you can actually turn on the outbox feature in NServiceBus, because it's sort of doing that infrastructure-level message ID deduplication for you. But now we're back in lock-in territory a little bit, because for now we only support NHibernate and RavenDB. So if you're fine with those databases, instead of writing that code, you can turn on the outbox instead. But if you want to use MongoDB or something else, then well, you will have to write the persister yourself, or ask us to do it. So when you're switching to a non-transactional queuing system, be aware: most likely you'll have to go through your source code, and make sure you're prepared for the duplicates and those ghost messages.
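To show roughly what an outbox-style feature does at the infrastructure level, here's a simplified Python sketch. This is my approximation of the pattern, not NServiceBus internals: one transactional store holds both the business data and the outgoing messages, keyed by the incoming message ID.

```python
# A rough sketch of the outbox pattern: business change and outgoing
# messages are stored together, then dispatched; redeliveries are
# deduplicated by the incoming message ID.
database = {"entities": {}, "outbox": {}}   # one transactional store
broker = []

def handle(message_id, meatball_id):
    if message_id in database["outbox"]:
        # Already processed: never re-run the business logic.
        # (A real outbox may re-dispatch the stored messages here.)
        return
    # Business change and outgoing events committed atomically,
    # because they live in the same store.
    database["entities"][meatball_id] = {"id": meatball_id}
    database["outbox"][message_id] = [("MeatballReady", meatball_id)]

def dispatch(message_id):
    # Separate step: push the stored messages to the (non-transactional)
    # broker. Safe to retry, since the outbox records what to send.
    for event in database["outbox"].get(message_id, []):
        broker.append(event)

handle("msg-1", "mb-1")
handle("msg-1", "mb-1")        # duplicate delivery: a no-op
dispatch("msg-1")

assert len(database["entities"]) == 1
assert broker == [("MeatballReady", "mb-1")]
```

This is why the feature drags persistence lock-in along with it: the outbox records have to live in the same transactional store as your business data.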
- 24:00 Andreas Ohlund
- Anyway, from an operational perspective, I talked to them, and one thing that caught them a little bit off guard was the fact that they were kind of used to MSMQ. They kind of felt it a little bit with SQL, this kind of single point of failure feeling, and well, RabbitMQ is a broker. So you connect to the broker sitting somewhere, and then you send and receive messages from it. You can set RabbitMQ up in store-and-forward mode, essentially install Rabbit on all the boxes, then you install the Shovel plugin, or the Federation plugin, and then Rabbit behaves like MSMQ. But that's not what most people do. Most people go with clustering instead, and quoting the RabbitMQ doco... I'll have to give them bonus points for honesty, because they don't really handle partitions well. So this is from their doco: if you have network issues and the cluster partitions, some really bad things are going to happen, and you need to be prepared for that.
- 24:56 Andreas Ohlund
- So to understand where the problem lies, we have to look a little bit at how HA really works in this mode. So we have a cluster, we have a load balancer in front of it, and we have three nodes, A, B, and C. The way it works is that, for each queue you create, one of the nodes becomes the master for that queue. And as you add and remove nodes, the responsibility shifts, but at any given point in time, there's always one master for the queue. By default, it's the node where you created the queue. So essentially, if node C is the master of the queue, all writes and reads are going to happen on that node, and the data and the messages are replicated across the nodes. So if node C goes down, another node will have the data and it will become the master.
- 25:47 Andreas Ohlund
- This creates some funky behavior when you're scaling out. So, if we have a message coming in via the load balancer... Let's assume for now that the queue we're going to stick this message into is on node C. If we're unlucky and the load balancer sends us over to node A, Rabbit is then going to detect that, and route you to C instead. So you have to pay an extra network hop. And how many have gotten those black books that David created, about the fallacies? We all know that the network bandwidth is not infinite, so you're going to pay for an extra network hop. And if you have large messages, this is going to start to slow you down. In this case, there's a 66% chance that we're going to have an extra network hop. And the more nodes you add, the more you increase the likelihood of having to pay for an extra network hop.
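The 66% figure falls out of simple arithmetic, assuming the load balancer picks nodes uniformly and exactly one node is the master of the queue:

```python
# Probability of landing on a non-master node (and paying the extra hop)
# when the load balancer picks one of n cluster nodes uniformly.
def extra_hop_probability(n_nodes):
    return (n_nodes - 1) / n_nodes

assert round(extra_hop_probability(3), 2) == 0.67   # the ~66% case above
assert extra_hop_probability(2) == 0.5
```

So with three nodes you miss the master two times out of three, and adding nodes only pushes the probability towards 100%.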
- 26:37 Andreas Ohlund
- It's just that HA hurts scalability. It's not a huge deal, but you have to be aware. Where the problem comes in is that Rabbit can suffer something called a split brain. So if we have the same setup here, again assuming that the queue we're going to target lives on node C, if for some reason there is a partition in the cluster, so node A, from a network perspective, is separated from the other ones, then the minority, the node A side of the cluster, is going to elect itself as the master, because it doesn't know about the others. So node A now thinks it's the master of our queue, and node C will still believe it's the master. So as new load comes in, we'll be writing on both sides of the cluster, which is fine.
- 27:30 Andreas Ohlund
- But the problem is, when the cluster heals again, RabbitMQ has an auto-heal mode, which is thankfully not on by default. But if you have the auto-heal mode on, it's going to pick the minority and it's going to remove all its messages, and reapply the ones from the other side. In that case, all the writes on the node A side are going to be gone. So in this case, you would have lost 33% of the messages coming in. The only way to deal with this... There are a lot of famous blog posts on the net, just Google for RabbitMQ, you can read all about this. The key thing here is: do not use auto-heal, because otherwise you're going to throw away messages. So when this happens, you need to detect it, you need to pull all the messages out of node A, and re-insert them into one of the other nodes manually.
- 28:20 Andreas Ohlund
- I've seen some blog posts claiming to have sort of a step-by-step guide, but I've never really seen anyone pull it off. So just be aware that if you don't want to lose messages, you have to be really, really careful to set this up correctly. So essentially, if you're going for a highly available Rabbit cluster, really make sure you involve the professionals, Pivotal, the company behind Rabbit, and their partners, to really make sure you set it up properly.
- 28:47 Andreas Ohlund
- Anyway, all was good. They lost some data, but that was fine. But the CTO wasn't done. "To conquer the world," was his motto. They wanted to get into the fast food business, so they decided to open a chain of restaurants across the world, serving meatballs. "How can we build this infrastructure?" And of course, being a .NET company, they started looking at the cloud platforms, and being a Microsoft shop, looking at the Azure cloud was the reasonable move for them.
- 29:27 Andreas Ohlund
- We support two queuing systems there: Azure Service Bus, which is the advanced one, but there's also Azure Storage Queues. So the team over at the meatball company was sort of debating which one should be used, "Storage Queues, Service Bus, Storage Queues?" Well, Storage Queues, that's the dirt-simple one. It's just a queue. You push messages onto the queue, you receive messages off the queue. It's very stable from an API perspective, it's quite performant. Essentially, that's the queuing system that is the base service for a lot of the other Azure services. So when other Azure services need to shuttle data around, they actually run that on top of the Storage Queues. But it doesn't really have any bells and whistles.
- 30:11 Andreas Ohlund
- In this case, they selected Azure Service Bus, because it has a lot of advanced features for integration. It supports relaying, it's AMQP 1.0 compliant, it integrates with Event Hubs, it has a lot of interesting stuff. So the team said, "You know what? We'll use Azure Service Bus." This was a big shift for them. It doesn't really matter what queuing system they would have chosen here, because this is sort of the first move from running their own infrastructure over to platform as a service, which essentially means that your infra is run by someone else, which is usually a good thing, because Microsoft, they have a lot of time and resources they can put towards running a queuing system reliably.
- 30:57 Andreas Ohlund
- I guess the timing is fairly nice because, last week we had a major outage in the EU West zone, where a lot of the Azure stuff was down. And I think there is a current outage right now, right? Yeah, Azure AD is off for the whole of Europe, or something. That might be fixed now. But of course, when things happen, you are not in control anymore. Most of the time, it's fine, but just be aware that you are not in charge of the infra anymore. They can patch it and they can change it. So you have to be prepared for this.
- 31:28 Andreas Ohlund
- Another thing they brought up as an interesting benefit is... Well, back in the old days, shitty code was just shitty code. But when you're running like this, shitty code becomes expensive code, because you pay by the number of storage transactions you do. It might sound bad, but essentially that credit card bill really pushed them to clean up their code and really re-evaluate, "Should we really be doing it like this?" Someone, correct me if I'm wrong, used the term credit card driven development. Or call it cost-driven architecture, whatever you want to call it. So that's an interesting angle.
- 32:11 Andreas Ohlund
- Well, they selected it because... Well, they're going to have stores all over the world, right? And well, running on the cloud, global operations just works, right? Well, as long as you think about a few things. So, "How do I partition stuff? What about legal?" There are different rules about data privacy in different areas of the world. "What about security? And what about latency?" And the last one is the one that kind of caught them off guard a little bit, because... Well, they had stores all over the world, and by using Azure, they thought, "Well, we can just treat all parts of the world equally." But that's not really true, right?
- 32:53 Andreas Ohlund
- So we talked about the "network is reliable" fallacy. We sort of talked about "bandwidth is infinite," and here's another one that still applies: latency is zero. Peter Deutsch was pretty right back then when he coined the fallacies. So this really hit them because... Well, fortunately for me, I'm working a lot in Australia, so I get reminded daily about this. They actually sent over a picture of the Australian internet infrastructure. Microsoft doesn't have any magic network equipment either. If you're going to shuttle data between Australia and Europe, or Europe and the United States, you will pay the latency cost. It's orders of magnitude higher compared to calling something close by, in the same data center, or in the next availability zone.
- 33:49 Andreas Ohlund
- There's a reason why Microsoft doesn't really have any replication services, or anything that works across different continents. The only thing you'll get is replication within availability zones that are close to each other. So when you go into the cloud, make sure that you really take latency into account. There's no rocket science to handling latency: make sure that you're not chatty, try to be as coarse-grained as possible, bring as much data as possible in each message, craft APIs that don't rely on a lot of back and forth, and try to batch things if possible.
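The arithmetic behind that advice is simple enough to sketch. The sketch below uses hypothetical latency numbers (`ROUND_TRIP_MS`, `LOCAL_TRIP_MS` are assumptions, not measured figures) to show why a chatty API across a high-latency link is so much worse than a coarse-grained one:

```python
# Back-of-the-envelope sketch (hypothetical numbers) of why chattiness hurts
# when every call crosses a high-latency link, e.g. Australia <-> Europe.

ROUND_TRIP_MS = 300      # assumed round trip on an intercontinental link
LOCAL_TRIP_MS = 1        # assumed round trip within the same data center

def total_latency(num_calls: int, round_trip_ms: float) -> float:
    """Sequential calls pay the full round trip each time."""
    return num_calls * round_trip_ms

# A chatty API: 50 small sequential calls across continents.
chatty = total_latency(50, ROUND_TRIP_MS)    # 15000 ms

# A coarse-grained API: one batched call carrying the same data.
batched = total_latency(1, ROUND_TRIP_MS)    # 300 ms

print(f"chatty: {chatty} ms, batched: {batched} ms")
```

Same data transferred, but the coarse-grained design pays the intercontinental round trip once instead of fifty times.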
- 34:33 Andreas Ohlund
- Batching, that's a thing that we've been looking at. And that's something that we have now introduced in V6. We have something now called Batched Dispatch. We built it mostly for Azure, but it turns out that it's fairly useful for all the transports. I'm going to walk you through what it means. So what we do essentially, in V6, is that as your handler executes, and you're doing your bus.sends or bus.publishes, we actually, a bit like how a compiler can rearrange statements when it compiles, hold on to those messages. We're actually not going to dispatch them right away to the transport. We're just going to store them for you for a little while, so that after your handler is done, we collect all those operations and hand them off to our transport adapters at the end. So that our transport adapters for Azure Storage Queues, Azure Service Bus, RabbitMQ, can, if possible, do smart things, knowing, "Here are all the messages that we're going to send."
- 35:32 Andreas Ohlund
- So in the Azure case, we can essentially... Well, first of all, we're using the async API, and as Daniel showed you, we can now have all three operations going on at the same time. But we can do even more here, because Azure Service Bus actually lets you batch things up. It has a message limit of 256 kilobytes. If those three messages together come in under 256 kilobytes, we can actually pack them together into a single call. So we can actually reduce the number of calls as well. Early tests show that we can get latency down from 100 to 200 milliseconds, down to 10 to 20 milliseconds, for the same type of operations. So if you're sending a lot of small messages, this is really going to make a difference for you.
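The packing step he describes is essentially greedy bin-filling against the size limit. A minimal sketch, assuming a 256 KB per-call limit as stated above (the function name and greedy strategy are illustrative, not the transport's actual algorithm):

```python
# Hedged sketch of packing several small messages into as few transport
# calls as possible, given a per-call size limit such as Azure Service
# Bus's 256 KB at the time of the talk.

MAX_BATCH_BYTES = 256 * 1024

def pack_batches(messages):
    """Greedily group message bodies so each group stays under the limit."""
    batches, current, size = [], [], 0
    for body in messages:
        if current and size + len(body) > MAX_BATCH_BYTES:
            batches.append(current)   # current batch is full; start a new one
            current, size = [], 0
        current.append(body)
        size += len(body)
    if current:
        batches.append(current)
    return batches

msgs = [b"x" * 100_000, b"y" * 100_000, b"z" * 100_000]
print(len(pack_batches(msgs)))  # 2 calls instead of 3
```

Three 100 KB messages become two calls instead of three, and many genuinely small messages would collapse into a single call, which is where the quoted latency win comes from.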
- 36:21 Andreas Ohlund
- And the interesting part is, this kind of optimization is only available in the async API, in the async SDK for Azure Service Bus. Microsoft is only optimizing the async API. Another way of pushing users over to using async instead. In version five, where we're still calling the synchronous API, we cannot do any of this. So a little bit sneaky. A nice side effect of this is that we actually no longer need to worry about ghost messages. Remember how you had to be careful with the order of statements? But now in V6, since we're holding on to everything, we guarantee that we're not going to be talking to the queuing system unless everything in the handler actually succeeded. So you no longer need to worry about ghost messages.
- 37:11 Andreas Ohlund
- We still have options for you to request an immediate dispatch. For some scenarios, it could be valuable for you to say, "Don't return back to me until you're 100% sure that the queuing system has accepted this." There's a new API for that; you can look that up on our doc site. If you want to say, "Please, service bus: no batching, no nothing, send it straight out and let me know when it's safely stored in the queuing system."
- 37:37 Andreas Ohlund
- But as I said, it's really interesting for the other ones as well. For RabbitMQ, we're using something called publisher confirms, to make sure that the queuing system has acknowledged and stored all the messages on disk. Now we can do that in batch. We can say, "Here are three messages. Let me know when all three of them are done." So we can optimize all of the other transports as well. For the SQL transport, we can batch statements together in one go when it gets to the database, and so on. So it has some really nice side effects.
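The win with batched confirms is the same shape as before: fewer round trips. A sketch of the difference (the broker and its methods are made up for illustration; this is not the RabbitMQ client API):

```python
# Illustrative sketch: confirming every publish individually versus
# confirming a whole batch at once. One confirm wait = one round trip.

def publish_confirm_each(broker, messages):
    for m in messages:
        broker.publish(m)
        broker.wait_for_confirm()    # one round trip per message

def publish_confirm_batch(broker, messages):
    for m in messages:
        broker.publish(m)
    broker.wait_for_confirm()        # one round trip for the whole batch

class CountingBroker:
    """Counts calls so the two strategies can be compared."""
    def __init__(self):
        self.publishes = 0
        self.confirm_waits = 0
    def publish(self, m):
        self.publishes += 1
    def wait_for_confirm(self):
        self.confirm_waits += 1

b = CountingBroker()
publish_confirm_batch(b, ["a", "b", "c"])
print(b.publishes, b.confirm_waits)  # 3 publishes, 1 confirm round trip
```

Per-message confirms would have cost three round trips here; the batch variant pays one, which is exactly what holding all outgoing messages until the handler completes makes possible.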
- 38:09 Andreas Ohlund
- So essentially, that was the journey they took: introducing MSMQ, using SQL to get stuff out of the database, bridging it, integrating with RabbitMQ, and then going to the cloud using the Azure transport. I think that summarizes how you need to go about picking a transport. If you compare performance, they all perform pretty similarly, and they all have roughly similar message size limits. That's really not what you need to look for. You need to ask, "Does this transport fit the current situation I'm in, and does it fit where I want to go in the future? How much lock-in can I take now, and how much work is it going to be to transition away?"
- 38:59 Andreas Ohlund
- I used to be a consultant in the past, so this is always what you... If someone asks you, you always say, "It depends." So, it really depends. Do the research. Talk to us, talk to others. Remember, you can use more than one. I don't know if there's a term polyglot transport, but if there is one and it hasn't been claimed, I'm claiming it now. So polyglot persistence is cool, and polyglot transport is even cooler. Thanks.
- 39:38 Andreas Ohlund
- Do you have any questions around transports? I'm going to be here today and tomorrow, so just walk up to me and we'll have a discussion about transports, and see what will fit your situation.